Tuesday, December 9, 2014

Writing to Cassandra 2.0.x with Spark 1.1.1 - moving beyond Tuple2

UPDATE! (2014-12-12): Wide tuple write example added.  I won't lie, it is not elegant - but it is worth knowing the basics of how to work with tuples wider than two elements.  Pull the updated project code to check it out.
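For a flavor of what that looks like, here is a minimal sketch (not the project code itself) of pushing a Tuple4 into the sparkfu.personfromtuple table mentioned later in this post. It assumes a running SparkContext sc, and the column names are my assumption:

import com.datastax.spark.connector._

val people = sc.parallelize(Seq(
  ("Alysia", "Yeoh", "t", "3215551212"),
  ("Pete", "Jones", "m", "6125551212")
))

// SomeColumns maps tuple positions to columns left to right.
people.saveToCassandra("sparkfu", "personfromtuple",
  SomeColumns("firstname", "lastname", "gender", "phone"))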


The most rudimentary mechanisms for writing Tuple2 to Cassandra are pretty well beaten to a pulp.  In this post I hope to shine light on a few additional, more complex ways to write data to Cassandra, as not everything we do with a Spark/Cassandra stack will be reduceByKey on key/value pairs.
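For reference, that rudimentary pattern looks roughly like the sketch below - here against a hypothetical sparkfu.wordcount table with word and count columns:

import com.datastax.spark.connector._

// The classic key/value finish: reduceByKey, then save the pairs.
val counts = sc.parallelize(Seq("good", "person", "good"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

counts.saveToCassandra("sparkfu", "wordcount", SomeColumns("word", "count"))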

In the first example in this series we will fetch some data from Cassandra directly into an RDD of case classes.  Then we will transform that RDD into a more compact version of itself and write the resulting collection of case classes back to a different table in our Cassandra cluster.
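Stripped of project scaffolding, the core of that flow is a sketch like the following (the table definitions appear in the next section; the SparkConf settings are the usual ones for the connector):

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

case class Human(id: java.util.UUID, firstname: String, lastname: String, isgoodperson: Boolean)
case class GoodHuman(firstname: String, lastname: String)

val conf = new SparkConf()
  .setAppName("SandyAuthor")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)

// Read rows straight into the case class, pushing the filter down to Cassandra.
val goodHumans = sc.cassandraTable[Human]("sparkfu", "human")
  .select("id", "firstname", "lastname", "isgoodperson")
  .where("isgoodperson = ?", true)
  .map(h => GoodHuman(h.firstname, h.lastname))

// Case class fields map to columns by name.
goodHumans.saveToCassandra("sparkfu", "goodhuman")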

We will be sticking with the 'human' example from previous posts; however, I have tuned the schema a bit.  To keep your life simple, I recommend executing the updated CQL, which will create a new keyspace and tables for this example.


To follow along you will need:

  • Java 1.7+ (Oracle JDK required)
  • Scala 2.10.4
  • SBT 0.13.x
  • A Spark cluster (how to do it here.)
  • A Cassandra cluster (how to do it here.)
  • git

Clone the Example

Begin by cloning the example project from GitHub - sbt-spark-cassandra-writing - and cd into the project directory.

[bkarels@ahimsa work]$ git clone https://github.com/bradkarels/sbt-spark-cassandra-writing.git simple-writing
Initialized empty Git repository in /home/bkarels/work/simple-writing/.git/
remote: Counting objects: 21, done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 21 (delta 0), reused 17 (delta 0)
Unpacking objects: 100% (21/21), done.
[bkarels@ahimsa work]$ cd simple-writing/
[bkarels@ahimsa simple-writing]$

Prepare the Data

At the root of the project you will see two CQL files: sparkfu.cql and populateHumans.cql.  You will need to execute these two files against your local Cassandra instance from DataStax DevCenter or cqlsh (or some other tool) to set things up.
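If you prefer the terminal, cqlsh can run the files directly (assuming cqlsh is on your path and Cassandra is listening locally):

[bkarels@ahimsa simple-writing]$ cqlsh -f sparkfu.cql
[bkarels@ahimsa simple-writing]$ cqlsh -f populateHumans.cql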

Begin by executing sparkfu.cql to create your keyspace and tables:

CREATE KEYSPACE sparkfu WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

CREATE TABLE sparkfu.human (
    id TIMEUUID,
    firstname TEXT,
    lastname TEXT,
    gender TEXT,
    address0 TEXT,
    address1 TEXT,
    city TEXT,
    stateprov TEXT,
    zippostal TEXT,
    country TEXT,
    phone TEXT,
    isgoodperson BOOLEAN,
    PRIMARY KEY (id)
);

// Letting Cassandra use default names for the indexes for ease.
CREATE INDEX ON sparkfu.human ( isgoodperson ); // We want to be able to find good people quickly

CREATE INDEX ON sparkfu.human ( stateprov ); // Maybe we need good people by state?

CREATE INDEX ON sparkfu.human ( firstname ); // Good people tend to be named "Brad" - let's find them fast too!

// Clearly this is a horrible model you would never use in production, but this is just a simple example.
CREATE TABLE sparkfu.goodhuman (
    firstname TEXT,
    lastname TEXT,
    PRIMARY KEY (firstname, lastname)
);

Next, load up your table with the sample data by executing populateHumans.cql:

INSERT INTO sparkfu.human (id, firstname, lastname, gender, address0, address1, city, stateprov, zippostal, country, phone, isgoodperson)
  VALUES (now(),'Pete', 'Jones', 'm', '555 Astor Lane',null,'Minneapolis','MN','55401','USA','6125551212',True);

[Some CQL removed for brevity - file in project source.]

INSERT INTO sparkfu.human (id, firstname, lastname, gender, address0, address1, city, stateprov, zippostal, country, phone, isgoodperson)
  VALUES (now(),'Brad', 'Karels', 'm', '123 Nice Guy Blvd.',null,'Minneapolis','MN','55402','USA','6125551212',True);

INSERT INTO sparkfu.human (id, firstname, lastname, gender, address0, address1, city, stateprov, zippostal, country, phone, isgoodperson)
  VALUES (now(),'Alysia', 'Yeoh', 't', '1 Bat Girl Way',null,'Metropolis','YX','55666','USA','3215551212',True);

Errors?  No.  Good - we are set to proceed.
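If you want an extra sanity check, the index on isgoodperson lets you eyeball the rows directly from cqlsh:

SELECT firstname, lastname FROM sparkfu.human WHERE isgoodperson = True;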

Prepare the Application

In a terminal, from the project root, fire up SBT and run the assembly task to create your application jar file.
[bkarels@ahimsa simple-writing]$ sbt
[info] Loading project definition from /home/bkarels/work/simple-writing/project
[info] Updating {file:/home/bkarels/work/simple-writing/project/}simple-writing-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...                                    
[info] Done updating.                                                                  
[info] Set current project to Sandy Author (in build file:/home/bkarels/work/simple-writing/)
> assembly                                                                                  
[info] Updating {file:/home/bkarels/work/simple-writing/}simple-writing...                  
[info] Resolving org.fusesource.jansi#jansi;1.4 ...                                         
[info] Done updating.                                                                       
[info] Compiling 1 Scala source to /home/bkarels/work/simple-writing/target/scala-2.10/classes...
[ a whole bunch of assembly output will be here... ]
[info] SHA-1: 0202b523259e5688311e4b2bcb16c63ade4b7067
[info] Packaging /home/bkarels/work/simple-writing/target/scala-2.10/SandyAuthor.jar ...
[info] Done packaging.
[success] Total time: 19 s, completed Dec 10, 2014 4:41:29 PM

As per normal, take note of where your jar file gets put (shown above).  Also, note that we have set the resulting jar file name in assembly.sbt - here we have set it to SandyAuthor.jar.  (You can only say Cassandra and Writer so many times - having some fun with jar names...)
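For reference, setting the jar name with the sbt-assembly plugin of this vintage looks roughly like this in assembly.sbt (an assumption on my part - check the project source for the exact contents):

import AssemblyKeys._

assemblySettings

jarName in assembly := "SandyAuthor.jar"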

Spark it up!

If your local Spark cluster is not up and running, do that now.  If you need to review how to go about that, you can look here.

Make Sparks fly! (i.e. run it)

In keeping with our theme of finding good people, we'll do it again.  But this time, as you will see in the project source, we will fetch only good people in the first place, transform them into a simplified RDD, and store them off safely separated from all the not good people.

From your terminal, using the location of SandyAuthor.jar from above, submit the application to your Spark cluster:

[bkarels@ahimsa simple-writing]$ $SPARK_HOME/bin/spark-submit --class com.sparkfu.simple.Writer --master spark:// /home/bkarels/dev/simple-writing/target/scala-2.10/SandyAuthor.jar
[ lots of Spark output dumped to terminal here...]
14/12/10 16:48:51 INFO spark.SparkContext: Job finished: toArray at SimpleWriting.scala:34, took 0.641733148 s
Alysia Yeoh is a good, simple person.
Edward Snowden is a good, simple person.
Fatima Nagossa is a good, simple person.
Pete Jones is a good, simple person.
Brad Karels is a good, simple person.
Mother Theresa is a good, simple person.
Hiro Ryoshi is a good, simple person.
Neil Harris is a good, simple person.
B Real is a good, simple person.
[and then the wide tuple example output]

Alysia Yeoh is transgender and can be reached at 3215551212.
Neil Harris is male and can be reached at 9045551212.
Mother Theresa is female and can be reached at null.
Hiro Ryoshi is female and can be reached at 7155551212.
Brad Karels is male and can be reached at 6125551212.
Pete Jones is male and can be reached at 6125551212.
Edward Snowden is male and can be reached at null.
Fatima Nagossa is female and can be reached at 7895551212.
B Real is male and can be reached at 9995551212.

[bkarels@ahimsa simple-writing]$

If all has gone well you should see output like the above.  Please note, we are also TRUNCATING the goodhuman and personfromtuple tables just before the program exits, so if you look in Cassandra those tables will be empty.  Feel free to comment out those lines and validate directly in Cassandra.
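If you are curious, that cleanup boils down to something like the following sketch using the connector's CassandraConnector (see the project source for the real thing):

import com.datastax.spark.connector.cql.CassandraConnector

// Reuses the SparkConf the job was built with.
CassandraConnector(conf).withSessionDo { session =>
  session.execute("TRUNCATE sparkfu.goodhuman")
  session.execute("TRUNCATE sparkfu.personfromtuple")
}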

Then play around with it.  Extend the case classes, add columns to the schema, add more transformations, etc.  These simple examples will serve you best if you extend them and make them your own.

What's next?

There is likely good cause to start playing with creating and altering table structures dynamically.
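The same CassandraConnector session shown above gives you a place to start - any CQL DDL can be executed from the driver.  A hypothetical sketch (the table here is made up for illustration):

import com.datastax.spark.connector.cql.CassandraConnector

CassandraConnector(conf).withSessionDo { session =>
  session.execute(
    """CREATE TABLE IF NOT EXISTS sparkfu.goodhuman_by_state (
      |  stateprov TEXT,
      |  firstname TEXT,
      |  lastname TEXT,
      |  PRIMARY KEY (stateprov, firstname, lastname)
      |)""".stripMargin)
}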

