Thursday, December 4, 2014

Spark 1.1.1 to read & write to Cassandra 2.0.11 - Simple Example


In my last post we looked at the simplest way to read some data from Apache Cassandra using Apache Spark from your local machine.  Taking the next logical step, we will now write some data to Cassandra.  The set up for this post is nearly identical to what we did here.  Assuming you have done that work, this should only take a couple of minutes.

Fetch the example from GitHub

From GitHub, clone the sbt-spark-cassandra-rw project.


From a terminal, cd into the sbt-spark-cassandra-rw project and fire up sbt.  Once ready, call assembly to create your application jar file.
[bkarels@ahimsa simple-rw]$ sbt            
[info] Loading project definition from /home/bkarels/dev/simple-rw/project
[info] Set current project to Simple RW Project (in build file:/home/bkarels/dev/simple-rw/)                              
> assembly
[info] Packaging /home/bkarels/dev/simple-rw/target/scala-2.10/simpleSpark-RW.jar ...                                       
[info] Done packaging.                                                                                                      
[success] Total time: 21 s, completed Dec 3, 2014 10:52:33 AM
As before, take note of where your jar is put (the path in the "Packaging" line above).

Prepare the data

At the sbt-spark-cassandra-rw project root you will find the file things.cql.  Using DevCenter or cqlsh, execute this script against your target Cassandra cluster.  If you need to set up a local cluster for development, look here.

In this example we will look at a group of things.  Things have keys and values.  But, for a thing to matter, it must have a value greater than one.  So, we will pull down all things, filter out the things that matter, and write only the things that matter into their own table (thingsthatmatter).
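The actual source lives in the repo, but the read-filter-write flow can be sketched with the Spark Cassandra Connector roughly as follows (keyspace, table, and column names here are illustrative assumptions, not necessarily what the project uses):

```scala
// Hypothetical sketch of the core job; see the repo for the real source.
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Simple RW Project")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Read all things as (key, value) tuples.
    val things = sc.cassandraTable[(String, Int)]("sparkfu", "things")

    // A thing matters only if its value is greater than one.
    val thingsThatMatter = things.filter { case (_, value) => value > 1 }

    // Write the filtered rows to the (pre-existing) target table.
    thingsThatMatter.saveToCassandra("sparkfu", "thingsthatmatter",
      SomeColumns("key", "value"))

    sc.stop()
  }
}
```

Note the simple tuple type `(String, Int)` doing double duty for both read and write; we will revisit that shortly.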

It is worth noting that the target table must exist for us to write to it.  Unlike some other NoSQL data stores, we must plan ahead a bit more with Cassandra.
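For reference, a schema along the lines below is what things.cql needs to set up; this is only a sketch with assumed keyspace and column names, so defer to the actual script in the repo:

```sql
-- Hypothetical sketch; the real definitions are in things.cql.
CREATE KEYSPACE IF NOT EXISTS sparkfu
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS sparkfu.things (
  key text PRIMARY KEY,
  value int
);

-- The target table must exist before Spark can write to it.
CREATE TABLE IF NOT EXISTS sparkfu.thingsthatmatter (
  key text PRIMARY KEY,
  value int
);
```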

Spark it up!

If your local Spark cluster is not up and running, do that now.  If you need to review how to go about that, you can look here.

Make Sparks fly! (i.e. run it)

This bit is identical to the previous example.  With your application assembled, Cassandra up and prepared, and your Spark cluster humming; go to a terminal and submit your job.
[bkarels@ahimsa simple-rw]$ $SPARK_HOME/bin/spark-submit --class com.sparkfu.simple.SimpleApp --master spark:// /home/bkarels/dev/simple-rw/target/scala-2.10/simpleSpark-RW.jar
14/12/03 20:42:45 INFO spark.SparkContext: Job finished: toArray at SimpleApp.scala:27, took 0.641849986 s
If all has gone well, your terminal will spit out a list of all the things that matter, as above.  Unaltered, the application will truncate the thingsthatmatter table just before it exits.  If you comment out that section in the source, re-assemble, and re-submit the job, you can confirm the write by querying the table directly:

Note how all the things that matter have a value greater than one.
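With the truncate commented out, a quick check from cqlsh might look like this (assuming the illustrative keyspace name used above):

```sql
SELECT * FROM sparkfu.thingsthatmatter;
```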

What's next?

To make this work we used a simple tuple to read our data into, transform it, and write it back.  While this works for very simple examples, we will outgrow it very quickly...in fact, we already have.  So next we'll look at reading data directly into a Scala case class, transforming that data (perhaps into a new class), and writing it back to Cassandra.
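As a preview, the connector can map rows onto a case class whose fields line up with the table's columns; a sketch of the idea (names again assumed, not from the project):

```scala
// Preview sketch: reading rows directly into a case class instead of a tuple.
case class Thing(key: String, value: Int)

val things = sc.cassandraTable[Thing]("sparkfu", "things")
val mattering = things.filter(_.value > 1)

// When field names match column names, the columns can be inferred.
mattering.saveToCassandra("sparkfu", "thingsthatmatter")
```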
