In my last post we looked at the simplest way to read some data from Apache Cassandra using Apache Spark from your local machine. Taking the next logical step, we will now write some data to Cassandra. The setup for this post is nearly identical to what we did here. Assuming you have done that work, this should only take a couple of minutes.
Fetch the example from github
From github, clone the sbt-spark-cassandra-rw project.
Assembly
From a terminal, cd into the sbt-spark-cassandra-rw project and fire up sbt. Once ready, call assembly to create your application jar file.
[bkarels@ahimsa simple-rw]$ sbt
[info] Loading project definition from /home/bkarels/dev/simple-rw/project
[info] Set current project to Simple RW Project (in build file:/home/bkarels/dev/simple-rw/)
> assembly
[info] Packaging /home/bkarels/dev/simple-rw/target/scala-2.10/simpleSpark-RW.jar ...
[info] Done packaging.
[success] Total time: 21 s, completed Dec 3, 2014 10:52:33 AM

As before, take note of where your jar is put (highlighted bit above).
Prepare the data
At the sbt-spark-cassandra-rw project root you will find the file things.cql. Using DevCenter or cqlsh, execute this script against your target Cassandra cluster. If you need to set up a local cluster for development, look here.
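If you want a sense of what the script does before running it, the shape of the schema is roughly the following. This is a hypothetical sketch: the keyspace, table, and column names below are assumptions, and things.cql in the project root is the authoritative source.

```
-- Hypothetical sketch of what things.cql sets up; the actual script
-- in the project root is authoritative for all names.
CREATE KEYSPACE IF NOT EXISTS simple
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Source table: all things, matter or not.
CREATE TABLE IF NOT EXISTS simple.things (
  key int PRIMARY KEY,
  value int
);

-- Target table: Cassandra will not create this for us at write time,
-- so the script must create it up front.
CREATE TABLE IF NOT EXISTS simple.thingsthatmatter (
  key int PRIMARY KEY,
  value int
);
```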
In this example we will look at a group of things. Things have keys and values. But, for a thing to matter, it must have a value greater than one. So, we will pull down all things, filter out the things that matter, and write only things that matter into their own table (thingsthatmatter).
It is worth noting that the target table must exist for us to write to it. Unlike some other NoSQL data stores, we must plan ahead a bit more with Cassandra.
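The read-filter-write flow described above can be sketched with the spark-cassandra-connector API like so. This is a sketch, not the project's actual source: the keyspace name ("simple") and column names ("key", "value") are assumptions, and the real logic lives in SimpleApp in the cloned project.

```scala
// Sketch of the read-filter-write flow, assuming keyspace "simple"
// and columns "key"/"value"; see SimpleApp in the project for the real code.
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object SimpleRWSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Simple RW Sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Pull down all things, then keep only the ones that matter (value > 1).
    val thingsThatMatter = sc.cassandraTable("simple", "things")
      .filter(row => row.getInt("value") > 1)
      .map(row => (row.getInt("key"), row.getInt("value")))

    // Write the filtered rows to the pre-existing thingsthatmatter table.
    thingsThatMatter.saveToCassandra("simple", "thingsthatmatter",
      SomeColumns("key", "value"))

    sc.stop()
  }
}
```

Note that saveToCassandra writes into an existing table; this is why the target table has to be created ahead of time.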
Spark it up!
If your local Spark cluster is not up and running, do that now. If you need to review how to go about that, you can look here.
Make Sparks fly! (i.e. run it)
This bit is identical to the previous example. With your application assembled, Cassandra up and prepared, and your Spark cluster humming, go to a terminal and submit your job.
[bkarels@ahimsa simple-rw]$ $SPARK_HOME/bin/spark-submit --class com.sparkfu.simple.SimpleApp --master spark://127.0.0.1:7077 /home/bkarels/dev/simple-rw/target/scala-2.10/simpleSpark-RW.jar

If all has gone well, your terminal will spit out a list of all the things that matter. Unaltered, the application will truncate the thingsthatmatter table just before it exits. If you comment out that section in the source, re-assemble, and re-submit the job, you can further confirm the write by query:
14/12/03 20:42:45 INFO spark.SparkContext: Job finished: toArray at SimpleApp.scala:27, took 0.641849986 s
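With the truncate step commented out, a quick confirmation from cqlsh might look something like the following. The keyspace name here is an assumption; use whatever things.cql actually creates.

```
-- Hypothetical confirmation query; substitute the keyspace name from things.cql.
SELECT * FROM simple.thingsthatmatter;
```

Every row returned should have a value greater than one.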