Tuesday, November 25, 2014

Using spark-submit to send an application to a local Spark cluster

In my last post (Running a local Apache Spark Cluster)
I went over how to spin up a local Spark cluster for development and prototyping. Now it is time to build the most basic Spark application to submit to your local cluster. While this application is heavily based on the self-contained example in the Spark quick start guide, we will tweak a couple of bits to make it just slightly more interesting.

What you should expect:
  1. Pull down and quickly modify the source.
  2. Package the application into a jar file.
  3. Submit the application using spark-submit to your locally running cluster (or any cluster where the sample file exists on all nodes).
  4. View the expected results in your terminal.

The ready-to-consume application can be found at:

See the README.md file for direction on how to modify the application to run on your environment.

You will need to have Java, Scala, and SBT installed locally.

(From the README.md file)

Step 1:
Move the file tenzingyatso.txt to a known location on your file system (e.g., /tmp/tenzingyatso.txt).

Step 2:
Modify SuperSimple.scala so the path to tenzingyatso.txt is correct for your system. For example, change:

val compassionFile = "/home/bkarels/tenzingyatso.txt"

to:

val compassionFile = "/tmp/tenzingyatso.txt"
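For reference, here is a minimal sketch of what SuperSimple.scala might look like. The package and object names come from the spark-submit command in Step 4, and the counting logic is an assumption inferred from the output shown there; check the repository source for the real implementation.

```scala
package com.bradkarels.spark.simple

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reconstruction of SuperSimple.scala: counts the lines in
// tenzingyatso.txt that mention "peace" and "love". The exact logic in the
// published application may differ.
object SuperSimple {
  def main(args: Array[String]): Unit = {
    val compassionFile = "/tmp/tenzingyatso.txt"

    // The master is not set here; it is supplied by spark-submit in Step 4.
    val conf = new SparkConf().setAppName("Super Simple Spark App")
    val sc = new SparkContext(conf)

    // Cache the file since we scan it twice.
    val lines = sc.textFile(compassionFile).cache()
    val peace = lines.filter(_.contains("peace")).count()
    val love  = lines.filter(_.contains("love")).count()

    println("Talks of peace: " + peace)
    println("Speaks of love: " + love)

    sc.stop()
  }
}
```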

Step 3:
From the root of this project run package from within SBT:

> package
*** Take note of where the application jar is written ***
[info] Packaging /home/bkarels/dev/super-simple-spark-app/target/scala-2.10/super-simple-spark-app_2.10-0.1.jar ...
[info] Done packaging.

> exit
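The packaging step above assumes an sbt build roughly like the following. The project name and versions are inferred from the jar name (super-simple-spark-app_2.10-0.1.jar) and the Spark 1.1.0 install used below; treat the exact dependency line as an assumption and verify it against the project's build.sbt.

```scala
// build.sbt (sketch) -- inferred from the jar name; verify against the repo
name := "super-simple-spark-app"

version := "0.1"

scalaVersion := "2.10.4"

// Marked "provided" because the cluster supplies Spark at runtime,
// so spark-core is not bundled into the application jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
```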

Step 4:
Since this has been designed to run against a local cluster, navigate to your $SPARK_HOME and use spark-submit to send the application to your cluster. The master URL takes the form spark://&lt;host&gt;:&lt;port&gt; (port 7077 by default) and is shown at the top of the Spark master web UI:

[bkarels@ahimsa spark_1.1.0]$ ./bin/spark-submit --class com.bradkarels.spark.simple.SuperSimple --master spark:// /home/bkarels/dev/super-simple-spark-app/target/scala-2.10/super-simple-spark-app_2.10-0.1.jar
Talks of peace: 3
Speaks of love: 2

