Overview
Welcome to part one of the series 'Spark + Kafka + Cassandra'. In this series we will build up a Spark, Kafka, Cassandra stack that can be used as the foundation for real projects on real clusters that do real work.
As is typical here at Spark-Fu!, we will start with the most basic case and build on it incrementally. Keeping with that, this first installment will have you:
- Run ZooKeeper locally (bundled with the Kafka distribution)
- Run Kafka locally
- Run a local Spark Cluster
- Create a Kafka Producer wrapped up in a Spark application
- Submit the application and consume the messages in a terminal
Prerequisites:
- Java 1.7+ (Oracle JDK required)
- Scala 2.10.4
- SBT 0.13.x
- A Spark cluster (see how to set one up here)
- ZooKeeper and Kafka running locally
- git
Get ZooKeeper & Kafka
So, before we even clone the example application, we have to do a bit of work to get ZooKeeper and Kafka up and running on your machine. At the time of this post, 0.8.2-beta is the latest release and 0.8.1.1 is the current stable version, so let's use 0.8.1.1 - start by downloading it here. (NOTE: This is the Scala 2.10.x version!) Other variants are all available here if you have other needs.
Once downloaded, extract the archive - for this example I'm a fan of putting things at:
~/kafka_2.10-0.8.1.1
The rest of the example will assume this is the case.
Run ZooKeeper Local
To be polite, setting up local ZooKeeper is about as fun as giving a bad-tasting pill to a pissed-off cat with claws. Thankfully, the Kafka folks have done us a solid and bundled a mechanism that makes getting up and running with ZooKeeper as simple as:
[bkarels@rev27 kafka_2.10-0.8.1.1]$ bin/zookeeper-server-start.sh config/zookeeper.properties
Yes, there is an infinite number of ways to get ZK up and running - but the above is as far as I'm going here. Simple and functional.
Run Kafka Local
The next set of steps is a near-exact copy of the Kafka Quickstart. I have tuned the bits here to jive with the example application, so do look at the Kafka docs, but follow the steps below. Assuming you have Kafka downloaded and extracted as above, let's start things up, create our topic, and get a list of topics.
Start the server:
[bkarels@rev27 kafka_2.10-0.8.1.1]$ bin/kafka-server-start.sh config/server.properties &
Create topic `sparkfu`:
[bkarels@rev27 kafka_2.10-0.8.1.1]$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic sparkfu
List our topic(s) as an exercise:
[bkarels@rev27 kafka_2.10-0.8.1.1]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
All of the above commands spit out a goodly amount of "stuff" to your console, but a bit of careful examination should have you feeling confident that everything is up and running correctly. Most importantly, when you list your topics you should see sparkfu somewhere in that output (it will be on its own line).
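As an aside, if you ever want to create a topic from code rather than the CLI, Kafka 0.8.1.1 exposes the same admin machinery the script uses. A minimal sketch (the object name, connection settings, and timeouts here are illustrative, not from the example project):
import kafka.admin.AdminUtils
import kafka.utils.ZKStringSerializer
import org.I0Itec.zkclient.ZkClient

object CreateTopic extends App {
  // Connect to the local ZooKeeper that Kafka is using (timeouts are illustrative).
  val zkClient = new ZkClient("localhost:2181", 10000, 10000, ZKStringSerializer)
  // Same effect as kafka-topics.sh --create: 1 partition, replication factor 1.
  AdminUtils.createTopic(zkClient, "sparkfu", 1, 1)
  zkClient.close()
}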
To be certain, we can test your sparkfu topic. Open two terminals and navigate to ~/kafka_2.10-0.8.1.1. In one terminal start up a consumer to sit and listen for messages on your topic:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic sparkfu --from-beginning
In your other terminal we'll fire up a producer:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic sparkfu
Once your producer is started and staring at you, you can enter any string and press [Enter] to produce the message. Within a moment you should see your message parroted back to you in the terminal where you started the consumer. If you crave greater detail, review the Kafka documentation.
Kill your producer, but leave your consumer running - you'll need it in just a bit...
Clone the Example
Begin by cloning the example project from GitHub - spark-kafka-producer - and cd into the project directory.
[bkarels@rev27 work]$ git clone https://github.com/bradkarels/spark-kafka-producer.git
Cloning into 'spark-kafka-producer'...
remote: Counting objects: 19, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 19 (delta 0), reused 15 (delta 0)
Unpacking objects: 100% (19/19), done.
[bkarels@rev27 work]$ cd spark-kafka-producer/
[bkarels@rev27 spark-kafka-producer]$
Prepare the Application
In a terminal, from the project root, fire up SBT and assembly the project to create your application jar file. As per normal, take note of where your jar file gets put (shown in the output below). Also, note that we have set the resulting jar file name in assembly.sbt - here we have set it to sparkFuProducer.jar (see the sketch after the transcript).
[bkarels@rev27 spark-kafka-producer]$ sbt
Picked up _JAVA_OPTIONS: -Xms1G -Xmx2G -XX:PermSize=512m -XX:MaxPermSize=1G
[info] Loading project definition from /home/bkarels/work/spark-kafka-producer/project
[info] Updating {file:/home/bkarels/work/spark-kafka-producer/project/}spark-kafka-producer-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Set current project to Spark Fu Kafka Producer (in build file:/home/bkarels/work/spark-kafka-producer/)
> assembly
...
[info] SHA-1: 69db758e5dd205ae60875f00388cd5c935955773
[info] Packaging /home/bkarels/work/spark-kafka-producer/target/scala-2.10/sparkFuProducer.jar ...
[info] Done packaging.
[success] Total time: 11 s, completed Jan 20, 2015 12:08:44 PM
>
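For reference, the assembly.sbt mentioned above would look roughly like this with sbt-assembly 0.11.x - this is a sketch, so check the repo for the exact contents and plugin version:
import AssemblyKeys._

assemblySettings

// Name the fat jar that the assembly task produces.
jarName in assembly := "sparkFuProducer.jar"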
Spark it up!
If your local Spark cluster is not up and running, do that now. It's not!? Mon Dieu! Go here to see about remedying that.
Make Sparks fly! (i.e. run it)
Assuming you have poked around in the code, you will have seen that the Messenger within MillaJovovich.scala (a few will see my humor...) will produce four messages for the topic sparkfu and send them. So, when this application is submitted to the cluster it will have no clear output like our previous examples. This is where the consumer you have running will come in. If you stopped it, no worries - just restart it as you did above:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic sparkfu --from-beginning
Truth be told, you could start it up after the fact since we are telling it to read all messages from the start of time. But isn't it more fun to see messages come in as your job is sending them? Yes, it is.
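If you have not dug into the code yet, the shape of that producer is roughly the following. This is a sketch, not the repo's exact code - the package and class names are taken from the spark-submit command below, and the Kafka 0.8.x producer settings are the usual suspects:
package com.bradkarels.simple

import java.util.Properties

import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
import org.apache.spark.{SparkConf, SparkContext}

object Messenger {
  def main(args: Array[String]): Unit = {
    // The SparkContext is what makes this a submittable Spark application;
    // the messages themselves are produced from the driver.
    val sc = new SparkContext(new SparkConf().setAppName("SparkFu Kafka Producer"))

    val props = new Properties()
    props.put("metadata.broker.list", "localhost:9092")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    val producer = new Producer[String, String](new ProducerConfig(props))

    // Four messages for the sparkfu topic, matching the output shown below.
    Seq("What the foo?", "What the bar?", "What the baz?", "No! No! NO! What the FU!")
      .foreach(msg => producer.send(new KeyedMessage[String, String]("sparkfu", msg)))

    producer.close()
    sc.stop()
  }
}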
OK, spark it up:
[bkarels@rev27 kafka_2.10-0.8.1.1]$ $SPARK_HOME/bin/spark-submit --class com.bradkarels.simple.Messenger --master spark://127.0.0.1:7077 /home/bkarels/dev/spark-kafka-producer/target/scala-2.10/sparkFuProducer.jar
If all has gone well you should see output like the following in your consumer as the application runs:
[2015-01-20 12:28:09,547] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor)
What the foo?
What the bar?
What the baz?
No! No! NO! What the FU!
[2015-01-20 12:28:10,071] INFO Closing socket connection to /192.168.254.153. (kafka.network.Processor)
That's it. Well, that's it as far as scraping ever so lightly at the surface of this topic. In the following parts of this series we will dig a bit deeper.
What's next?
As I alluded to in the overview, the next parts in this series will provide examples for:
- Rudimentary Kafka consumer in Spark
- Custom Kafka Message encoder
- Consuming Kafka messages using Spark Streaming
- A larger example pulling all these bits together
Comments
Hi, I'm getting the following error when I enter the assembly command:
[trace] Stack trace suppressed: run last *:assembly for the full output.
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/462522/.ivy2/cache/org.apache.avro/avro-ipc/jars/avro-ipc-1.7.7-tests.jar:META-INF/maven/org.apache.avro/avro-ipc/pom.properties
[error] /Users/462522/.ivy2/cache/org.apache.avro/avro-ipc/jars/avro-ipc-1.7.7.jar:META-INF/maven/org.apache.avro/avro-ipc/pom.properties
[error] Total time: 1321 s, completed 19 Aug, 2016 1:18:47 PM
My sbt file is:
name := "Spark Fu Kafka Producer"
version := "0.0.1"
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.2",
  "org.apache.kafka" %% "kafka" % "0.10.0.1",
  "joda-time" % "joda-time" % "2.7"
)
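The usual remedy for a deduplicate error like this is to tell sbt-assembly how to resolve the conflicting files with a merge strategy in your build definition. A sketch, assuming sbt-assembly 0.14.x - the match on META-INF/maven covers the conflicting pom.properties files above:
assemblyMergeStrategy in assembly := {
  // The conflicting avro-ipc pom.properties files live under META-INF/maven;
  // they are not needed at runtime, so discard them.
  case PathList("META-INF", "maven", xs @ _*) => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}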