Monday, December 8, 2014

Spark 1.1.1 to read from Cassandra into Scala case class - Simple Example

As we continue to advance our use cases for Apache Spark on Cassandra, it seems only right that our next case puts data into a case class.

Following the previous examples, this will be the simplest use case - more elegant and advanced usages are coming...

Update! (2014-12-10): I have added fetching Cassandra data directly into a case class, as intended by the connector. The source for the example has been updated and pushed to the repository.

Prerequisites:

  • Java 1.7+ (Oracle JDK required)
  • Scala 2.10.4
  • SBT 0.13.x
  • A Spark cluster (how to do it here.)
  • A Cassandra cluster (how to do it here.)
  • git

Overview

For this example we will pull data for various humans out of Cassandra and put the relevant bits into our model.  Then we will see about working with only good persons (as determined by the creators of our data - our algorithm is not that advanced...yet).

Clone the Example

Begin by cloning the example project from GitHub (our project is spark-cassandra-to-scala-case-class), then cd into the project directory.
[bkarels@ahimsa work]$ git clone https://github.com/bradkarels/spark-cassandra-to-scala-case-class.git simple-case
Initialized empty Git repository in /home/bkarels/work/simple-case/.git/
remote: Counting objects: 21, done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 21 (delta 0), reused 17 (delta 0)
Unpacking objects: 100% (21/21), done.
[bkarels@ahimsa work]$ cd simple-case/
[bkarels@ahimsa simple-case]$ l
total 32K
-rw-rw-r--. 1 bkarels bkarels 1.2K Dec  8 12:24 assembly.sbt
-rw-rw-r--. 1 bkarels bkarels  298 Dec  8 12:24 build.sbt
-rw-rw-r--. 1 bkarels bkarels 1.1K Dec  8 12:24 LICENSE
-rw-rw-r--. 1 bkarels bkarels 1.1K Dec  8 12:24 nicecase.cql
-rw-rw-r--. 1 bkarels bkarels 3.2K Dec  8 12:24 populateHumans.cql
drwxrwxr-x. 2 bkarels bkarels 4.0K Dec  8 12:24 project
-rw-rw-r--. 1 bkarels bkarels  228 Dec  8 12:24 README.md
drwxrwxr-x. 3 bkarels bkarels 4.0K Dec  8 12:24 src

Prepare the Data

At the root of the project you will see two CQL files: nicecase.cql & populateHumans.cql. You will need to execute these two files against your local Cassandra instance from DataStax DevCenter or cqlsh (or some other tool) to set things up.

Begin by executing nicecase.cql to create your keyspace and tables:
CREATE KEYSPACE nicecase WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

CREATE TABLE nicecase.human (
    id TIMEUUID,
    firstname TEXT,
    lastname TEXT,
    gender TEXT,
    address0 TEXT,
    address1 TEXT,
    city TEXT,
    stateprov TEXT,
    zippostal TEXT,
    country TEXT,
    phone TEXT,
    "isGoodPerson" BOOLEAN, // Case sensitive column name - generally not recommended, but possible.
    PRIMARY KEY(id)
);

CREATE INDEX good_person ON nicecase.human ( "isGoodPerson" ); // We want to be able to find good people quickly

CREATE INDEX person_state ON nicecase.human ( stateprov ); // Maybe we need good people by state?

CREATE INDEX ON nicecase.human ( firstname ); // Let Cassandra use the default name for the index.  Good people tend to be named "Brad" - let's find them fast too!

CREATE TABLE nicecase.goodhuman (
    human_id TIMEUUID,
    PRIMARY KEY(human_id)
);

CREATE TABLE nicecase.badhuman (
    human_id TIMEUUID,
    PRIMARY KEY(human_id)
);

CREATE TABLE nicecase.goodbrad (
    human_id TIMEUUID,
    firstname TEXT,
    PRIMARY KEY(human_id)
);
There are a few extra bits in here that we won't be using in this example (indexes, extra tiny tables), but they won't hurt anything and we will likely use them in our next example or two.

Next, execute populateHumans.cql to load up our human table with some data.
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Pete', 'Jones', 'm', '555 Astor Lane',null,'Minneapolis','MN','55401','USA','6125551212',True);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Neil', 'Harris', 'm', '123 Doogie Howser Way',null,'Los Angeles','CA','90211','USA','9045551212',True);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Hiro', 'Ryoshi', 'f', '42 Cemetary Road','Apt. 23','River Falls','WI','55301','USA','7155551212',True);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Biel', 'SaBubb', 'm', '666 Hellpass',null,'Demonville','KY','32054','USA','3125551212',False);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Edward', 'Snowden', 'm', null,null,null,null,null,'RU',null,True);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Mother', 'Theresa', 'f', null,null,null,null,null,null,null,True);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Fatima', 'Nagossa', 'f', '689 First Ave.','Apt 1b','Orlando','FL','32822','USA','7895551212',True);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Casey', 'Steals-a-lot', 'f', '71 Buster Lane',null,'Denver','CO','74811','USA','7895551212',False);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'B', 'Real', 'm', '420 High Way','I forgot','Palo Alto','CA','90255','USA','9995551212',True);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'JJ', 'Jones', 't', '123 Sycamore Way',null,'Las Cruces','CA','91553','USA','9995551212',False);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Diane', 'Feinstein', '?', 'Do not care',null,'Some City','CA','99999','USA','1235551212',False);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Brad', 'Karels', 'm', '123 Nice Guy Blvd.',null,'Minneapolis','MN','55402','USA','6125551212',True);
INSERT INTO nicecase.human (id, firstname, lastname, gender, address0, address1,city, stateprov, zippostal, country, phone, "isGoodPerson")
  VALUES (now(),'Alysia', 'Yeoh', 't', '1 Bat Girl Way',null,'Metropolis','YX','55666','USA','3215551212',True);
There is a lot more data here than we will use for this example, but again, the near future is coming... Most notable here is how we use the CQL function now() to set the value for our primary key of type TIMEUUID. We are also, quite intentionally, leaving many of the data values null. Real data has nulls, so we might as well get the hang of dealing with them in this context too.
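Since those nulls will eventually surface on the Scala side, here is a minimal sketch of one way to model them: nullable columns become Option fields. The NullSafeHumans object, its trimmed-down Human, and the fromRaw and location helpers are hypothetical names made up for illustration; they are not taken from the example project.

```scala
object NullSafeHumans {
  // Hypothetical, trimmed-down model: columns that may be null are Options.
  case class Human(
    firstname: String,
    lastname: String,
    city: Option[String],
    stateprov: Option[String],
    isGoodPerson: Boolean)

  // Option(x) turns a null reference into None and anything else into Some(x).
  def fromRaw(firstname: String, lastname: String,
              city: String, stateprov: String, good: Boolean): Human =
    Human(firstname, lastname, Option(city), Option(stateprov), good)

  // Build a printable location without ever touching a null.
  def location(h: Human): String =
    (h.city, h.stateprov) match {
      case (Some(c), Some(s)) => s"$c, $s"
      case (Some(c), None)    => c
      case (None, Some(s))    => s
      case (None, None)       => "location unknown"
    }
}
```

With the data above, Pete Jones would come out as "Minneapolis, MN", while a row with no address data at all would fall through to "location unknown" instead of blowing up with a NullPointerException.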

Assuming you did not encounter any errors, your Cassandra instance is now ready for this example.

Prepare the Application

In a terminal, from the project root, fire up SBT and run the assembly task to create your application jar file.
[bkarels@ahimsa simple-case]$ sbt
[info] Loading project definition from /home/bkarels/work/simple-case/project
[info] Updating {file:/home/bkarels/work/simple-case/project/}simple-case-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...                              
[info] Done updating.                                                            
[info] Set current project to Simple Case (in build file:/home/bkarels/work/simple-case/)
> assembly
[info] Updating {file:/home/bkarels/work/simple-case/}simple-case...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...                
[info] Done updating.                                              
[info] Compiling 1 Scala source to /home/bkarels/work/simple-case/target/scala-2.10/classes...                                                                 
[info] Including: minlog-1.2.jar                                                             
...                                               
[info] Checking every *.class/*.jar file's SHA-1.                                            
...
[info] SHA-1: 5b17b5b5ccb29df92a662ad3f404573e7470d576
[info] Packaging /home/bkarels/work/simple-case/target/scala-2.10/CaseStudy.jar ...
[info] Done packaging.
[success] Total time: 28 s, completed Dec 8, 2014 12:37:30 PM
>
As per normal, take note of where your jar file gets put (see the "Packaging" line above). Also, note that we have set the resulting jar file name in assembly.sbt - here we have set it to CaseStudy.jar.
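For reference, a jar name can be set in assembly.sbt along these lines. This is only a sketch, assuming an sbt-assembly 0.11.x-era plugin on SBT 0.13.x; the assembly.sbt in the repository may well contain more than this, and later plugin releases renamed the key to assemblyJarName and no longer need the first two lines.

```scala
// assembly.sbt - sketch only, assuming sbt-assembly 0.11.x on SBT 0.13.x
import AssemblyKeys._ // must be at the top of the file

assemblySettings

// Name of the fat jar written under target/scala-2.10/
jarName in assembly := "CaseStudy.jar"
```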

Spark it up!

If your local Spark cluster is not up and running, do that now.  If you need to review how to go about that, you can look here.

Make Sparks fly! (i.e. run it)

OK, time to find out who the good people are...
This very simple example pulls a set of CassandraRows out of our human table in Cassandra, iterates over the rows, and creates a list of Human case class instances. It then rips through our list of Humans and nicely tells us who is a good person and who is not so much.

An interesting preview of things to come is commented out at or about line 29 of SimpleApp.scala:
row.columnNames
As Cassandra is NoSQL, a different set of columns could exist for each row. For our software to be reactive, we need mechanisms to detect these differences so that they can be handled. So play with that - it likely will be part of your future.
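As a rough idea of what such detection could look like, here is a small sketch that compares the column names found on a row against the ones our model needs. It assumes row.columnNames hands back a sequence of names; ColumnCheck, requiredColumns, missingColumns and unexpectedColumns are hypothetical names for illustration and are not part of the example project or the connector.

```scala
object ColumnCheck {
  // Columns our (hypothetical) Human model cannot live without.
  val requiredColumns: Set[String] = Set("firstname", "lastname", "isGoodPerson")

  // Required columns that the row did not provide.
  def missingColumns(present: Seq[String]): Set[String] =
    requiredColumns -- present

  // Columns the row provided that the model knows nothing about.
  def unexpectedColumns(present: Seq[String]): Set[String] =
    present.toSet -- requiredColumns
}
```

In the application you might call ColumnCheck.missingColumns(row.columnNames) for each row and skip, log, or default the row whenever the result is non-empty.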

With that noted, let's run things. Using the location of our CaseStudy.jar from above, open a terminal and use spark-submit to submit the application to your Spark cluster:
[bkarels@ahimsa simple-case]$ $SPARK_HOME/bin/spark-submit --class com.bradkarels.simple.CaseStudy --master spark://127.0.0.1:7077 /home/bkarels/work/simple-case/target/scala-2.10/CaseStudy.jar
...
Fatima Nagossa is a good person.
Neil Harris is a good person.
Pete Jones is a good person.
JJ Jones is not a good person.
Edward Snowden is a good person.
Brad Karels is a good person.
Biel SaBubb is not a good person.
Mother Theresa is a good person.
Alysia Yeoh is a good person.
Hiro Ryoshi is a good person.
B Real is a good person.
Casey Steals-a-lot is not a good person.
Diane Feinstein is not a good person.
If all has gone to plan, you should see output like the above. You will actually see two sets of this output, as the example does the same operation two ways: fetching directly into the case class, and a more verbose row-parsing mechanism.
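The two approaches can be sketched roughly as follows. This is not the code from SimpleApp.scala, just a minimal approximation that assumes Spark 1.1.x and the DataStax spark-cassandra-connector 1.1.x on the classpath, a Cassandra node on 127.0.0.1, and a three-field Human model; CaseStudySketch and describe are hypothetical names. It also assumes the connector matches the case-sensitive "isGoodPerson" column to the field of the same name, so verify that against your connector version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Field names mirror the column names we select (assumed mapping).
case class Human(firstname: String, lastname: String, isGoodPerson: Boolean)

object CaseStudySketch {
  // Pure helper: turn a Human into the line we print.
  def describe(h: Human): String = {
    val verdict = if (h.isGoodPerson) "is a good person." else "is not a good person."
    s"${h.firstname} ${h.lastname} $verdict"
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CaseStudySketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed local node
    val sc = new SparkContext(conf)

    // 1) Direct: let the connector build Human instances for us.
    val direct = sc.cassandraTable[Human]("nicecase", "human")
      .select("firstname", "lastname", "isGoodPerson")
      .collect()
    direct.foreach(h => println(describe(h)))

    // 2) Verbose: fetch generic CassandraRows and parse each one by hand.
    val parsed = sc.cassandraTable("nicecase", "human")
      .select("firstname", "lastname", "isGoodPerson")
      .collect()
      .map(row => Human(
        row.getString("firstname"),
        row.getString("lastname"),
        row.getBoolean("isGoodPerson")))
    parsed.foreach(h => println(describe(h)))

    sc.stop()
  }
}
```

Only columns that are never null in our data are read here, which is why plain getString and getBoolean are enough. For the nullable address columns, the connector's Option-returning getters (getStringOption and friends) would be the safer choice.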

What's next?

Next we hope to look into writing data to Cassandra from Scala case classes. Or some other interesting thing...

FIN
