Friday, November 21, 2014

Get your Cassandra on with ccm (Cassandra Cluster Manager)

UPDATE! (2014-12-9)  After a bit of experimentation, it seems that running the Spark-Fu examples against Cassandra clusters spun up with CCM may be the cause of the performance issues I have been experiencing.  A single-node Cassandra "cluster" installed from the tarball has delivered the kind of performance I would expect for these simple examples on a laptop.  I followed the great tutorial from DataStax Academy to set this up.  You will need to sign up for the Academy, but it's free and has a lot of great content, so I have no issues recommending it.  That said, CCM otherwise seems to be a wonderful and stable way to experiment with Cassandra clusters locally.

To do things with Apache Spark on Cassandra, we first need to have Cassandra.  The best and fastest way (that I know of) to get a local Cassandra cluster to prototype with is to use CCM.

What is CCM? 

CCM is the Cassandra Cluster Manager.

What does CCM do?

CCM creates multi-node clusters for development and testing on a local machine.  It is not intended for production use.

How do I get started with all this CCM voodoo?

I am running all of this on CentOS 6.5; directions will vary for other environments.  We also assume you have Java 7 or higher installed.
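If you want to confirm the Java requirement from a script, here is a minimal bash sketch. It assumes `java -version` prints a quoted version string such as `java version "1.7.0_71"` (older releases use the `1.x` scheme, Java 9 and later print e.g. `"11.0.2"`). The function names `java_major_version` and `require_java` are hypothetical helpers for illustration, not part of Java or CCM.

```shell
#!/usr/bin/env bash
# Sketch: check that the installed JVM is Java 7 or newer.
# Assumption: `java -version` output contains a line like: java version "1.7.0_71"
# java_major_version and require_java are hypothetical helper names.

# java_major_version: read `java -version` text on stdin, print the major version.
java_major_version() {
  local ver
  # Take the first quoted version string, e.g. 1.7.0_71 or 11.0.2
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) echo "$ver" | cut -d. -f2 ;;   # legacy scheme: 1.7.0_71 -> 7
    *)   echo "$ver" | cut -d. -f1 ;;   # Java 9+ scheme: 11.0.2 -> 11
  esac
}

# require_java [MIN]: succeed only if java is on PATH and its major version >= MIN (default 7).
require_java() {
  local min=${1:-7} major
  command -v java >/dev/null 2>&1 || { echo "java not found on PATH" >&2; return 1; }
  # java -version writes to stderr, so redirect it into the pipe
  major=$(java -version 2>&1 | java_major_version)
  [ -n "$major" ] && [ "$major" -ge "$min" ]
}
```

Usage: `require_java 7 && echo "Java is new enough"`.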

Step 1: Download & install the EPEL release package:

See -> https://fedoraproject.org/wiki/EPEL

Download the package (e.g., for CentOS 6.x):

http://mirror.metrocast.net/fedora/epel/6/i386/epel-release-6-8.noarch.rpm

Install:
[bkarels@ahimsa ~]$ sudo rpm -Uvh epel-release-6-8.noarch.rpm

Step 2: Install python-pip, then use it to install cql and PyYAML:

[bkarels@ahimsa ~]$ sudo yum -y install python-pip

[bkarels@ahimsa ~]$ pip install cql PyYAML

Step 3: Install Apache Ant (CCM uses Ant to compile Cassandra from source)

See ant.apache.org for install instructions.
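Before building anything, it can save a failed compile later to confirm the tools used in this walkthrough are on your PATH. This is a small bash sketch; the tool list (java, ant, git, python) is an assumption drawn from the steps in this post, and `missing_cmds` / `check_prereqs` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: verify required commands are on PATH before running ccm.
# missing_cmds and check_prereqs are hypothetical helper names.

# missing_cmds CMD...: print each named command that is NOT found on PATH.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# check_prereqs CMD...: report any missing tools on stderr and fail if there are any.
check_prereqs() {
  local missing
  missing=$(missing_cmds "$@")
  if [ -n "$missing" ]; then
    # Unquoted on purpose: word splitting joins the names with spaces
    echo "Missing:" $missing >&2
    return 1
  fi
  echo "All prerequisites found."
}
```

Usage: `check_prereqs java ant git python`.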

Step 4: Install CCM (Cassandra Cluster Manager):


[bkarels@ahimsa ~]$ git clone https://github.com/pcmanus/ccm.git
[bkarels@ahimsa ~]$ cd ccm/
[bkarels@ahimsa ccm]$ sudo ./setup.py install

Step 5 (Optional): Get Help

To get help (this is a great way to dig into CCM):
[bkarels@ahimsa ~]$  ccm -help
[bkarels@ahimsa ~]$  ccm [command] -help

Step 6: Do some stuff
CCM has two primary types of operations: 
  1. Cluster commands
  2. Node commands
Cluster commands take the form:
$ ccm [cluster command] [options]

Node commands take the form:
$ ccm [node name] [node command] [options]
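To make the two command shapes concrete, here is a hypothetical bash wrapper that replays the three-node session that follows, using only the ccm commands shown in this post. `ccm_run`, `cluster_cmd`, `node_cmd`, `demo_cluster` and the `CCM_DRY_RUN` variable are names made up for this sketch; with `CCM_DRY_RUN=1` it only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: wrappers for the two ccm command shapes.
#   cluster command: ccm [cluster command] [options]
#   node command:    ccm [node name] [node command] [options]
# All function names and CCM_DRY_RUN are hypothetical, for illustration only.

# ccm_run ARGS...: run ccm, or just print the command line when CCM_DRY_RUN=1.
ccm_run() {
  if [ "${CCM_DRY_RUN:-0}" = "1" ]; then
    echo "ccm $*"
  else
    ccm "$@"
  fi
}

# cluster_cmd COMMAND [OPTIONS...] -> ccm COMMAND [OPTIONS...]
cluster_cmd() { ccm_run "$@"; }

# node_cmd NODE COMMAND [OPTIONS...] -> ccm NODE COMMAND [OPTIONS...]
node_cmd() {
  local node=$1
  shift
  ccm_run "$node" "$@"
}

# The walkthrough from this post, expressed with the wrappers.
demo_cluster() {
  cluster_cmd create cluster0 -v 2.0.11
  cluster_cmd populate --nodes 3
  cluster_cmd start
  cluster_cmd status
  node_cmd node2 stop
  cluster_cmd status
}
```

Running `CCM_DRY_RUN=1 demo_cluster` previews the six ccm invocations without touching a cluster; drop the variable to run them for real.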

So, let's spin up a three-node local cluster real quick like:

[bkarels@ahimsa ~]$ ccm create cluster0 -v 2.0.11
Downloading http://archive.apache.org/dist/cassandra/2.0.11/apache-cassandra-2.0.11-src.tar.gz to /tmp/ccm-bwFLa4.tar.gz (10.836MB)
  11362079  [100.00%]
Extracting /tmp/ccm-bwFLa4.tar.gz as version 2.0.11 ...
Compiling Cassandra 2.0.11 ...
Current cluster is now: cluster0
[bkarels@ahimsa ~]$ ccm list
 *cluster0
[bkarels@ahimsa ~]$ ccm populate --nodes 3
[bkarels@ahimsa ~]$ ccm start
[bkarels@ahimsa ~]$ ccm status
Cluster: 'cluster0'
-------------------
node1: UP
node3: UP
node2: UP
[bkarels@ahimsa ~]$ ccm node2 stop
[bkarels@ahimsa ~]$ ccm status
Cluster: 'cluster0'
-------------------
node1: UP
node3: UP
node2: DOWN
[bkarels@ahimsa ~]$
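If you script against a local cluster, it can help to summarise `ccm status` rather than eyeball it. This bash/awk sketch assumes the output format shown in the session above (one `nodeN: UP` or `nodeN: DOWN` line per node); `count_nodes` and `down_nodes` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: summarise `ccm status` output read from stdin.
# Assumption: one "nodeN: UP" / "nodeN: DOWN" line per node; other lines ignored.
# count_nodes and down_nodes are hypothetical helper names.

# count_nodes STATE: print how many nodes are reported in STATE (e.g. UP, DOWN).
count_nodes() {
  awk -v state="$1" -F': *' '
    { sub(/[ \t\r]+$/, "") }             # strip trailing whitespace; fields re-split
    NF == 2 && $2 == state { n++ }
    END { print n + 0 }                  # print 0 rather than an empty line
  '
}

# down_nodes: print the names of nodes reported as DOWN, one per line.
down_nodes() {
  awk -F': *' '
    { sub(/[ \t\r]+$/, "") }
    NF == 2 && $2 == "DOWN" { print $1 }
  '
}
```

Usage against a live cluster: `ccm status | count_nodes UP` or `ccm status | down_nodes`.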

And just like that, you have a three-node cluster on your local machine to play with.  Of course, these instructions barely scratch the surface of what is possible, but our focus is Spark, so this is just to give us something we can read from and write to.

FIN
