Monday, December 2, 2019

Spark on k8s on Raspberry Pi's for Fun and Profit - Intro

...and we're back after a bit of time off from this.  Long version of the short story is that running Spark on Kubernetes (k8s) seems to be a very solid direction to take things in general, particularly from the standpoint of small, local experimentation.  But as with most "super simple" things like this, there is a goodly pile of moving parts, a foundation of sorts, that can trip people up and turn that "Up and Running in 20 minutes" tutorial into an on-and-off, two-day frustration fest that ends in giving up or going a different direction - and that sucks.

So what I'm tinkering with now is a multi-node cluster of Raspberry Pis on which to run k8s, on which to run Spark.  You may be thinking to yourself, "But those tiny things only have 1 GB of RAM - you do get what Spark is, right?!"  Ya, I get it, but if I've learned anything over the past couple of years, it is that starting with very small datasets is the absolute best way to develop data-centric applications (generally speaking).  What better way to force the issue than to be memory constrained?

So why not just spin up some GKE instances and call it a day?  I'll leave that to the reader to debate internally.  For me this project is:
  • Neat
  • Fun
  • Looks cool on my desk
  • Gives me direct exposure to a bunch of bits of the stack that are otherwise abstracted away
  • I can also use the cluster for my home automation projects
  • I won't forget to kill a job and get whacked with a bigger than expected Google bill
  • ...and it looks cool on my desk
Don't get me wrong, a lot of this series will be effectively "notes to self", but hopefully these notes will be of value to others playing with a ~$200 pile of hardware.  Also, some of the Pi bits might just be the missing piece of some other project - I'm learning as I go with the Pi, so why not share?

In the end, if I do this right, I will assemble a zero-to-Spark-Job tutorial.  Until then, I'll just be adding bits as I go.  For example, next up is mounting USB drives to the Pi for extra storage.

Here we go...again.  Time to Spark it up some more.