Monday, December 2, 2019

Spark on k8s on Raspberry Pi's for Fun and Profit - Intro

...and we're back after a bit of time off from this.  Long version of the short story is that running Spark on Kubernetes (k8s) seems to be a very solid direction to take things in general, particularly for small, local experimentation.  But as with most "super simple" things like this, there is a goodly pile of moving parts, a foundation of sorts, that can trip people up and turn that "Up and Running in 20 minutes" tutorial into an on-and-off two-day frustration fest that ends in giving up or going a different direction - and that sucks.

So what I'm tinkering with now is a multi-node cluster of Raspberry Pi's on which to run k8s on which to run Spark.  You may be thinking to yourself, "But those tiny things only have 1GB of RAM - you do get what Spark is, right?!"  Ya, I get it, but if I've learned anything over the past couple of years it is that starting with very small datasets is the absolute best way to develop data-centric applications (generally speaking).  What better way to force the issue than to be memory constrained?

So why not just spin up some GKE instances and call it a day?  I'll leave that to the reader to debate internally.  For me this project is:
  • Neat
  • Fun
  • Looks cool on my desk
  • Gives me direct exposure to a bunch of bits of the stack that are otherwise obfuscated
  • I can also use the cluster for my home automation projects
  • I won't forget to kill a job and get whacked with a bigger than expected Google bill
  • ...and it looks cool on my desk
Don't get me wrong, a lot of this series will effectively be "notes to self", but hopefully these notes will be of value to others playing with a pile of ~$200 in hardware.  Also, some of the Pi bits might just be the missing piece of some other project - I'm learning as I go with the Pi, so why not share?

In the end, if I do this right, I will assemble a zero-to-Spark-Job tutorial.  Until then, I'll just be adding bits as I go.  For example, next up is mounting USB drives to the Pi for extra storage.

Here we go...again.  Time to Spark it up some more.

