So what I'm tinkering with now is a multi-node k8s cluster on Raspberry Pis, on which to run Spark. You may be thinking to yourself, "But those tiny things only have 1GB RAM - you do get what Spark is, right?!" Yeah, I get it, but if I've learned anything over the past couple of years, it's that starting with very small datasets is the absolute best way to develop data-centric applications (generally speaking). What better way to force the issue than to be memory constrained?
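To make that concrete, here's a minimal sketch of the kind of memory-constrained job I have in mind - assuming PySpark; the memory figures, the toy sensor data, and the `local[1]` master are illustrative placeholders (on the actual cluster the master would point at k8s), not my final setup:

```python
from pyspark.sql import SparkSession

# Keep the driver and executors small enough to fit on a 1GB Raspberry Pi node.
# Running locally here just to exercise the idea; swap the master for the
# cluster's k8s URL once the Pis are up.
spark = (
    SparkSession.builder
    .appName("tiny-data-on-tiny-nodes")
    .master("local[1]")
    .config("spark.driver.memory", "512m")
    .config("spark.executor.memory", "512m")
    .config("spark.executor.cores", "1")
    .getOrCreate()
)

# Start with a dataset small enough to reason about by hand.
rows = [("sensor-1", 21.5), ("sensor-2", 22.1), ("sensor-1", 21.9)]
df = spark.createDataFrame(rows, ["sensor", "temp_c"])

df.groupBy("sensor").avg("temp_c").show()

spark.stop()
```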
So why not just spin up some GKE instances and call it a day? I'll leave that to the reader to debate internally. For me, this project is:
- Neat
- Fun
- Looks cool on my desk
- Gives me direct exposure to a bunch of bits of the stack that are otherwise abstracted away
- I can also use the cluster for my home automation projects
- I won't forget to kill a job and get whacked with a bigger-than-expected Google bill
- ...and it looks cool on my desk
In the end, if I do this right, I will assemble a zero-to-Spark-Job tutorial. Until then, I'll just be adding bits as I go; next up, for example, is mounting USB drives to the Pis for extra storage.
Here we go...again. Time to Spark it up some more.