Monday, December 2, 2019
...and we're back after a bit of time off from this. Short version of the long story: running Spark on Kubernetes (k8s) seems to be a very solid direction to take things in general, particularly for small, local experimentation. But as with most "super simple" things, there is a goodly pile of moving parts underneath, a foundation of sorts, that can trip people up and turn that "Up and Running in 20 Minutes" tutorial into an on-and-off two-day frustration fest that ends in giving up or going a different direction - and that sucks.
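To ground what "running Spark on Kubernetes" actually means, a job submission ends up looking something like the sketch below. This is a minimal example, not anything from this cluster yet: the API server address, the container image name, and the examples jar path are all placeholders you'd swap in for your own.

```sh
# Sketch: submit the bundled SparkPi example to a k8s cluster.
# <k8s-apiserver-host> and <your-spark-image> are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.executor.memory=512m \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
```

Note the small executor memory setting - on hardware like this, you don't get a choice about being frugal.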
So what I'm tinkering with now is a multi-node k8s cluster on Raspberry Pis on which to run Spark. You may be thinking to yourself, "But those tiny things only have 1 GB of RAM - you do get what Spark is, right?!" Ya, I get it, but if I've learned anything over the past couple of years, it's that starting with very small datasets is the absolute best way to develop data-centric applications (generally speaking). What better way to force the issue than to be memory constrained?
So why not just spin up some GKE instances and call it a day? I'll leave that to the reader to debate internally. For me, this project:

- Looks cool on my desk
- Gives me direct exposure to a bunch of bits of the stack that are otherwise abstracted away
- Means I can also use the cluster for my home automation projects
- Ensures I won't forget to kill a job and get whacked with a bigger-than-expected Google bill
- ...and it looks cool on my desk
In the end, if I do this right, I will assemble a zero-to-Spark-job tutorial. Until then, I'll just be adding bits as I go. For example, next up is mounting USB drives on the Pis for extra storage.
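For the curious, that step will look roughly like the following. This is a sketch that assumes the drive shows up as /dev/sda1 and that you're fine wiping it for ext4 - verify the device name with lsblk before running anything destructive.

```sh
# Find the USB drive's device name (assumed to be /dev/sda1 below)
lsblk

# Format it as ext4 - this erases whatever is on the drive
sudo mkfs.ext4 /dev/sda1

# Create a mount point and mount it
sudo mkdir -p /mnt/usb
sudo mount /dev/sda1 /mnt/usb

# Persist across reboots; nofail keeps the Pi booting if the drive is unplugged
echo "UUID=$(sudo blkid -s UUID -o value /dev/sda1) /mnt/usb ext4 defaults,nofail 0 2" \
  | sudo tee -a /etc/fstab
```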
Here we go...again. Time to Spark it up some more.