Wrapping Up: Big Data on Kubernetes
We have reached the end of our deep dive into Big Data on Kubernetes by Neylson Crepalde. It has been a massive journey, moving from basic Docker containers to complex, real-time AI pipelines.
If I had to boil down everything I learned from this book into three main points, here is what they would be:
1. Kubernetes is the Universal Language
The biggest takeaway is that Kubernetes goes a long way toward solving the “it works on my machine” problem for big data. Whether you’re running a Spark job, a Kafka broker, or a GenAI frontend, you describe it the same way: as declarative YAML manifests applied to the cluster. This standardization is what allows small teams to manage incredibly complex platforms.
2. The Lakehouse is the Winning Pattern
The move from traditional warehouses to the Data Lakehouse (powered by tools like Trino and Spark) is real. Storing data in low-cost object storage such as S3 buckets and querying it in place with high-performance SQL engines lowers storage cost and keeps you flexible about which engine reads the data. The Medallion Architecture (Bronze -> Silver -> Gold) is a solid, proven way to keep your data clean and useful: raw data lands in Bronze, gets cleaned and deduplicated in Silver, and is aggregated for consumers in Gold.
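To make the Bronze -> Silver -> Gold flow concrete, here is a minimal sketch in plain Python. It is not taken from the book, and a real pipeline would typically use Spark or Trino over files in object storage; the sample records and the function names `to_silver` and `to_gold` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical raw events, as they might land untouched in the Bronze layer.
BRONZE = [
    {"order_id": "1", "customer": "ana", "amount": "10.50"},
    {"order_id": "2", "customer": "bob", "amount": "7.00"},
    {"order_id": "2", "customer": "bob", "amount": "7.00"},  # duplicate record
    {"order_id": "3", "customer": "ana", "amount": "oops"},  # malformed amount
    {"order_id": "4", "customer": "ana", "amount": "4.50"},
]


def to_silver(bronze):
    """Clean the raw rows: drop duplicates and malformed values, cast types."""
    seen = set()
    silver = []
    for row in bronze:
        if row["order_id"] in seen:
            continue  # skip rows whose order_id was already accepted
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        seen.add(row["order_id"])
        silver.append(
            {
                "order_id": int(row["order_id"]),
                "customer": row["customer"],
                "amount": amount,
            }
        )
    return silver


def to_gold(silver):
    """Aggregate the cleaned rows into a business-level view: total per customer."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    silver = to_silver(BRONZE)
    print(to_gold(silver))
```

The point of the layering is that each step only reads from the previous one, so Bronze stays an untouched record of what arrived and Silver or Gold can be rebuilt from it at any time.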
3. Automation is Not Optional
You cannot run a modern data platform manually. From using Helm for deployments to Airflow for orchestration and GitOps for synchronization, automation is what keeps the system reliable. If you’re doing something twice, you should be writing a script or a YAML file for it.
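The idea behind GitOps synchronization can be sketched as a comparison between the desired state (what is committed to Git) and the actual state (what is running in the cluster). The snippet below is a simplified illustration in plain Python, not how any particular GitOps tool is implemented; the function name `plan_sync` and the resource names are hypothetical.

```python
def plan_sync(desired, actual):
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map a resource name to its spec (any comparable value).
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))   # declared in Git, missing in cluster
        elif actual[name] != spec:
            actions.append(("update", name))   # present, but drifted from Git
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))   # running, but no longer declared
    return actions


if __name__ == "__main__":
    # Hypothetical states, for illustration only.
    desired = {"airflow": {"replicas": 2}, "kafka": {"replicas": 3}}
    actual = {"kafka": {"replicas": 1}, "old-job": {"replicas": 1}}
    for action in plan_sync(desired, actual):
        print(action)
```

Because the plan is derived from the difference between the two states, running it again once they match yields no actions, which is what makes this style of automation safe to repeat.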
Final Thoughts
Is Kubernetes hard? Yes. Is it overkill for a simple one-off script? Definitely.
But if you are building a platform that needs to scale, handle real-time events, and stay reliable under heavy load, there is simply no better foundation. Neylson Crepalde’s book provides an excellent, hands-on roadmap for anyone brave enough to start the journey.
Thanks for following along with this series! I hope it helped demystify some of these “big” technologies.
Key Takeaways:
- Containers provide the isolation we need.
- Operators bring “smart” management to the cluster by encoding operational tasks (deploying, scaling, recovering) as controllers.
- The Modern Stack (Spark, Airflow, Kafka) is more accessible than ever.
- GenAI is just another workload on the platform.
Book Details:
- Title: Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions
- Author: Neylson Crepalde
- ISBN: 978-1-83546-214-0