Wrapping Up: Big Data on Kubernetes
We have reached the end of our deep dive into Big Data on Kubernetes by Neylson Crepalde. It has been a massive journey, moving from basic Docker containers to complex, real-time AI pipelines.
We have built some incredible pipelines over the last few posts. But if you were to take what we’ve built and put it into production today, you’d quickly realize that there is a lot more to managing a platform than just getting the YAML files right.
We have spent the last few weeks looking at individual tools like Spark, Airflow, and Kafka. But in the real world, these tools don’t live in isolation. They need to talk to each other to form a complete data pipeline.
In my last post, we got Spark running natively on Kubernetes. Now, it’s time to bring in the conductor (Airflow) and the nervous system (Kafka). This is where your cluster starts to feel like a real data platform.
We’ve explored Spark, Airflow, and Kafka as individual tools. But the real goal of Neylson Crepalde’s book is to show you how to run them all as a cohesive “stack” on Kubernetes. In Chapter 8, we finally start the heavy lifting of deployment.
Reading about architecture is one thing, but actually seeing a cluster run is where it sticks. In the third chapter of Big Data on Kubernetes, Neylson Crepalde moves from theory to practice.
In the last post, we talked about the “brain and muscles” of a Kubernetes cluster. But how do we actually tell that brain what to do? We use Objects.
If you want to run big data workloads on Kubernetes, you have to understand how the system is actually put together. It’s not just “magic cloud stuff”; it’s a carefully coordinated cluster of machines.
We are living in a world where data is basically everywhere. From your phone to social media and every single online purchase, the amount of info we generate is staggering. But here’s the thing: just having data isn’t enough. You have to be able to process it, and that’s where things get complicated.