Big Data on Kubernetes

Master the art of building scalable, professional big data platforms using Kubernetes and open-source tools.

Building a modern data platform is a daunting task, often plagued by massive operational overhead and complex infrastructure. “Big Data on Kubernetes” by Neylson Crepalde provides a practical, hands-on roadmap to solving these challenges using the world’s most powerful container orchestration platform.

This book guides you through the entire lifecycle of data engineering, from the fundamentals of Docker and Kubernetes architecture to deploying a full “Holy Trinity” stack: Apache Spark for large-scale data processing, Apache Airflow for complex orchestration, and Apache Kafka for real-time event streaming. You’ll learn how to move beyond traditional data warehouses to the flexible Data Lakehouse model, using Trino for high-performance SQL analytics and the ELK stack for real-time visualization.

Whether you’re a data engineer, DevOps professional, or cloud architect, this guide empowers you to build resilient, automated, and cost-effective solutions. It even looks toward the future, showing you how to integrate Generative AI workloads using Amazon Bedrock and RAG patterns. Turn your “data swamp” into a professional data factory with Kubernetes as your foundation.

Rethinking Data Infrastructure: Big Data on Kubernetes

We are living in a world where data is basically everywhere. From your phone to social media and every single online purchase, the amount of info we generate is staggering. But here’s the thing: just having data isn’t enough. You have to be able to process it, and that’s where things get complicated.

Why Containers Are a Must for Data Engineers

If you are working with data today, you can’t really ignore containers. They have become the standardized unit for how we develop, ship, and deploy software. But why do we care so much about them in the big data world?

Building Your Own Data Images

In my last post, we talked about why containers are the bedrock of modern data engineering. But honestly, just running other people’s images only gets you so far. The real magic happens when you start packaging your own custom code.
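To give a feel for what packaging your own code looks like, here is a minimal Dockerfile for a Python data job. This is an illustrative sketch, not an example from the book; the script name `etl_job.py` and the base image are assumptions.

```dockerfile
# Start from an official slim Python base image
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first, so this layer is cached
# between builds when only the job code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the job code itself
COPY etl_job.py .

# Run the job when the container starts
CMD ["python", "etl_job.py"]
```

You would build and run it with `docker build -t my-etl-job .` followed by `docker run my-etl-job`. Copying `requirements.txt` before the code is a common layer-caching trick: dependency installs only rerun when the requirements change.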

Decoding Kubernetes Architecture - Part 1

If you want to run big data workloads on Kubernetes, you have to understand how the system is actually put together. It’s not just “magic cloud stuff”; it’s a carefully coordinated cluster of machines.

Local Kubernetes With Kind

Reading about architecture is one thing, but actually seeing a cluster run is where it sticks. In the third chapter of Big Data on Kubernetes, Neylson Crepalde moves from theory to practice.
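As a quick sketch of what that practice looks like, a multi-node Kind cluster is declared in a small YAML file. The node counts here are an assumption for illustration, not the chapter’s exact setup.

```yaml
# kind-config.yaml: one control-plane node and two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

Running `kind create cluster --name bigdata --config kind-config.yaml` then spins up the whole cluster as Docker containers on your laptop, and `kubectl get nodes` should show all three nodes.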

Scaling to the Cloud With Amazon EKS

Testing things locally with Kind is great, but big data usually needs big iron. In this part of the hands-on journey, Neylson Crepalde shows us how to scale up to a managed cloud environment.
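For a sense of scale, here is roughly what creating a managed cluster with `eksctl` looks like. The cluster name, region, instance type, and node count below are illustrative placeholders, not values from the book.

```shell
# Create a managed EKS cluster with a small worker node group
# (name, region, instance type, and node count are illustrative)
eksctl create cluster \
  --name bigdata-cluster \
  --region us-east-1 \
  --nodegroup-name workers \
  --node-type m5.xlarge \
  --nodes 3
```

One command provisions the control plane, the node group, and the supporting AWS networking, which is a big part of why managed Kubernetes is so attractive for data teams.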

The Evolution of Data Architecture

We’ve all heard the terms “Data Warehouse” and “Data Lake,” but do you actually know why we keep switching between them? In Chapter 4 of Big Data on Kubernetes, Neylson Crepalde gives a masterclass on how data architecture has evolved to keep up with the modern world.

The Tools of the Modern Data Stack

We’ve talked about the architecture, but what about the actual tools? To build a modern data lakehouse on Kubernetes, you need a specific set of tools that can handle scale, automation, and speed.

Distributed Processing With Apache Spark - Part 1

If there is one tool that defined the “Big Data” era, it’s Apache Spark. It’s the engine that handles everything from terabyte-scale ETL to complex machine learning. In Chapter 5, Neylson Crepalde breaks down exactly how Spark works and why it’s so powerful on Kubernetes.
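To make “Spark on Kubernetes” concrete, this is the general shape of a `spark-submit` invocation against a Kubernetes API server, as described in the Spark documentation. The API server address, image name, and jar version are placeholders, and this is not necessarily the book’s exact example.

```shell
# Submit Spark's bundled SparkPi example to a Kubernetes cluster
# (API server host, container image, and jar version are placeholders)
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-<version>.jar
```

In cluster deploy mode, Kubernetes itself schedules the driver and executor pods, which is what lets Spark jobs share the same cluster, and the same autoscaling, as everything else in your platform.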

About BookGrill.net

BookGrill.net is a technology book review site for developers, engineers, and anyone who builds things with code. We cover books on software engineering, AI and machine learning, cybersecurity, systems design, and the culture of technology.