Big Data on Kubernetes

Master the art of building scalable, professional big data platforms using Kubernetes and open-source tools.

Building a modern data platform is a daunting task, often plagued by massive operational overhead and complex infrastructure. “Big Data on Kubernetes” by Neylson Crepalde provides a practical, hands-on roadmap to solving these challenges using the world’s most powerful container orchestration platform.

This book guides you through the entire lifecycle of data engineering, from the fundamentals of Docker and Kubernetes architecture to deploying a full “Holy Trinity” stack: Apache Spark for large-scale data processing, Apache Airflow for complex orchestration, and Apache Kafka for real-time event streaming. You’ll learn how to move beyond traditional data warehouses to the flexible Data Lakehouse model, using Trino for high-performance SQL analytics and the ELK stack for real-time visualization.

Whether you’re a data engineer, DevOps professional, or cloud architect, this guide empowers you to build resilient, automated, and cost-effective solutions. It even looks toward the future, showing you how to integrate Generative AI workloads using Amazon Bedrock and RAG patterns. Turn your “data swamp” into a professional data factory with Kubernetes as your foundation.

Rethinking Data Infrastructure: Big Data on Kubernetes

We are living in a world where data is basically everywhere. From your phone to social media and every single online purchase, the amount of info we generate is staggering. But here’s the thing: just having data isn’t enough. You have to be able to process it, and that’s where things get complicated.

Why Containers Are a Must for Data Engineers

If you are working with data today, you can’t really ignore containers. They have become the standardized unit for how we develop, ship, and deploy software. But why do we care so much about them in the big data world?

Building Your Own Data Images

In my last post, we talked about why containers are the bedrock of modern data engineering. But honestly, just running other people’s images only gets you so far. The real magic happens when you start packaging your own custom code.
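To give a feel for what packaging your own code looks like, here is a minimal Dockerfile for a Python data job. This is an illustrative sketch, not an example from the book; the script name `etl_job.py` and the base image are assumptions.

```dockerfile
# Start from an official slim Python base image
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first, so this layer is cached
# between builds when only the job code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the job code itself
COPY etl_job.py .

# Run the job when the container starts
CMD ["python", "etl_job.py"]
```

You would build and run it with `docker build -t my-etl-job .` followed by `docker run my-etl-job`. Copying `requirements.txt` before the code is a common layer-caching trick: dependency installs only rerun when the requirements change.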

Decoding Kubernetes Architecture - Part 1

If you want to run big data workloads on Kubernetes, you have to understand how the system is actually put together. It’s not just “magic cloud stuff”; it’s a carefully coordinated cluster of machines.

Local Kubernetes With Kind

Reading about architecture is one thing, but actually seeing a cluster run is where it sticks. In the third chapter of Big Data on Kubernetes, Neylson Crepalde moves from theory to practice.
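As a quick sketch of what that practice looks like, a multi-node Kind cluster is declared in a small YAML file. The node counts here are an assumption for illustration, not the chapter’s exact setup.

```yaml
# kind-config.yaml: one control-plane node and two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

Running `kind create cluster --name bigdata --config kind-config.yaml` then spins up the whole cluster as Docker containers on your laptop, and `kubectl get nodes` should show all three nodes.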

Scaling to the Cloud With Amazon EKS

Testing things locally with Kind is great, but big data usually needs big iron. In this part of the hands-on journey, Neylson Crepalde shows us how to scale up to a managed cloud environment.
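For a sense of scale, here is roughly what creating a managed cluster with `eksctl` looks like. The cluster name, region, instance type, and node count below are illustrative placeholders, not values from the book.

```shell
# Create a managed EKS cluster with a small worker node group
# (name, region, instance type, and node count are illustrative)
eksctl create cluster \
  --name bigdata-cluster \
  --region us-east-1 \
  --nodegroup-name workers \
  --node-type m5.xlarge \
  --nodes 3
```

One command provisions the control plane, the node group, and the supporting AWS networking, which is a big part of why managed Kubernetes is so attractive for data teams.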

The Evolution of Data Architecture

We’ve all heard the terms “Data Warehouse” and “Data Lake,” but do you actually know why we keep switching between them? In Chapter 4 of Big Data on Kubernetes, Neylson Crepalde gives a masterclass on how data architecture has evolved to keep up with the modern world.

The Tools of the Modern Data Stack

We’ve talked about the architecture, but what about the actual tools? To build a modern data lakehouse on Kubernetes, you need a specific set of tools that can handle scale, automation, and speed.

Distributed Processing With Apache Spark - Part 1

If there is one tool that defined the “Big Data” era, it’s Apache Spark. It’s the engine that handles everything from terabyte-scale ETL to complex machine learning. In Chapter 5, Neylson Crepalde breaks down exactly how Spark works and why it’s so powerful on Kubernetes.
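To make “Spark on Kubernetes” concrete, this is the general shape of a `spark-submit` invocation against a Kubernetes API server, as described in the Spark documentation. The API server address, image name, and jar version are placeholders, and this is not necessarily the book’s exact example.

```shell
# Submit Spark's bundled SparkPi example to a Kubernetes cluster
# (API server host, container image, and jar version are placeholders)
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-<version>.jar
```

In cluster deploy mode, Kubernetes itself schedules the driver and executor pods, which is what lets Spark jobs share the same cluster, and the same autoscaling, as everything else in your platform.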

About BookGrill.net

BookGrill.net is a technology book review site for developers, engineers, and anyone who builds things with code. We cover books on software engineering, AI and machine learning, cybersecurity, systems design, and the culture of technology.