Master the art of building scalable, professional big data platforms using Kubernetes and open-source tools.
Building a modern data platform is a daunting task, often plagued by heavy operational overhead and complex infrastructure. “Big Data on Kubernetes” by Neylson Crepalde provides a practical, hands-on roadmap to solving these challenges using Kubernetes, the most widely adopted container orchestration platform.
This book guides you through the entire lifecycle of data engineering, from the fundamentals of Docker and Kubernetes architecture to deploying a full “Holy Trinity” stack: Apache Spark for large-scale data processing, Apache Airflow for workflow orchestration, and Apache Kafka for real-time event streaming. You’ll learn how to move beyond traditional data warehouses to the flexible data lakehouse model, using Trino for high-performance SQL analytics and the ELK stack for real-time visualization.
Whether you’re a data engineer, DevOps professional, or cloud architect, this guide empowers you to build resilient, automated, and cost-effective solutions. It also looks toward the future, showing you how to integrate generative AI workloads using Amazon Bedrock and retrieval-augmented generation (RAG) patterns. Turn your “data swamp” into a professional data factory with Kubernetes as your foundation.