Data Engineering with Google Cloud Platform

Adi Wijaya's practical guide to building scalable data pipelines and platforms using Google Cloud Platform services like BigQuery, Dataproc, Dataflow, and Cloud Composer.

Data Engineering with Google Cloud Platform (2nd edition, 2024) walks you through the full stack of data engineering on GCP. It starts with fundamentals like ETL, data warehouses, and data lakes, then moves to hands-on building with BigQuery, Cloud Composer (Airflow), Dataproc (Spark), Pub/Sub, Dataflow (Beam), Looker Studio, and Vertex AI.

The book is split into three parts. Part one covers data engineering basics and the GCP ecosystem. Part two is the bulk of the book, where you build actual pipelines, data warehouses, data lakes, streaming systems, visualizations, and ML workflows. Part three tackles the strategic side: project management, data governance, cost control, CI/CD, and career growth, including GCP certification prep.

Written by a cloud data engineer at Google with over a decade of experience, the book targets aspiring data engineers, people preparing for the GCP Professional Data Engineer certification, and teams migrating data workloads to Google Cloud. The second edition updates coverage to include Dataform, Dataproc Serverless, BigQuery editions pricing, and Vertex AI pipelines.

Data Engineering With GCP Chapter 1: What Is Data Engineering Anyway?

Chapter 1 starts with a confession most of us in the data world can relate to. Adi Wijaya says he used to think data was clean. Neatly organized, ready to go. Then he actually worked with data in real organizations and realized most of the effort goes into collecting, cleaning, and transforming it. Not the fun machine learning part. The plumbing part.

Data Engineering With GCP Chapter 6 Part 1: Real-Time Data With Pub/Sub

Chapter 6 is where Adi Wijaya switches gears from batch to real-time. After Chapters 3 through 5 cover batch pipelines with BigQuery, Cloud Composer, and Dataproc, it is time to talk about streaming data. Two GCP services carry this chapter: Pub/Sub and Dataflow. This post covers the streaming concepts and Pub/Sub. Dataflow gets its own post in Part 2.

Data Engineering With GCP Chapter 7: Making Data Visual With Looker Studio

You spend weeks building pipelines, modeling data, setting up orchestration. Everything works. Data lands in BigQuery clean and on time. And then someone from the business side asks: “So… where do I see the numbers?” That is exactly where Chapter 7 picks up. All that upstream work has to end somewhere useful, and for most organizations that somewhere is a dashboard.


About BookGrill.net

BookGrill.net is a technology book review site for developers, engineers, and anyone who builds things with code. We cover books on software engineering, AI and machine learning, cybersecurity, systems design, and the culture of technology.
