Data Engineering with GCP: Final Thoughts and Key Takeaways
Twenty-two posts later, we are done. This was my retelling of “Data Engineering with Google Cloud Platform” by Adi Wijaya (2nd edition, Packt Publishing, 2024, ISBN 978-1-83508-011-5). Time to look back, share what stuck with me, and give an honest assessment.
What the Book Covered
The book splits naturally into three parts.
Part 1 was the foundation: what data engineering is, what data engineers actually do, and how Google Cloud fits into the picture. Chapters 1 and 2 set up the vocabulary and the mental model: ETL vs ELT, data warehouses vs data lakes, batch vs streaming. The basics, but done well.
Part 2 was the hands-on building section, and this is where the book really earns its keep. Chapters 3 through 8 walk you through building actual data solutions. BigQuery for warehousing. Cloud Composer for orchestrating workflows with Airflow. Dataproc and Spark for data lake processing. Pub/Sub and Dataflow for real-time streaming. Looker Studio for visualization. And then Vertex AI and AutoML for machine learning. Each chapter builds on the previous one. By the end, you have touched every major data service in GCP.
Part 3 covered the things that separate a junior from a senior data engineer. Project management and IAM. Data governance, quality, and security. Cost management (because the cloud bill is always someone’s problem). CI/CD pipelines for data infrastructure. And the final chapter about growing your career as a data engineer.
That is a lot of ground covered in one book. Thirteen chapters, and I needed twenty-two posts to retell it. Says something about the density.
What I Liked
It follows a natural learning path. You start with concepts, then build things, then learn to manage and maintain what you built. This mirrors how you actually grow in a real job. Most technical books throw everything at you in a random order. This one has a clear progression.
Practical approach. Adi Wijaya is a cloud data engineer at Google, and it shows. The book does not just describe what services exist. It walks you through building real pipelines with real data. The eCommerce dataset example that runs through most chapters ties everything together in a way that makes sense.
Full stack coverage. Too many GCP books only cover the fun parts: BigQuery, maybe Dataflow, and call it a day. This book includes IAM, governance, cost management, CI/CD, and career development. Those are the chapters most people skip, but they are exactly what you need to be effective in a real team.
Up to date. The second edition came out in April 2024, so it covers recent additions like BigLake, Dataform, Dataplex, and the newer Vertex AI features. Cloud books age fast, and this one was still fresh when I read it.
What Could Be Better
I’m going to be honest because a review without criticism is just marketing.
Too much reliance on console screenshots. A lot of the instructions boil down to “click this button, then click that button.” That teaches you the UI, not the concepts. UIs change. gcloud commands and Terraform configs are more durable. The Terraform section in Chapter 9 was a step in the right direction, but I wish that approach started earlier.
Some topics stay shallow. Streaming with Dataflow and the ML chapters could go deeper. The Dataflow section covers the basics of Apache Beam but does not get into windowing strategies, exactly-once processing, or real production patterns. Same with Vertex AI. You get the overview but not enough to actually deploy a model with confidence.
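If "windowing" is a new term, here is the idea in miniature. This is my own plain-Python sketch, not Apache Beam code and not an example from the book. It shows the simplest strategy, which Beam calls fixed windows: every event lands in exactly one non-overlapping time bucket. The function name and the sample events are made up for illustration.

```python
from collections import defaultdict


def fixed_windows(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping windows.

    Returns a dict mapping each window's start time to the sum of its values.
    """
    totals = defaultdict(int)
    for timestamp, value in events:
        # Each event belongs to exactly one window, identified by its start.
        window_start = timestamp - (timestamp % window_seconds)
        totals[window_start] += value
    return dict(totals)


# Hypothetical order events: (event time in seconds, order amount).
events = [(3, 10), (42, 5), (61, 7), (119, 1), (120, 4)]
print(fixed_windows(events, 60))  # {0: 15, 60: 8, 120: 4}
```

A real streaming engine also has to decide what to do with events that arrive late or out of order, which is exactly the depth I was missing in the chapter.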
The exercises feel guided. You follow steps, and things work. That is great for a first pass. But real learning happens when things break and you have to figure out why. Some “troubleshooting” sections or intentionally broken examples would add a lot.
Limited coverage of multi-cloud and hybrid scenarios. In my experience, very few companies are 100% on one cloud. A section on how GCP data services integrate with AWS or Azure tools (or even on-prem systems) would make the book more realistic.
These are not deal-breakers. The book is solid. But second editions are supposed to improve, and a third edition could address these gaps.
Who Should Read This Book
Aspiring data engineers who want a structured path into the field. If you know some SQL and Python but have no idea where to start with cloud data engineering, this is a good first book.
People preparing for GCP certification. The book covers most topics on the Google Cloud Professional Data Engineer exam. It is not a certification study guide specifically, but it builds the understanding you need before drilling practice questions.
Cloud migration teams moving data workloads to GCP. The book gives a solid overview of what is available and how the pieces fit together. Good for technical leads who need to map existing on-prem pipelines to GCP services.
Backend developers who want to understand what the data team does. Even if you never touch BigQuery yourself, knowing how data flows through an organization makes you better at your own job.
Key Takeaways
If I had to distill the entire book into a handful of lessons, these are the ones that stuck:
- Data engineering is the plumbing. The unglamorous work of collecting, cleaning, and transporting data is what makes everything else (analytics, ML, dashboards) possible.
- Choose batch or streaming based on the problem, not the hype. Streaming is cool but expensive and complex. Most business questions can wait for a nightly batch job.
- BigQuery is the center of gravity in GCP data engineering. Almost every pipeline eventually lands data in BigQuery. Learn it well.
- Orchestration is not optional. Cloud Composer (Airflow) turns a collection of scripts into a real pipeline with retries, monitoring, and dependency management.
- Governance and security are not extras. Data quality, access control, lineage tracking. Skip these and your data lake becomes a data swamp. Chapter 10 was one of the most important chapters in the book.
- Cost management is an engineering skill. The cloud makes it easy to spend money. Understanding pricing models, slot reservations, and storage tiers is part of your job.
- The tools change, the patterns do not. ETL, ELT, star schemas, event streaming. These concepts have been around for decades. GCP services are just new implementations of old ideas.
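To make the orchestration takeaway concrete, here is a toy sketch of the two things an orchestrator adds on top of a pile of scripts: dependency ordering and retries. It is plain Python using the standard library's graphlib (Python 3.9+), not Airflow or Cloud Composer code, and every name in it (run_pipeline, the task names, the flaky load step) is hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    tasks: dict of task name -> zero-argument callable.
    dependencies: dict of task name -> set of task names it depends on.
    Returns the task names in the order they completed.
    """
    completed = []
    # static_order() yields every task after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # Out of retries: fail the whole run.
    return completed


attempts = {"load": 0}


def extract():
    pass


def load():
    # Simulate a transient failure on the first attempt only.
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("transient failure")


def report():
    pass


tasks = {"extract": extract, "load": load, "report": report}
dependencies = {"load": {"extract"}, "report": {"load"}}
print(run_pipeline(tasks, dependencies))  # ['extract', 'load', 'report']
```

Airflow gives you the same two ideas plus scheduling, monitoring, and a UI, which is why hand-rolled versions like this rarely survive past the prototype stage.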
What’s Next
Finishing a book is not the end. It is the starting line. Here is what I would suggest:
Get hands-on. Create a free-tier GCP project and build something. Follow the book exercises if you have not already, but then try your own dataset. Pick one of BigQuery’s public datasets and build a pipeline around it.
Consider the certification. The Google Cloud Professional Data Engineer certification is well-regarded in the industry. The book covers most of the exam topics. Pair it with Google’s official practice exams and you will be in good shape.
Go deeper on one service. You cannot be an expert in everything. Pick the service that matters most for your current job (BigQuery, Dataflow, or Composer are good bets) and learn it inside out.
Read the documentation. I know, nobody likes reading docs. But Google’s documentation for BigQuery and Dataflow is actually quite good. The book gives you the map. The docs give you the street-level detail.
Build something that breaks. Seriously. Set up a streaming pipeline, throw malformed data at it, and see what happens. That is where real learning lives.
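You can rehearse the idea before touching Pub/Sub or Dataflow. The sketch below is plain Python and my own illustration, not code from the book: it parses incoming messages and routes anything malformed to a dead-letter list instead of crashing. The record schema (a numeric "amount" field) and the function name are assumptions made up for the example.

```python
import json


def process_messages(raw_messages):
    """Split raw messages into parsed records and a dead-letter list."""
    parsed, dead_letter = [], []
    for raw in raw_messages:
        try:
            record = json.loads(raw)
            # Hypothetical schema: every record needs a numeric "amount".
            record["amount"] = float(record["amount"])
            parsed.append(record)
        except (json.JSONDecodeError, KeyError, TypeError, ValueError) as err:
            # Keep the bad message and the reason so it can be inspected later.
            dead_letter.append({"raw": raw, "error": type(err).__name__})
    return parsed, dead_letter


messages = [
    '{"order_id": 1, "amount": "19.99"}',  # valid
    '{"order_id": 2}',                     # missing field
    'not json',                            # unparseable
]
parsed, dead_letter = process_messages(messages)
print(parsed)       # [{'order_id': 1, 'amount': 19.99}]
print(dead_letter)  # two entries: KeyError, then JSONDecodeError
```

Once this makes sense locally, try breaking a real pipeline the same way and see where the bad records end up.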
This series was my way of processing the book and sharing what I learned. If even one post helped you understand a concept better or saved you time, then it was worth writing all twenty-two of them.
Thanks for reading along.
Full Series Index
Getting Started
Building Solutions
- Chapter 3 Part 1: BigQuery Data Warehouse
- Chapter 3 Part 2: Data Modeling in BigQuery
- Chapter 4 Part 1: Cloud Composer Workflows
- Chapter 4 Part 2: Airflow Best Practices
- Chapter 5 Part 1: Building a Data Lake
- Chapter 5 Part 2: Spark on Dataproc
- Chapter 6 Part 1: Streaming with Pub/Sub
- Chapter 6 Part 2: Dataflow Processing
- Chapter 7: Looker Studio Visualization
- Chapter 8 Part 1: ML Basics on GCP
- Chapter 8 Part 2: Vertex AI and AutoML
Architecture and Strategy
- Chapter 9: User and Project Management
- Chapter 10 Part 1: Data Governance Basics
- Chapter 10 Part 2: Data Quality and Security
- Chapter 11: Cost Strategy
- Chapter 12 Part 1: CI/CD Basics
- Chapter 12 Part 2: CI/CD Pipelines
- Chapter 13 Part 1: Growing as a Data Engineer
- Chapter 13 Part 2: Certifications and Career
This concludes my retelling of “Data Engineering with Google Cloud Platform” by Adi Wijaya. Start from the beginning or browse all posts with the data-engineering-gcp tag.