Data Engineering with GCP Chapter 13 Part 1: Growing Your Confidence as a Data Engineer

Chapter 13 is the last chapter in the book, and it’s different from everything that came before. No new GCP services, no hands-on exercises, no Terraform scripts. Instead, Adi Wijaya steps back and talks about the bigger picture: certifications, where data engineering is heading, and how to actually feel confident in this role.

This part covers the certification overview, the quiz format, and the past/present/future of data engineering. Part 2 will focus on the confidence-building framework.

The Certification Question

The author is pretty direct about this: take the Google Cloud certification exam. Not just for the certificate itself, but for what the preparation process does to your knowledge. Whether you pass or fail, you learn things along the way. He’s passed it three times (2019, 2021, 2024), so he clearly practices what he preaches.

Google Cloud has three certification levels: Foundational (no experience needed), Associate (6+ months), and Professional (1+ years of GCP, 3+ years of industry experience). For data engineering specifically, there’s only one option right now: the Professional Data Engineer certification.

Here’s what’s interesting. The exam covers five sections:

  • Designing data processing systems
  • Ingesting and processing data
  • Storing data
  • Preparing and using data for analysis
  • Maintaining and automating data workloads

The book covers a solid chunk of these topics across its 12 chapters. But the exam is broader. There are GCP services the book didn’t cover in detail: Bigtable, Spanner, Datastore, Memorystore, Cloud Data Fusion, Dataprep, Cloud Logging, Cloud Monitoring, Looker, Data Transfer Service, and Analytics Hub.

So the book gets you 70-80% of the way there. The remaining 20-30% you need to study on your own.

The Services You Need to Know Beyond This Book

The author gives quick overviews of each extra service. Let me summarize what matters for the exam.

Choosing a database is one of the most common exam topics. The decision tree is straightforward: if your data is under 1 TB, pick Cloud SQL (structured) or Datastore (semi-structured/unstructured). If your data is over 1 TB, pick Spanner (structured/semi-structured) or Bigtable (unstructured, high throughput, low latency). Bigtable is the one that gets the most exam questions.
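That decision tree is worth internalizing, so here is a toy Python sketch of it. The thresholds and service names are exactly the chapter's rule of thumb; real sizing decisions involve more factors, and the `needs_high_throughput` flag is my own shorthand for the Bigtable criteria.

```python
def pick_database(size_tb: float, structured: bool,
                  needs_high_throughput: bool = False) -> str:
    """Toy version of the exam decision tree described above."""
    if size_tb < 1:
        # Small data: Cloud SQL for structured, Datastore otherwise.
        return "Cloud SQL" if structured else "Datastore"
    # Over 1 TB: Bigtable for unstructured high-throughput/low-latency
    # workloads, Spanner for structured/semi-structured data.
    if needs_high_throughput and not structured:
        return "Bigtable"
    return "Spanner"

print(pick_database(0.5, structured=True))
print(pick_database(10, structured=False, needs_high_throughput=True))
```

Exam questions rarely state the size outright; they hint at it ("petabytes of sensor readings", "a small product catalog"), so you end up running this tree in your head.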

Bigtable is basically managed HBase. It’s for time series data, IoT, transaction histories. Key thing to remember: storage and compute are separate, so if a node goes down, you don’t lose data. And watch out for key hotspots. Using a plain timestamp or domain name as a row key is a bad idea. Combine columns or reverse domain names instead.
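To make the hotspot advice concrete, here is a small sketch of a row-key builder. The field names (`domain`, `sensor_id`, timestamp) are hypothetical; the point is the ordering: high-cardinality fields first, timestamp last, so sequential writes spread across nodes instead of piling onto one.

```python
def reverse_domain(domain: str) -> str:
    # "analytics.example.com" -> "com.example.analytics"
    return ".".join(reversed(domain.split(".")))

def make_row_key(domain: str, sensor_id: str, ts_epoch: int) -> str:
    # Combine columns and reverse the domain, as the chapter suggests.
    # Leading with a plain timestamp (or a forward domain) would send
    # all recent writes to the same tablet -- a classic hotspot.
    return f"{reverse_domain(domain)}#{sensor_id}#{ts_epoch}"

print(make_row_key("iot.example.com", "sensor-42", 1700000000))
# com.example.iot#sensor-42#1700000000
```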

Memorystore is managed Redis/Memcached. It’s for caching. Not for analytics. If you need extremely fast reads on a subset of data, export it from Cloud SQL to Memorystore.
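The "export a subset to Memorystore" idea is the classic cache-aside pattern. A minimal sketch, with a plain dict standing in for Redis and a stub standing in for the Cloud SQL query (both are placeholders, not real client APIs):

```python
cache: dict = {}  # stand-in for a Memorystore (Redis) instance

def query_cloud_sql(user_id: str) -> str:
    # Placeholder for a real Cloud SQL lookup.
    return f"user-{user_id}"

def get_user_name(user_id: str) -> str:
    # Cache-aside: serve hot reads from memory, fall back to the
    # database on a miss and populate the cache for next time.
    if user_id in cache:
        return cache[user_id]
    name = query_cloud_sql(user_id)
    cache[user_id] = name
    return name

print(get_user_name("7"))  # miss: hits the database, fills the cache
print(get_user_name("7"))  # hit: served from the cache
```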

Cloud Data Fusion is a GUI-based ETL tool built on CDAP. Think of it as Cloud Composer for people who prefer clicking over coding.

Dataprep is for data analysts who need to clean data without writing code. Different audience than Data Fusion.

Which transfer option to use depends on data size. Under 1 TB? Use gsutil. Between 1 TB and 100 TB? Storage Transfer Service. Over 100 TB? Transfer Appliance (a physical box Google ships to you).
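Those thresholds are another decision tree the exam likes, so here is the same idea as a tiny Python helper (thresholds exactly as the chapter states them; the product names are real, the function is mine):

```python
def pick_transfer_tool(size_tb: float) -> str:
    """Map dataset size (in TB) to the chapter's recommended transfer tool."""
    if size_tb > 100:
        return "Transfer Appliance"  # physical appliance shipped to you
    if size_tb > 1:
        return "Storage Transfer Service"
    return "gsutil"

print(pick_transfer_tool(0.5))
print(pick_transfer_tool(50))
print(pick_transfer_tool(500))
```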

Cloud Logging and Monitoring are built-in. Logging captures everything from GCP services. Monitoring works through metrics. For the exam, know which metrics exist for Dataflow and BigQuery.

The Quiz: Testing What You Know

The book includes 12 sample questions in the certification format. I won’t repeat all of them here, but the pattern is worth understanding. Each question gives you a scenario and four options. Usually one option is clearly wrong, one is technically possible but too complicated, and two seem reasonable. You need to pick the best one.

A few examples of the thinking process:

  • When the question mentions Spark jobs and quick migration, the answer is Dataproc. Not because other approaches are impossible, but because they require unnecessary effort.
  • When you need to find phone numbers across thousands of BigQuery tables with minimum effort, DLP/SDP Discovery is the answer. You could write a Dataflow job with regex, but that’s reinventing the wheel.
  • When a Looker Studio report shows stale data, the answer is clicking the Refresh Data button to clear the BigQuery cache. Not clearing browser cookies, not recreating tables.

The exam rewards you for knowing the simplest correct approach, not the most technically impressive one. That’s actually a useful career skill too.

Where Data Engineering Came From and Where It’s Going

This section is my favorite part of the chapter because the author shares his actual perspective on the industry.

The past: data engineers didn’t exist as a job title. They were called ETL developers, data modelers, database admins. They used proprietary tools on on-premises servers. Each tool had its own ecosystem and best practices. The field was fragmented.

The present: data engineering has become a mature, recognized role. Interview demand for data engineers went up 40% in a single year (2020 data from interviewquery.com). Big data and cloud are no longer future concepts. They are baseline expectations. If you’re looking for a data engineering job, you need to know both. Full stop.

The future: two things are going to happen.

First, cloud adoption is still far from its peak. Startups already use cloud by default, but traditional organizations like banks and government agencies are still migrating. When they fully commit, demand for cloud data engineers will grow even more. And the generative AI wave is doing for data engineering what machine learning hype did in 2014: it’s making everyone realize they need good data foundations.

Second, the “data engineer” title will split into more specific roles. Right now, “data engineer” can mean anything from writing SQL transformations to managing Kafka clusters to designing data governance frameworks. That’s too broad. Companies will start hiring for narrower, more defined roles.

And here’s the big shift: SQL is spreading beyond engineers. Marketing teams, HR departments, C-level executives are learning SQL. The rise of the analytics engineer role (check dbt’s blog on this) is a sign of things to come. Data engineers will spend less time writing transformation queries and more time building the data foundation: architecture, governance, pipelines, security.

In ELT terms, data engineers will still own the E and L. But the T will increasingly belong to the business teams who understand the domain context better.

What I Think

Having spent 20+ years in IT, I’ve watched this pattern play out across multiple fields. A new role appears, it’s broad and undefined, everyone argues about what it means. Then the industry matures, the role splits, and specialists emerge. It happened with “webmaster” becoming front-end, back-end, DevOps. It’s happening with data engineering now.

The author’s advice about certification is practical and honest. He doesn’t claim the cert makes you a great engineer. He says the preparation process fills gaps in your knowledge. That’s a healthy perspective.

The exam tips are also telling. The right answer is usually the simplest correct approach, not the most elaborate one. That’s a principle that applies way beyond exam questions. In real data engineering work, the team that picks the simplest tool that solves the problem usually wins. Not the team that picks the coolest tool.

One thing I’ll add from my own experience: the future the author describes, where non-engineers write SQL and data engineers focus on foundations, is already happening in many organizations. If you’re a data engineer who only writes SQL transformations, start learning about data architecture and governance now. That’s where the role is heading.


This is part of my retelling of “Data Engineering with Google Cloud Platform” by Adi Wijaya. Go back to Chapter 12 Part 2: CI/CD Pipelines or continue to Chapter 13 Part 2: Certifications and Next Steps.
