Book retelling

Data Engineering With AWS Chapter 7 Part 2: Transforming Data - Optimization and Business Logic

This is post 12 in my Data Engineering with AWS retelling series.

In Part 1, we covered the generic data preparation transforms: converting to Parquet, partitioning, PII protection, and data cleansing. Those transforms work on individual datasets and do not need much business context. Now we get to the transforms that actually create business value. The ones that combine multiple datasets, add context, flatten structures, and produce the tables that analysts and dashboards consume.

Data Engineering for Beginners - Closing Thoughts on the Full Series

And that’s it. Eighteen posts. Thirteen chapters. One complete walkthrough of “Data Engineering for Beginners” by Chisom Nwokwu.

When I started this series, I said I wanted to retell the book in my own words. Not a summary, not a copy. My take on what each chapter covers and why it matters. Now that I’m at the end, let me step back and share my overall impressions.

Final Thoughts on Data Science Foundations by Mariadas and Huke

Nineteen posts. Sixteen chapters. One book. And here we are at the end.

When I started this retelling of Data Science Foundations: Navigating Digital Insight by Stephen Mariadas and Ian Huke (ISBN: 978-1-78017-6994, BCS 2025), I was not sure how it would go. Some books lose steam halfway. Some start strong and fizzle. But this one stayed consistent from first chapter to last.

Final Thoughts on Python and R for the Modern Data Scientist

So we made it through the whole book. And honestly? It was worth the ride.

What This Book Got Right

The biggest thing Scavetta and Angelov got right is the framing. They didn’t write a “Python is better” or “R is better” book. They wrote a “both are useful, here’s when to use which” book. And that’s the mature take.

Data Security for Data Engineers - Chapter 9 Retelling

In 2016, hackers stole personal data of 57 million Uber users and drivers. How? Someone left API credentials in a private GitHub repo. The attackers grabbed those keys, got into AWS, and downloaded everything. Uber didn’t even notice for a year. When they finally found out, they paid the hackers $100,000 to delete the data and kept quiet about it.

When to Use Python vs R - Data Format Context Explained

Chapter 4 is where the book stops teaching you the languages and starts telling you when to use which one. This is Part III, “The Modern Context,” and Boyan Angelov takes the lead here. The question is simple: given a specific data format, which language gives you a better experience?

Pipeline Orchestration With Airflow, DAGs, and Data Transformations

This is Part 2 of Chapter 7, continuing from batch and streaming basics.

In Part 1, we covered how batch and streaming pipelines move data around. But here is the thing: having a pipeline is one thing. Making sure all its parts run in the right order, at the right time, without you babysitting it? That is orchestration. And this is where Chapter 7 gets really practical.

Data Pipelines: Batch vs Streaming and When to Use Each

This is Part 1 of Chapter 7. Part 2 covers orchestration and transformations.

Chapter 7 of Data Engineering for Beginners is probably where things start feeling real. You stop talking about storage and tables and start talking about how data actually moves. And the answer is: through pipelines.

NiFi Registry Version Control - Study Notes From Data Engineering With Python Ch 8

You’ve been building data pipelines for several chapters now. They work. They move data. But here’s the problem: none of them have version control. If you break something, there’s no going back. Chapter 8 of Data Engineering with Python by Paul Crickard fixes that. It introduces the NiFi Registry, a sub-project of Apache NiFi that handles version control for your data pipelines.

The Origin Stories of Python and R - Chapter 1 Retelling

Chapter 1 is titled “In the Beginning” and it’s written by Rick Scavetta. He opens with a tongue-in-cheek Dickens reference, saying it’s just the best of times for data science. But to understand where we are, we need to look at where Python and R came from. Their origin stories explain why they feel so different today.

Data Engineering With GCP Chapter 7: Making Data Visual With Looker Studio

You spend weeks building pipelines, modeling data, setting up orchestration. Everything works. Data lands in BigQuery clean and on time. And then someone from the business side asks: “So… where do I see the numbers?” That is exactly where Chapter 7 picks up. All that upstream work has to end somewhere useful, and for most organizations that somewhere is a dashboard.

Data Engineering With GCP Chapter 6 Part 1: Real-Time Data With Pub/Sub

Chapter 6 is where Adi Wijaya switches gears from batch to real-time. After spending Chapters 3 through 5 on batch pipelines with BigQuery, Cloud Composer, and Dataproc, now it is time to talk about streaming data. Two GCP services carry this chapter: Pub/Sub and Dataflow. This post covers the streaming concepts and Pub/Sub. Dataflow gets its own post in Part 2.

Data Science Foundations Chapter 5: The Discovery Phase and Asking the Right Questions

You got a data science project. Great. But before you touch any data, before you write a single line of code, you need to stop and think. That is what Chapter 5 of “Data Science Foundations” by Stephen Mariadas and Ian Huke is about. The discovery phase. The part most people want to skip. And it is the part that saves you from wasting months on something that never had a chance.

SQL Basics: SELECT, WHERE, and Aggregate Functions

This is Part 1 of Chapter 4. Part 2 covers joins and advanced queries.

Chapter 4 is where Nwokwu puts SQL in your hands. No more theory. You write queries, you get results, you learn by doing. If Chapter 3 was about understanding what databases are, this chapter is about talking to them.

Data Engineering With AWS Chapter 6 Part 1: Ingesting Batch Data

This is post 9 in my Data Engineering with AWS retelling series.

You have your whiteboard architecture from Chapter 5. You know who your data consumers are and what they need. Now it is time to actually move data. Chapter 6 covers data ingestion – getting data from wherever it lives into your AWS data lake. This first part focuses on batch ingestion from databases and files. Part 2 covers streaming.

Data Engineering With GCP Chapter 1: What Is Data Engineering Anyway?

Chapter 1 starts with a confession most of us in the data world can relate to. Adi Wijaya says he used to think data was clean. Neatly organized, ready to go. Then he actually worked with data in real organizations and realized most of the effort goes into collecting, cleaning, and transforming it. Not the fun machine learning part. The plumbing part.

Data Engineering With AWS Chapter 1: What Even Is Data Engineering?

If someone told you twenty years ago that data would become more valuable than oil, you would have laughed. But here we are. The most valuable companies on the planet are not drilling for crude. They are collecting, processing, and squeezing insights out of massive piles of data. And behind every one of those companies, there is a team of data engineers making it all work.

About

About BookGrill.net

BookGrill.net is a technology book review site for developers, engineers, and anyone who builds things with code. We cover books on software engineering, AI and machine learning, cybersecurity, systems design, and the culture of technology.

Know More