Posts

Terraform Cert Guide Chapter 8: Understanding Terraform Configuration Files

Chapter 8 of Ravi Mishra’s book is one of those chapters that sounds basic on the surface but actually ties a lot of loose ends together. You’ve been writing Terraform code for seven chapters now, but this is where you stop and really understand the anatomy of a configuration file. What goes where, why it matters, and how the same patterns work across GCP, AWS, and Azure.

Golang DSA Chapter 9 Part 1: Graphs and Network Representation

Chapter 9 of “Learn Data Structures and Algorithms with Golang” by Bhagvan Kommadi shifts gears into graphs and network structures. If you’ve been following along, we spent the last few chapters on searching, sorting, and hashing. Now we’re getting into something that models the real world more directly: connections between things.

Golang DSA Chapter 8 Part 2: Searching, Recursion, and Hashing

Welcome back. In Part 1 we covered the sorting side of Chapter 8, from bubble sort all the way to quick sort. Now we’re picking up the second half: searching algorithms, recursion, and hashing. These are the tools you use when you already have your data and need to find stuff in it, or when you need to transform it for fast lookups.

Golang DSA Chapter 8 Part 1: Sorting Algorithms in Go

Chapter 8 of “Learn Data Structures and Algorithms with Golang” by Bhagvan Kommadi is called “Classic Algorithms.” It covers sorting, searching, recursion, and hashing. That’s a lot of ground, so we’re splitting it into two parts. This first part is all about sorting.

Golang DSA Chapter 7 Part 2: Sequences and Anti-Patterns

Welcome back. In Part 1 we went through dictionaries and TreeSets. This second half of Chapter 7 wraps up TreeSets with synchronized and mutable variants, then moves into some cool mathematical sequences implemented in Go. We also talk about common anti-patterns the book warns about when working with these data structures.

Golang DSA Chapter 7 Part 1: Dictionaries and TreeSets

We’re into Chapter 7 of “Learn Data Structures and Algorithms with Golang” by Bhagvan Kommadi, and this is where things get interesting. The chapter is about dynamic data structures, which are basically collections that can grow and shrink as needed. No fixed sizes, no guessing how much memory you need upfront.

Golang DSA Chapter 6 Part 1: Singly and Doubly Linked Lists

Chapter 6 of “Learn Data Structures and Algorithms with Golang” is all about heterogeneous data structures. That’s a fancy way of saying “data structures that can hold different types of data.” Think integers, floats, strings, whatever you need, all mixed together. Linked lists and ordered lists are the main examples here.

Golang DSA Chapter 5 Part 1: Arrays and Multi-Dimensional Arrays

Chapter 5 of “Learn Data Structures and Algorithms with Golang” by Bhagvan Kommadi shifts gears from trees and hash tables into something more math-heavy: homogeneous data structures. That basically means data structures where every element is the same type. Think arrays of integers, matrices of floats, that kind of thing.

Golang DSA Chapter 4 Part 1: Trees in Go

Up to this point in the book, everything we covered was linear. Lists, stacks, queues, heaps, all of them store data in a straight line. One element after another. Chapter 4 is where things get interesting because we’re moving into non-linear data structures.

Data Engineering With AWS Chapter 12: Visualizing Data With Amazon QuickSight

This is post 18 in my Data Engineering with AWS retelling series.

We have spent eleven chapters ingesting data, transforming data, cataloging data, querying data. But here is a simple truth: nobody wants to stare at 10,000 rows in a spreadsheet. Our brains are not built for that. We process pictures way faster than text. A well-designed chart can tell you in two seconds what would take twenty minutes to figure out from raw numbers.

Golang DSA Chapter 2 Part 2: Slices, Maps, and Go Patterns

Welcome back. In Part 1 we covered arrays, basic slices, two-dimensional slices, and maps. That was the foundation. Now Kommadi moves into the more interesting Go patterns: variadic functions, defer and panic, and a full CRUD web application that ties it all together. He also shows more advanced slice operations along the way.

Wrapping Up: Big Data on Kubernetes

We have reached the end of our deep dive into Big Data on Kubernetes by Neylson Crepalde. It has been a massive journey, moving from basic Docker containers to complex, real-time AI pipelines.

Beyond the Basics: The Kubernetes Ecosystem

We have built some incredible pipelines over the last few posts. But if you were to take what we’ve built and put it into production today, you’d quickly realize that there is a lot more to managing a platform than just getting the YAML files right.

Action Models With Bedrock Agents

In the last post, we saw how to give an AI model a “memory” using RAG. But the real game-changer in the Generative AI world is when you let the model actually do things.

Building an End-to-End Big Data Pipeline - Part 3

Batch processing is great for historical reports, but what if you need to know what’s happening right now? In the final part of Chapter 10, Neylson Crepalde shows us how to build a world-class Real-Time Pipeline on Kubernetes.

Building an End-to-End Big Data Pipeline - Part 1

We have spent the last few weeks looking at individual tools like Spark, Airflow, and Kafka. But in the real world, these tools don’t live in isolation. They need to talk to each other to form a complete data pipeline.

Data Engineering With AWS Chapter 9 Part 2: Bridging Data Lake and Data Warehouse

This is post 15 in my Data Engineering with AWS retelling series.

In Part 1, we looked at Redshift internals – clusters, slices, distribution styles, sort keys. All the pieces that make a data warehouse fast. But a warehouse sitting in isolation is not very useful. Data needs to flow in from your data lake, and sometimes it needs to flow back out. Part 2 of Chapter 9 covers that bridge between S3 and Redshift, including Redshift Spectrum, the COPY and UNLOAD commands, and a hands-on exercise that ties it all together.

Real-Time Visualization With Elasticsearch and Kibana

Trino is great for querying your historical data on S3, but for real-time streams and text-heavy search, you need something different. In the second half of Chapter 9, Neylson Crepalde introduces the industry standard for real-time analytics: Elasticsearch and Kibana.

The Data Consumption Layer - Querying With Trino

You’ve built your ingestion, you’ve processed your data with Spark, and it’s all sitting neatly in your S3 “Gold” bucket. Now what? You can’t ask every business analyst to learn PySpark just to see last month’s sales.

Blockchain and Banking: Why This Tech Is Actually a Big Deal

Ever feel like the banking world is just a bunch of old buildings and slow apps? Well, things are actually moving pretty fast behind the scenes. I just finished reading Blockchain and Banking: How Technological Innovations Are Shaping the Banking Industry by Pierluigi Martino, and it’s a real eye-opener.

Deploying the Big Data Stack on Kubernetes - Part 1

We’ve explored Spark, Airflow, and Kafka as individual tools. But the real goal of Neylson Crepalde’s book is to show you how to run them all as a cohesive “stack” on Kubernetes. In Chapter 8, we finally start the heavy lifting of deployment.

Real-Time Streaming With Apache Kafka - Part 2

Architecture is great, but let’s actually run some code. In the second half of Chapter 7, Neylson Crepalde walks us through setting up a multi-node Kafka cluster right on our local machine using Docker Compose.

Real-Time Streaming With Apache Kafka - Part 1

In the world of big data, “batch” is no longer enough. We need data the second it happens. Whether it’s tracking stock prices, monitoring website traffic, or detecting fraud, you need a system that can handle massive streams of events with zero downtime.

Orchestrating Pipelines With Apache Airflow - Part 1

If Spark is the engine, then Apache Airflow is the conductor. In a modern data stack, you rarely have just one job running in isolation. You have ingestion, cleaning, processing, and delivery—and they all have to happen in a specific order.

Distributed Processing With Apache Spark - Part 1

If there is one tool that defined the “Big Data” era, it’s Apache Spark. It’s the engine that handles everything from terabyte-scale ETL to complex machine learning. In Chapter 5, Neylson Crepalde breaks down exactly how Spark works and why it’s so powerful on Kubernetes.

The Tools of the Modern Data Stack

We’ve talked about the architecture, but what about the actual tools? To build a modern data lakehouse on Kubernetes, you need a specific set of tools that can handle scale, automation, and speed.

The Evolution of Data Architecture

We’ve all heard the terms “Data Warehouse” and “Data Lake,” but do you actually know why we keep switching between them? In Chapter 4 of Big Data on Kubernetes, Neylson Crepalde gives a masterclass on how data architecture has evolved to keep up with the modern world.

Scaling to the Cloud With Amazon EKS

Testing things locally with Kind is great, but big data usually needs big iron. In this part of the hands-on journey, Neylson Crepalde shows us how to scale up to a managed cloud environment.

Local Kubernetes With Kind

Reading about architecture is one thing, but actually seeing a cluster run is where it sticks. In the third chapter of Big Data on Kubernetes, Neylson Crepalde moves from theory to practice.

Decoding Kubernetes Architecture - Part 1

If you want to run big data workloads on Kubernetes, you have to understand how the system is actually put together. It’s not just “magic cloud stuff”—it’s a carefully coordinated cluster of machines.

Building Your Own Data Images

In my last post, we talked about why containers are the bedrock of modern data engineering. But honestly, just running other people’s images only gets you so far. The real magic happens when you start packaging your own custom code.

Why Containers Are a Must for Data Engineers

If you are working with data today, you can’t really ignore containers. They have become the standardized unit for how we develop, ship, and deploy software. But why do we care so much about them in the big data world?

Rethinking Data Infrastructure: Big Data on Kubernetes

We are living in a world where data is basically everywhere. From your phone to social media and every single online purchase, the amount of info we generate is staggering. But here’s the thing: just having data isn’t enough. You have to be able to process it, and that’s where things get complicated.

Data Engineering With AWS Chapter 7 Part 2: Transforming Data - Optimization and Business Logic

This is post 12 in my Data Engineering with AWS retelling series.

In Part 1, we covered the generic data preparation transforms: converting to Parquet, partitioning, PII protection, and data cleansing. Those transforms work on individual datasets and do not need much business context. Now we get to the transforms that actually create business value. The ones that combine multiple datasets, add context, flatten structures, and produce the tables that analysts and dashboards consume.

Data Engineering for Beginners - Closing Thoughts on the Full Series

And that’s it. Eighteen posts. Thirteen chapters. One complete walkthrough of “Data Engineering for Beginners” by Chisom Nwokwu.

When I started this series, I said I wanted to retell the book in my own words. Not a summary, not a copy. My take on what each chapter covers and why it matters. Now that I’m at the end, let me step back and share my overall impressions.

Final Thoughts on Data Science Foundations by Mariadas and Huke

Nineteen posts. Sixteen chapters. One book. And here we are at the end.

When I started this retelling of Data Science Foundations: Navigating Digital Insight by Stephen Mariadas and Ian Huke (ISBN: 978-1-78017-6994, BCS 2025), I was not sure how it would go. Some books lose steam halfway. Some start strong and fizzle. But this one stayed consistent from first chapter to last.

Final Thoughts on Python and R for the Modern Data Scientist

So we made it through the whole book. And honestly? It was worth the ride.

What This Book Got Right

The biggest thing Scavetta and Angelov got right is the framing. They didn’t write a “Python is better” or “R is better” book. They wrote a “both are useful, here’s when to use which” book. And that’s the mature take.

Data Security for Data Engineers - Chapter 9 Retelling

In 2016, hackers stole personal data of 57 million Uber users and drivers. How? Someone left API credentials in a private GitHub repo. The attackers grabbed those keys, got into AWS, and downloaded everything. Uber didn’t even notice for a year. When they finally found out, they paid the hackers $100,000 to delete the data and kept quiet about it.

When to Use Python vs R - Data Format Context Explained

Chapter 4 is where the book stops teaching you the languages and starts telling you when to use which one. This is Part III, “The Modern Context,” and Boyan Angelov takes the lead here. The question is simple: given a specific data format, which language gives you a better experience?

Pipeline Orchestration With Airflow, DAGs, and Data Transformations

This is Part 2 of Chapter 7, continuing from batch and streaming basics.

In Part 1, we covered how batch and streaming pipelines move data around. But here is the thing: having a pipeline is one thing. Making sure all its parts run in the right order, at the right time, without you babysitting it? That is orchestration. And this is where Chapter 7 gets really practical.

Data Pipelines: Batch vs Streaming and When to Use Each

This is Part 1 of Chapter 7. Part 2 covers orchestration and transformations.

Chapter 7 of Data Engineering for Beginners is probably where things start feeling real. You stop talking about storage and tables and start talking about how data actually moves. And the answer is: through pipelines.

NiFi Registry Version Control - Study Notes From Data Engineering With Python Ch 8

You’ve been building data pipelines for several chapters now. They work. They move data. But here’s the problem: none of them have version control. If you break something, there’s no going back. Chapter 8 of Data Engineering with Python by Paul Crickard fixes that. It introduces the NiFi Registry, a sub-project of Apache NiFi that handles version control for your data pipelines.

The Origin Stories of Python and R - Chapter 1 Retelling

Chapter 1 is titled “In the Beginning” and it’s written by Rick Scavetta. He opens with a tongue-in-cheek Dickens reference, saying it’s just the best of times for data science. But to understand where we are, we need to look at where Python and R came from. Their origin stories explain why they feel so different today.

Data Engineering With GCP Chapter 7: Making Data Visual With Looker Studio

You spend weeks building pipelines, modeling data, setting up orchestration. Everything works. Data lands in BigQuery clean and on time. And then someone from the business side asks: “So… where do I see the numbers?” That is exactly where Chapter 7 picks up. All that upstream work has to end somewhere useful, and for most organizations that somewhere is a dashboard.

Data Engineering With GCP Chapter 6 Part 1: Real-Time Data With Pub/Sub

Chapter 6 is where Adi Wijaya switches gears from batch to real-time. After spending Chapters 3 through 5 on batch pipelines with BigQuery, Cloud Composer, and Dataproc, now it is time to talk about streaming data. Two GCP services carry this chapter: Pub/Sub and Dataflow. This post covers the streaming concepts and Pub/Sub. Dataflow gets its own post in Part 2.

Data Science Foundations Chapter 5: The Discovery Phase and Asking the Right Questions

You got a data science project. Great. But before you touch any data, before you write a single line of code, you need to stop and think. That is what Chapter 5 of “Data Science Foundations” by Stephen Mariadas and Ian Huke is about. The discovery phase. The part most people want to skip. And it is the part that saves you from wasting months on something that never had a chance.

SQL Basics: SELECT, WHERE, and Aggregate Functions

This is Part 1 of Chapter 4. Part 2 covers joins and advanced queries.

Chapter 4 is where Nwokwu puts SQL in your hands. No more theory. You write queries, you get results, you learn by doing. If Chapter 3 was about understanding what databases are, this chapter is about talking to them.

Data Engineering With AWS Chapter 6 Part 1: Ingesting Batch Data

This is post 9 in my Data Engineering with AWS retelling series.

You have your whiteboard architecture from Chapter 5. You know who your data consumers are and what they need. Now it is time to actually move data. Chapter 6 covers data ingestion – getting data from wherever it lives into your AWS data lake. This first part focuses on batch ingestion from databases and files. Part 2 covers streaming.

Data Engineering With GCP Chapter 1: What Is Data Engineering Anyway?

Chapter 1 starts with a confession most of us in the data world can relate to. Adi Wijaya says he used to think data was clean. Neatly organized, ready to go. Then he actually worked with data in real organizations and realized most of the effort goes into collecting, cleaning, and transforming it. Not the fun machine learning part. The plumbing part.

Data Engineering With AWS Chapter 1: What Even Is Data Engineering?

If someone told you twenty years ago that data would become more valuable than oil, you would have laughed. But here we are. The most valuable companies on the planet are not drilling for crude. They are collecting, processing, and squeezing insights out of massive piles of data. And behind every one of those companies, there is a team of data engineers making it all work.

Reading the Room: Stock Sentiment Analysis With NLP

Stocks aren’t just driven by math; they’re driven by people. And people are emotional. In Chapter 14 of Data Analytics for Finance Using Python, we look at Natural Language Processing (NLP)—a way to turn human chatter into useful data.

Systems Thinking Chapter 11: Systems Leadership

Chapter 11 is about leadership. But not the kind you see on LinkedIn where someone posts a sunset photo and writes “leaders eat last.” Diana is talking about something very different. Systems leadership is about improving how knowledge flows through your organization. Not about your title, not about your authority, not about how many people report to you.

Standing Out From the Mean: Assessing Stock Risk With the Z-Score

If you’ve ever heard someone say a stock’s price is “three standard deviations away from the mean,” they’re talking about Z-Scores. In Chapter 11 of Data Analytics for Finance Using Python, we explore how to use this tool to find the “weird” data points that might actually be opportunities.

Systems Thinking Chapter 10: Modeling Together - Part 1

Chapter 10 is a big one, so I’m splitting it into two parts. This is Part 1 of 2.

Diana opens with a Donella Meadows quote that sets the tone for everything that follows: get your model out where people can see it, invite others to challenge it. That’s the whole chapter in one sentence, really. But of course there’s much more to unpack.

Which One Is Riskier? Assessing Stock Risk With the F-Test

If you’re choosing between two stocks, you don’t just want to know which one has a higher return; you want to know which one is more likely to give you a heart attack. In Chapter 9 of Data Analytics for Finance Using Python, we look at the F-Test as a way to compare risk.

Systems Thinking Chapter 8: Designing Feedback Loops

When you hear “feedback loop” you probably think about monitoring dashboards. Or autoscaling. Or maybe that annoying annual performance review your manager gives you. Diana Montalion says all of that is too narrow. Chapter 8 is about feedback loops for thinking. Not for servers.

Big Data for the Rest of Us: A Deep Look at Hadoop 3

So, you’ve heard about big data. It’s everywhere. But how do you actually handle it? If you’re looking for the OG of big data platforms, you’re looking at Hadoop. And honestly, it’s still the foundation for almost everything we do in data today.

About

About BookGrill.net

BookGrill.net is a technology book review site for developers, engineers, and anyone who builds things with code. We cover books on software engineering, AI and machine learning, cybersecurity, systems design, and the culture of technology.
