Book retelling

Mar 10, 2026
software-engineering

Data Engineering With AWS Chapter 8: Who Actually Uses All This Data?

This is post 13 in my Data Engineering with AWS retelling series.

We have spent the last several chapters ingesting data, transforming data, optimizing data. Pipelines everywhere. But here is the question nobody asks often enough: who is actually going to use all of this?

Mar 03, 2026
software-engineering

Data Engineering With AWS Chapter 7 Part 2: Transforming Data - Optimization and Business Logic

This is post 12 in my Data Engineering with AWS retelling series.

In Part 1, we covered the generic data preparation transforms: converting to Parquet, partitioning, PII protection, and data cleansing. Those transforms work on individual datasets and do not need much business context. Now we get to the transforms that actually create business value. The ones that combine multiple datasets, add context, flatten structures, and produce the tables that analysts and dashboards consume.

Feb 28, 2026
software-engineering

Building NLP and LLM Pipelines - Final Thoughts on the Book

And that’s a wrap. Over the past 24 days, we walked through every chapter of Laura Funderburk’s Building Natural Language and LLM Pipelines. Here are my final thoughts on the book as a whole.

Feb 28, 2026
software-engineering

Data Engineering for Beginners - Closing Thoughts on the Full Series

And that’s it. Eighteen posts. Thirteen chapters. One complete walkthrough of “Data Engineering for Beginners” by Chisom Nwokwu.

When I started this series, I said I wanted to retell the book in my own words. Not a summary, not a copy. My take on what each chapter covers and why it matters. Now that I’m at the end, let me step back and share my overall impressions.

Feb 28, 2026
software-engineering

Data Engineering With GCP: Final Thoughts and Key Takeaways

Twenty-two posts later, we are done. This was my retelling of “Data Engineering with Google Cloud Platform” by Adi Wijaya (2nd edition, Packt Publishing, 2024, ISBN 978-1-83508-011-5). Time to look back, share what stuck with me, and give an honest assessment.

Feb 28, 2026
software-engineering

Data Engineering With Python: Final Thoughts and Takeaways

That’s it. Fifteen chapters, seventeen posts, and one complete walkthrough of Paul Crickard’s Data Engineering with Python (Packt, 2020, ISBN: 978-1-83921-418-9).

Feb 28, 2026
software-engineering

Final Thoughts on Data Science Foundations by Mariadas and Huke

Nineteen posts. Sixteen chapters. One book. And here we are at the end.

When I started this retelling of Data Science Foundations: Navigating Digital Insight by Stephen Mariadas and Ian Huke (ISBN: 978-1-78017-6994, BCS 2025), I was not sure how it would go. Some books lose steam halfway. Some start strong and fizzle. But this one stayed consistent from first chapter to last.

Feb 28, 2026
software-engineering

Final Thoughts on Python and R for the Modern Data Scientist

So we made it through the whole book. And honestly? It was worth the ride.

What This Book Got Right

The biggest thing Scavetta and Angelov got right is the framing. They didn’t write a “Python is better” or “R is better” book. They wrote a “both are useful, here’s when to use which” book. And that’s the mature take.

Feb 27, 2026
software-engineering

Building a Career in Data Engineering - Roles, Resumes, and Interviews

This is the last technical chapter of the book. Everything before this was about skills, tools, and concepts. Chapter 13 is about what you do with all of that knowledge. How you actually get a job in data engineering.

Feb 27, 2026
software-engineering

Data Engineering With GCP Chapter 13 Part 2: GCP Certifications and Career Next Steps

In Part 1 we went through the quiz questions, extra GCP services, and how the book ties everything together. Now let’s talk about the stuff that matters after you close the book: getting certified, planning your career, and figuring out what comes next.

Feb 27, 2026
software-engineering

Data Science Foundations Chapter 16: Where Data Science Goes From Here

So we made it. Chapter 16 is the conclusion of Data Science Foundations by Stephen Mariadas and Ian Huke. And like most good conclusions, it does not introduce anything new. Instead it steps back and asks: what did we learn, and where is all of this going?

Feb 27, 2026
software-engineering

Python and R Translation Cheat Sheet - Best Equivalents

The appendix of “Python and R for the Modern Data Scientist” is basically a bilingual dictionary. It runs to about 40 tables and covers everything from package management to indexing. You could spend a whole afternoon reading through it.

Feb 27, 2026
software-engineering

Real-Time Edge Data With MiNiFi and Spark - Study Notes From Data Engineering With Python Ch 15

You have NiFi running. Kafka is streaming. Spark is processing. But what about the data source? What happens when your data comes from a tiny sensor or a Raspberry Pi that can barely run a web browser?

Feb 27, 2026
software-engineering

Token Economics, System Integrity Under Failure, and the Sovereign Agent Stack

In the previous post, we looked at how agentic architectures evolved from brittle sequential chains (V1) through router patterns (V2) to resilient supervisors (V3). Now Funderburk puts those architectures under stress. The results are honestly a little scary.

Feb 26, 2026
software-engineering

Agentic AI Architecture: From Monolithic Scripts to Resilient Supervisors

The epilogue of Funderburk’s book is where everything clicks together. All the individual skills from earlier chapters (pipelines, RAG, tool contracts, Haystack components, LangGraph orchestration) get assembled into a single architectural argument. And that argument is surprisingly clear: separate the doing from the thinking.

Feb 26, 2026
software-engineering

Cloud Data Engineering - Storage, Compute, Networking, and Cost on the Cloud

Chapter 12 is the one where everything moves to the cloud. If you’ve been following along, we’ve been talking about databases, pipelines, data quality, security, governance, and big data. All of that can run on your own hardware. But most teams today don’t do that. They use cloud providers. This chapter explains why, and more importantly, how.

Feb 26, 2026
software-engineering

Data Engineering With GCP Chapter 13 Part 1: Growing Your Confidence as a Data Engineer

Chapter 13 is the last chapter in the book, and it’s different from everything that came before. No new GCP services, no hands-on exercises, no Terraform scripts. Instead, Adi Wijaya steps back and talks about the bigger picture: certifications, where data engineering is heading, and how to actually feel confident in this role.

Feb 26, 2026
software-engineering

Data Processing With Apache Spark - Study Notes From Data Engineering With Python Ch 14

You have streaming data. You have batch data. You have a lot of it. Now you need to actually process it. Fast. On more than one machine.

Feb 26, 2026
software-engineering

Data Science Foundations Chapter 15: Real Companies Using Data Science Right Now

Theory is nice. But does any of it work in the real world? Chapter 15 of Data Science Foundations by Stephen Mariadas and Ian Huke answers that with five case studies. Real people, real problems. One of them technically failed. And that is part of the point.

Feb 26, 2026
software-engineering

Real World Bilingual Data Science - A Python and R Case Study

The whole book has been building to this. Six chapters of philosophy, syntax comparisons, and interoperability tricks. Now Chapter 7 drops a real project on the table. Build it with both languages. Together. Start to finish.

Feb 25, 2026
software-engineering

Big Data and Distributed Systems - Chapter 11 Retelling

At some point, your data gets too big for one machine. That’s not a hypothetical. Netflix, Google, Amazon: they all hit that wall years ago. The question is: what do you do when a single server can’t keep up?

Feb 25, 2026
software-engineering

Data Engineering With GCP Chapter 12 Part 2: Building CI/CD Pipelines on Google Cloud

In Part 1 we covered the theory behind CI/CD and ran through a basic Cloud Build exercise with a Python project. Unit tests ran automatically on every push. Broken code got caught before it reached production. Good stuff, but that was a simple calculator script. Now we need to connect this to real data engineering work.

Feb 25, 2026
software-engineering

Data Science Foundations Chapter 14: Machine Learning and AI Explained Simply

Everyone has an opinion about AI. Your coworker worries robots will take his job. Your cousin swears ChatGPT wrote his college essay. Chapter 14 of Data Science Foundations by Stephen Mariadas and Ian Huke explains what machine learning and artificial intelligence actually are. And how data science connects to all of it.

Feb 25, 2026
software-engineering

MCP, A2A Protocol, Agentic Context Engineering, and the Future of AI Interoperability

In the first half of Chapter 9, Funderburk covered hardware limitations and the four big problems with LLMs. Now she gets to the good stuff: the protocols and frameworks that are actually solving those problems.

Feb 25, 2026
software-engineering

Streaming Data With Apache Kafka - Study Notes From Data Engineering With Python Ch 13

Up to this point in the book, data pipelines have been about moving data that already exists. Query a database, read a file, process it, store it. The data sits still and you go get it.

Feb 25, 2026
software-engineering

Using Python and R Together - Tools for Bilingual Data Science

Chapter 6 is where the book finally delivers on its promise. All that talk about using both languages together? This is where it actually happens. Rick Scavetta walks through the nuts and bolts of making Python and R talk to each other in the same project.

Feb 24, 2026
software-engineering

Building a Kafka Cluster - Study Notes From Data Engineering With Python Ch 12

Up to this point in the book, everything has been batch processing. You query a database, get a full dataset, transform it, load it somewhere. The data sits still while you work on it.

Feb 24, 2026
software-engineering

Data Engineering With AWS Chapter 7 Part 1: Transforming Data - The Basics

This is post 11 in my Data Engineering with AWS retelling series.

You have data sitting in your data lake. Raw CSV files, JSON dumps, database extracts. It is all there, technically available, but trying to run analytics on it is painfully slow and expensive. This chapter is about fixing that. Transforming raw data into something optimized, clean, and ready for actual use.

Feb 24, 2026
software-engineering

Data Engineering With GCP Chapter 12 Part 1: CI/CD Basics for Data Engineers

Chapter 12 is a shift from everything we have done so far. Until now, we were learning how to build things: pipelines, data lakes, warehouses, streaming systems. Now the question is: how do you ship all that stuff to production without breaking things? The answer is CI/CD.

Feb 24, 2026
software-engineering

Data Governance Explained - Chapter 10 Retelling

Data governance sounds like something a committee of suits invented to make your life harder. But here’s the thing: without it, everything falls apart quietly.

Feb 24, 2026
software-engineering

Data Science Foundations Chapter 13: Telling the Story Behind Your Data

You did the hard work. You collected data, cleaned it, tested your models. And now you need to tell someone what you found. This is where Chapter 13 of “Data Science Foundations” by Stephen Mariadas and Ian Huke comes in. Communication. The part that separates useful data science from data science that nobody cares about.

Feb 24, 2026
software-engineering

Hardware Limits, NVIDIA NIMs, Edge Deployment, and Why LLMs Still Struggle

Chapter 9 of Laura Funderburk’s book takes a step back from building things and looks forward. What’s coming next for NLP and LLM systems? Where are the bottlenecks? What’s changing?

Feb 24, 2026
software-engineering

Python vs R Workflows - Machine Learning, Visualization, and More

Chapter 5 is where Boyan Angelov gets practical about the question everyone dances around: which language should you actually use for which job?

Feb 23, 2026
software-engineering

Building a Production Data Pipeline - Study Notes From Data Engineering With Python Ch 11

You learned the individual tools. You learned the deployment strategies. Now Chapter 11 of Data Engineering with Python by Paul Crickard puts it all together. This is the chapter where you build a complete, production-grade data pipeline from start to finish.

Feb 23, 2026
software-engineering

Building the Yelp Navigator: Multi-Agent Orchestration With LangGraph, Haystack Microservices, and Supervisor Approval

This is where everything from Chapter 8 comes together. We’ve built NER pipelines, text classification tools, sentiment analyzers. Now Funderburk wires them into a multi-agent graph that can handle complex queries end to end.

Feb 23, 2026
software-engineering

Data Engineering With GCP Chapter 11: Keeping Google Cloud Costs Under Control

Nobody ever got promoted for building the cheapest data pipeline. But plenty of people have gotten uncomfortable phone calls from their CFO after a runaway BigQuery bill. Chapter 11 is about the money side of GCP, and I think this is one of the most practical chapters in the book.

Feb 23, 2026
software-engineering

Data Science Foundations Chapter 12: How to Know if Your Model Actually Works

You built a model. It runs. It gives you numbers. But does it actually work? That is what Chapter 12 of “Data Science Foundations” by Stephen Mariadas and Ian Huke is about. Building a model is one thing. Trusting it is something else.

Feb 23, 2026
software-engineering

Data Security for Data Engineers - Chapter 9 Retelling

In 2016, hackers stole the personal data of 57 million Uber users and drivers. How? Someone left API credentials in a private GitHub repo. The attackers grabbed those keys, got into AWS, and downloaded everything. Uber paid the hackers $100,000 to delete the data and kept quiet about the breach for about a year.

Feb 23, 2026
software-engineering

When to Use Python vs R - Data Format Context Explained

Chapter 4 is where the book stops teaching you the languages and starts telling you when to use which one. This is Part III, “The Modern Context,” and Boyan Angelov takes the lead here. The question is simple: given a specific data format, which language gives you a better experience?

Feb 22, 2026
software-engineering

Data Engineering With GCP Chapter 10 Part 2: Data Quality, Security, and Compliance

In Part 1 we covered how data governance breaks into three pillars (usability, security, accountability) and went through metadata, Dataplex search, access control in BigQuery, and the Sensitive Data Protection service for finding PII. Now let’s pick up where we left off: understanding what SDP actually finds, and then moving into the accountability pillar.

Feb 22, 2026
software-engineering

Data Quality: What Bad Data Looks Like and How to Catch It

Chapter 8 of Data Engineering for Beginners opens with a statement that should be obvious but apparently is not: even the best pipelines and storage systems are meaningless if the data they deliver is garbage.

Feb 22, 2026
software-engineering

Data Science Foundations Chapter 11: Making Data Visual and Easy to Understand

You ran the analysis. Got your numbers. Built a model. Now you need to show people what you found. And here is where most data people trip up. They pick the wrong chart, overload it with details, and the audience walks away confused.

Feb 22, 2026
software-engineering

Deploying Data Pipelines - Study Notes From Data Engineering With Python Ch 10

You built your data pipelines. They work on your laptop. Now what? Chapter 10 of Data Engineering with Python by Paul Crickard covers the part everyone eventually has to face: getting your pipelines out of development and into production.

Feb 22, 2026
software-engineering

Python for R Users - Versions, Virtual Environments, and Pandas

Chapter 2 showed Pythonistas how to pick up R. Chapter 3 flips the script. Now it’s the R user’s turn to step into Python territory. Rick Scavetta writes this one, and he does a good job easing R folks into a world that feels messier at first glance.

Feb 22, 2026
software-engineering

Sentiment Analysis Pipelines and Multi-Agent Architecture Design With Haystack and LangGraph

After NER and text classification, Funderburk moves to the third building block: sentiment analysis. Then she starts putting all the pieces together into a multi-agent architecture. This is where the chapter gets really interesting.

Feb 21, 2026
software-engineering

Data Engineering With GCP Chapter 10 Part 1: Data Governance Basics on Google Cloud

Data governance is one of those topics that sounds boring until you realize nobody can find anything in your data platform. Then it becomes very interesting very fast.

Feb 21, 2026
software-engineering

Data Science Foundations Chapter 10 Part 2: Time Series, Classification, and Clustering Models

This is Part 2 of 2 for Chapter 10. In Part 1 we covered how to pick the right model and looked at regression. Now we get into the rest: time series, classification, clustering, and association analysis.

Feb 21, 2026
software-engineering

Hands-on NER Pipelines and Text Classification With Haystack: From Monolithic to Tool-Based Architecture

Chapter 8 is where Funderburk says: enough with single pipelines. Time to build tools. And then make an agent pick which tool to use.

Feb 21, 2026
software-engineering

Monitoring Data Pipelines - Study Notes From Data Engineering With Python Ch 9

You built a data pipeline. It is idempotent, uses atomic transactions, and has version control. It is production ready. But can you tell when it breaks?

Feb 21, 2026
software-engineering

Pipeline Orchestration With Airflow, DAGs, and Data Transformations

This is Part 2 of Chapter 7, continuing from batch and streaming basics.

In Part 1, we covered how batch and streaming pipelines move data around. But here is the thing: having a pipeline is one thing. Making sure all its parts run in the right order, at the right time, without you babysitting it? That is orchestration. And this is where Chapter 7 gets really practical.

Feb 21, 2026
software-engineering

R for Python Developers - Lists, Factors, and Data Wrangling

In Part 1 we covered R basics: setting up your environment, installing packages, working with tibbles, and understanding R’s type system. Now we get to the good stuff. Lists, factors, finding things in your data, and the iteration patterns that make R feel so different from Python.

Feb 20, 2026
software-engineering

CI/CD, Pipeline Serialization, and Hayhooks for Zero-Boilerplate Deployment - Chapter 7 Part 2

In Part 1 we built a FastAPI app, Dockerized it, and locked it down with API keys. That is the “maximum control” path. It works great, but it requires a lot of boilerplate. Part 2 covers two things: automating the whole thing with CI/CD, and a completely different approach that makes most of that boilerplate disappear.

Feb 20, 2026
software-engineering

Data Engineering With GCP Chapter 9: Managing Users and Projects in Google Cloud

Chapter 9 is the one where Adi Wijaya zooms out from data pipelines and asks: okay, but who can access what, and how do we keep this whole thing organized? If the previous chapters taught you how to build things in GCP, this one teaches you how to not let those things turn into a security and management mess.

Feb 20, 2026
software-engineering

Data Pipelines: Batch vs Streaming and When to Use Each

This is Part 1 of Chapter 7. Part 2 covers orchestration and transformations.

Chapter 7 of Data Engineering for Beginners is probably where things start feeling real. You stop talking about storage and tables and start talking about how data actually moves. And the answer is: through pipelines.

Feb 20, 2026
software-engineering

Data Science Foundations Chapter 10 Part 1: Picking the Right Model for Your Data

You have data. You have a question. But which model do you actually use?

Chapter 10 of “Data Science Foundations” by Stephen Mariadas and Ian Huke is the biggest chapter in the book. So big I split this retelling into two parts. This is Part 1. It covers the types of analytics, understanding your data and hypothesis, and how to pick the right model.

Feb 20, 2026
software-engineering

NiFi Registry Version Control - Study Notes From Data Engineering With Python Ch 8

You’ve been building data pipelines for several chapters now. They work. They move data. But here’s the problem: none of them have version control. If you break something, there’s no going back. Chapter 8 of Data Engineering with Python by Paul Crickard fixes that. It introduces the NiFi Registry, a sub-project of Apache NiFi that handles version control for your data pipelines.

Feb 20, 2026
software-engineering

R for Python Developers - Getting Started With RStudio and Tibbles

Chapter 2 is where the book gets hands-on. Rick Scavetta takes the wheel and walks Python developers through R. Not from scratch, but with the assumption you already know how to code. The chapter is big, so I split it into two posts. This is the first half.

Feb 19, 2026
software-engineering

Data Engineering With GCP Chapter 8 Part 2: Vertex AI, AutoML, and ML Pipelines

Part 1 covered the ML basics: what supervised and unsupervised learning are, how a simple model gets trained, and why data engineers should care about ML at all. Now in Part 2, Adi Wijaya moves into the GCP tools that make ML actually work in production. This is where theory meets infrastructure.

Feb 19, 2026
software-engineering

Data Science Foundations Chapter 9: Averages, Probability, and the Math You Actually Need

Math scares people. I get it. You hear “standard deviation” and suddenly you are back in high school staring at the board. But here’s the thing. Chapter 9 of Data Science Foundations by Stephen Mariadas and Ian Huke covers the math basics you actually need for data science. And none of it is that hard.

Feb 19, 2026
software-engineering

Data Warehouses, Data Lakes, and Lakehouses - Data Engineering for Beginners (Ch.6)

Chapter 6 is where the book zooms out from “how to design one database” to “where does all this data actually live in a real company.” The answer: it depends on what you are trying to do with it.

Feb 19, 2026
software-engineering

FastAPI, Docker, and Securing Your NLP Endpoints - Chapter 7 Part 1

Chapter 7 of Laura Funderburk’s book is where the rubber meets the road. You built a RAG pipeline in Chapter 6. Now you need to ship it. Get it out of a notebook and into something that real users can hit with HTTP requests.

Feb 19, 2026
software-engineering

Production Pipeline Features - Study Notes From Data Engineering With Python Ch 7

You built a pipeline. It works on your machine. It runs on a schedule. Data goes in, data comes out. Ship it, right?

Feb 19, 2026
software-engineering

The Origin Stories of Python and R - Chapter 1 Retelling

Chapter 1 is titled “In the Beginning” and it’s written by Rick Scavetta. He opens with a tongue-in-cheek Dickens reference, saying it’s just the best of times for data science. But to understand where we are, we need to look at where Python and R came from. Their origin stories explain why they feel so different today.

Feb 18, 2026
software-engineering

Building a 311 Data Pipeline - Study Notes From Data Engineering With Python Ch 6

The previous chapters taught you the individual tools. Python, NiFi, Airflow, databases, data cleaning. Chapter 6 of Data Engineering with Python by Paul Crickard puts them all together into one real project.

Feb 18, 2026
software-engineering

Data Engineering With GCP Chapter 8 Part 1: Machine Learning Basics for Data Engineers

Chapter 8 is the one where Adi Wijaya finally brings up the topic every data engineer either loves or dreads: machine learning. And honestly, he does a good job of calming down both camps. If you are excited about ML, great. If you think it has nothing to do with your job, think again. This chapter shows why ML and data engineering are way closer than most people realize.

Feb 18, 2026
software-engineering

Data Science Foundations Chapter 8: Cleaning and Preparing Your Data

You know that feeling when you buy fresh ingredients for dinner, and then spend 80% of your time washing, cutting, and peeling? The actual cooking takes 20 minutes. Data science is exactly like that. The cooking is the model. The prep work is this chapter.

Feb 18, 2026
software-engineering

Measuring RAG Quality With RAGAS and Weights & Biases: Evaluation, Observability, and Cost-Performance Tradeoffs

In Part 1, we covered how Funderburk moves from Jupyter notebooks to a production-ready project structure. Docker, uv, SuperComponents, dual Elasticsearch. Now comes the part that actually tells you if your RAG pipeline is any good: systematic evaluation with RAGAS and continuous monitoring with Weights & Biases.

Feb 18, 2026
software-engineering

Normalization and Database Design - Data Engineering for Beginners (Ch.5 Part 2)

This is Part 2 of Chapter 5, continuing from data modeling basics.

If Part 1 was about drawing the blueprint, Part 2 is about keeping the building from falling apart. Normalization is one of those topics that sounds academic until you hit a real bug caused by duplicate data. Then it clicks fast.

Feb 18, 2026
software-engineering

What Modern Data Science Really Means - Python and R Book Preface

The preface of “Python and R for the Modern Data Scientist” sets up the whole book in a few pages. And it does something rare for a tech book. It actually defines what it means by its own title.

Feb 17, 2026
software-engineering

Book Retelling: Python and R for the Modern Data Scientist

I picked up “Python and R for the Modern Data Scientist” by Rick J. Scavetta and Boyan Angelov a while back. It’s an O’Reilly book from 2021, and it caught my eye because it doesn’t pick sides in the Python vs R debate. Instead, it argues you should use both.

Feb 17, 2026
software-engineering

Cleaning and Transforming Data - Study Notes From Data Engineering With Python Ch 5

You can build the best pipeline in the world. You can read files, write to databases, schedule everything with Airflow. But if the data going through that pipeline is messy, none of it matters.

Feb 17, 2026
software-engineering

Data Engineering With AWS Chapter 6 Part 2: Ingesting Streaming Data

This is post 10 in my Data Engineering with AWS retelling series.

Part 1 covered batch ingestion – pulling data from databases into S3 on a schedule. But not all data waits politely for a nightly load. IoT sensors, vehicle telemetry, live gameplay events, social media mentions – this data streams in continuously and often needs to be processed in near-real-time.

Feb 17, 2026
software-engineering

Data Engineering With GCP Chapter 7: Making Data Visual With Looker Studio

You spend weeks building pipelines, modeling data, setting up orchestration. Everything works. Data lands in BigQuery clean and on time. And then someone from the business side asks: “So… where do I see the numbers?” That is exactly where Chapter 7 picks up. All that upstream work has to end somewhere useful, and for most organizations that somewhere is a dashboard.

Feb 17, 2026
software-engineering

Data Modeling and ER Diagrams - Data Engineering for Beginners (Ch.5 Part 1)

This is Part 1 of Chapter 5. Part 2 covers normalization and design best practices.

Chapter 5 of Data Engineering for Beginners by Chisom Nwokwu is about database design. And honestly, this is where things start to feel real. The previous chapters gave us SQL and database basics. Now we are drawing blueprints.

Feb 17, 2026
software-engineering

Data Science Foundations Chapter 7: Where to Find and How to Source Your Data

You have a great hypothesis. Your stakeholders are on board. But none of it matters without the right data.

Chapter 7 of “Data Science Foundations” by Stephen Mariadas and Ian Huke is about sourcing. Where do you get data? How do you collect it? How do you know if it is any good?

Feb 17, 2026
software-engineering

From Jupyter Notebooks to Production RAG: Docker, uv, SuperComponents, and Why Project Structure Matters

Chapter 6 is where the book shifts gears. Hard. Funderburk basically says: “Cool, you built a RAG pipeline. It works on your laptop. Now what?”

Feb 16, 2026
software-engineering

Data Engineering With GCP Chapter 6 Part 2: Stream Processing With Dataflow

In Part 1 we covered Pub/Sub and how messages flow between publishers and subscribers. Now comes the fun part: what do you actually do with those messages once you have them? That’s where Apache Beam and Dataflow come in.

Feb 16, 2026
software-engineering

Data Science Foundations Chapter 6: Understanding Data Properties and Types

Chapter 6 of Data Science Foundations by Stephen Mariadas and Ian Huke is about something that sounds boring but really is not. Properties of data. What kind of data are you working with? And why does it matter so much?

Feb 16, 2026
software-engineering

Knowledge Graphs, Synthetic Test Data, and Multi-Source Pipelines in Haystack

In the last post we learned the rules for building custom Haystack components. Now Funderburk puts those rules to work on a real problem: building a pipeline that creates a knowledge graph from your documents and then generates synthetic test questions from that graph.

Feb 16, 2026
software-engineering

SQL Advanced Queries: JOINs, Subqueries, and Window Functions

This is Part 2 of Chapter 4, continuing from the SQL basics.

In Part 1 we covered how to pull data from one table. Filter it, sort it, count it. But real databases have many tables. Customers in one, orders in another, products in a third. The interesting stuff happens when you combine them.

Feb 16, 2026
software-engineering

Working With Databases - Study Notes From Data Engineering With Python Ch 4

Most data pipelines start with a database. Most of them end with one too. Chapter 4 of Paul Crickard’s book is about connecting Python to databases and moving data between them. If the previous chapter was about flat files, this one is where things get real.

Feb 15, 2026
software-engineering

Custom Haystack Components: The @component Decorator, Input/Output Contracts, and warm_up

Chapter 5 is where Funderburk says: stop being a user of Haystack. Start being an architect. Up until now, the book has been about plugging together existing components. Now you learn to build your own.

Feb 15, 2026
software-engineering

Data Engineering With GCP Chapter 6 Part 1: Real-Time Data With Pub/Sub

Chapter 6 is where Adi Wijaya switches gears from batch to real-time. After spending Chapters 3 through 5 on batch pipelines with BigQuery, Cloud Composer, and Dataproc, now it is time to talk about streaming data. Two GCP services carry this chapter: Pub/Sub and Dataflow. This post covers the streaming concepts and Pub/Sub. Dataflow gets its own post in Part 2.

Feb 15, 2026
software-engineering

Data Science Foundations Chapter 5: The Discovery Phase and Asking the Right Questions

You got a data science project. Great. But before you touch any data, before you write a single line of code, you need to stop and think. That is what Chapter 5 of “Data Science Foundations” by Stephen Mariadas and Ian Huke is about. The discovery phase. The part most people want to skip. And it is the part that saves you from wasting months on something that never had a chance.

Feb 15, 2026
software-engineering

Reading and Writing Files in Python - Study Notes From Data Engineering With Python Ch 3

Chapter 3 is where Crickard moves from setup to actual work. You installed all those tools in Chapter 2. Now you use them. The chapter covers one of the most fundamental tasks in data engineering: getting data out of text files and into something useful.

Feb 15, 2026
software-engineering

SQL Basics: SELECT, WHERE, and Aggregate Functions

This is Part 1 of Chapter 4. Part 2 covers joins and advanced queries.

Chapter 4 is where Nwokwu puts SQL in your hands. No more theory. You write queries, you get results, you learn by doing. If Chapter 3 was about understanding what databases are, this chapter is about talking to them.

Feb 14, 2026
software-engineering

Building Your Data Engineering Setup - Study Notes From Data Engineering With Python Ch 2

Chapter 1 was all theory. Now it’s time to actually install stuff. Chapter 2 of Data Engineering with Python by Paul Crickard is a setup chapter. You install the tools, configure them, and make sure everything talks to each other.

Feb 14, 2026
software-engineering

Data Engineering With GCP Chapter 5 Part 2: Working With Spark on Dataproc

In Part 1 we set up a Dataproc cluster, got familiar with HDFS, and touched on what a data lake actually is. Now it is time to get into the real work: writing PySpark code, understanding RDDs, moving data between HDFS, GCS, and BigQuery, and learning how to actually submit Spark jobs to Dataproc.

Feb 14, 2026
software-engineering

Data Science Foundations Chapter 4: Ethics, Laws, and Doing the Right Thing With Data

Imagine your company asks you to build a model that predicts health outcomes for people. Sounds great, right? Better treatments, healthier population, maybe even lower costs. But what if your health data gets shared? What if your insurance premiums go up because of something the model found? What if you get denied a service?

Feb 14, 2026
software-engineering

Database Fundamentals: SQL, NoSQL, and ACID

Chapter 3 is where things get real. You stop talking about data in the abstract and start working with the thing that actually holds it: databases. If you plan to do any data engineering at all, this is where your daily life begins.

Feb 14, 2026
software-engineering

Hybrid RAG: Parallel Retrieval, Fusion, Reranking, and Multimodal Pipelines

In the last post we built a naive RAG pipeline. It works, but it has a blind spot: it only understands meaning, not exact words. Search for error code “ERR-4052” and the semantic retriever might miss the one document that contains that exact string. This is the vocabulary mismatch problem, and hybrid RAG is how you fix it.

Feb 13, 2026
software-engineering

Data Engineering With GCP Chapter 5 Part 1: Building a Data Lake on Google Cloud

Chapter 5 is where things get interesting if you come from a traditional database background. We are leaving the nice structured world of BigQuery and entering the territory of raw files, distributed storage, and the Hadoop ecosystem. Welcome to the data lake.

Feb 13, 2026
software-engineering

Data Science Foundations Chapter 3: How to Actually Deliver a Data Science Project

Chapter 2 was about stakeholders. Now Chapter 3 asks a very practical question: how do you actually get a data science project done?

Feb 13, 2026
software-engineering

Haystack Pipelines: Indexing, Multimodal Processing, and Your First RAG System

Chapter 4 is where you stop reading about components and actually start wiring them together. Laura Funderburk calls it “Bringing Components Together,” and that’s exactly what it is. You take all those building blocks from Chapter 3 and connect them into working pipelines.

Feb 13, 2026
software-engineering

Introduction to Data Engineering - The Oil Refinery, the Lifecycle, and the People

Chapter 1 was about understanding data itself. Chapter 2 answers the bigger question: what do data engineers actually do with it?

Feb 13, 2026
software-engineering

What Is Data Engineering? Study Notes From Data Engineering With Python Ch 1

Chapter 1 of Data Engineering with Python by Paul Crickard starts with the basics. What is data engineering? What do data engineers actually do? And how is it different from data science?

Feb 12, 2026
software-engineering

Data Engineering With GCP Chapter 4 Part 2: Airflow Scheduling, Idempotency, and Sensors

In the first part we got Cloud Composer running, wrote our first DAGs, and learned operators. This second part covers the stuff that separates beginner Airflow code from production-ready pipelines: variables, idempotent tasks, backfilling, sensors, and dataset-driven scheduling.

Feb 12, 2026
software-engineering

Data Engineering With Python: My Study Notes From Paul Crickard's Book

So I picked up Data Engineering with Python by Paul Crickard (Packt, 2020, ISBN: 978-1-83921-418-9) and decided to write up my study notes as I go through it. I’ve been working in IT for over 20 years, and data engineering keeps coming up everywhere. This book seemed like a good one to work through and share what I learn.

Feb 12, 2026
software-engineering

Data Science Foundations Chapter 2: Who Are Your Stakeholders and Why They Matter

You built a model. It works. The numbers look great. But nobody uses it.

Because you forgot the people around it. Chapter 2 of “Data Science Foundations” by Stephen Mariadas and Ian Huke is about exactly that. Stakeholders. The humans who care about your project, fund it, get affected by it, or can shut it down.

Feb 12, 2026
software-engineering

Haystack 2.0: RAG as a Tool, Multi-Tool Agents, and the Full Component Catalog

In the first part we covered Haystack 2.0’s core ideas: components, pipelines, SuperComponents, and how hybrid retrieval works. Now let’s look at what happens when you hand these pipelines to an AI agent, plus the full catalog of component types Haystack gives you out of the box.

Feb 12, 2026
software-engineering

Understanding Data - Types, History, and Why It Matters

The book opens with a simple claim: data is the new oil. You’ve probably heard that phrase a hundred times. But Nwokwu doesn’t just drop the cliche and move on. She actually walks you through why that comparison holds up, starting from thousands of years ago.

Feb 11, 2026
software-engineering

Data Engineering for Beginners by Chisom Nwokwu - Book Retelling Series

I picked up “Data Engineering for Beginners” by Chisom Nwokwu (Wiley, 2026, ISBN: 9781394325412) a few weeks ago. I was looking for something that explains data engineering from scratch, without assuming you already know half the field. This book does exactly that.

Feb 11, 2026
software-engineering

Data Engineering With GCP Chapter 4 Part 1: Automating Data Workflows With Cloud Composer

Up until now in the book, we built BigQuery tables by hand, wrote queries in the console, and loaded data manually. That works for learning, but nobody does that in production. In production, you need things to run on their own, on schedule, without you babysitting them at 5 AM.

Feb 11, 2026
software-engineering

Data Science Foundations Chapter 1: What Is Data Science Really About?

You probably watched Moneyball. Brad Pitt, baseball, small team beats the rich guys using numbers. Good movie. But what the movie really shows is something bigger. It shows what happens when you take data seriously. And that is basically what data science is about.

Feb 11, 2026
software-engineering

Haystack 2.0 by Deepset: Components, Pipelines, Document Stores, and Retrievers

Chapter 3 of Laura Funderburk’s book is where the rubber meets the road. We stop talking theory and start looking at an actual framework you can use to build real NLP pipelines. That framework is Haystack 2.0 by a company called deepset.

Feb 10, 2026
software-engineering

Data Engineering With AWS Chapter 6 Part 1: Ingesting Batch Data

This is post 9 in my Data Engineering with AWS retelling series.

You have your whiteboard architecture from Chapter 5. You know who your data consumers are and what they need. Now it is time to actually move data. Chapter 6 covers data ingestion – getting data from wherever it lives into your AWS data lake. This first part focuses on batch ingestion from databases and files. Part 2 covers streaming.

Feb 10, 2026
software-engineering

Data Engineering With GCP Chapter 3 Part 2: Data Modeling and BigQuery Features

In Part 1 we loaded CSV files into BigQuery and built a simple warehouse from a MySQL export. Now the book throws a second scenario at us: bike-sharing data. More tables, daily batch loading, and a real question that every data engineer has to face sooner or later. How do you actually model your data so that business people can use it without calling you every five minutes?

Feb 10, 2026
software-engineering

Starting a Book Retelling: Data Science Foundations by Mariadas and Huke

I have been working in tech for over 20 years. Seen a lot of trends come and go. But data science is not a trend. It is here to stay. And honestly, I wanted a book that explains the whole thing from the ground up without assuming I already know everything.

Feb 10, 2026
software-engineering

Vector Stores, Agentic Memory, and the Economics of LLMs - Chapter 2 Part 3

Parts 1 and 2 of this chapter covered transformer architecture, the SLM/RLM split, context engineering strategies, and the Haystack + LangGraph hybrid architecture. Now Funderburk closes the chapter with two topics that every developer building LLM applications needs to understand: vector stores and the economics of inference.

Feb 09, 2026
software-engineering

Context Engineering, Prompt Strategies, and Framework Wars - Chapter 2 Part 2

In Part 1, we covered how transformers work and how models split into small language models (SLMs) and reasoning language models (RLMs). Now Funderburk shifts to a big question: how do you actually interact with these models in a reliable way?

Feb 09, 2026
software-engineering

Data Engineering With GCP Chapter 3 Part 1: Your First BigQuery Data Warehouse

Chapter 3 is where things get real. Up to now the book was setting the stage, explaining what data engineering is, showing you around GCP. Now Adi Wijaya says: okay, let’s actually build something. And the something is a data warehouse in BigQuery.

Feb 08, 2026
software-engineering

Data Engineering With GCP Chapter 2: Getting Started With Google Cloud for Big Data

Chapter 2 is where Adi Wijaya starts showing what Google Cloud Platform actually has for data engineers. After the theory in Chapter 1, this one is about opening GCP for the first time and figuring out which services matter and which ones you can safely ignore for now.

Feb 08, 2026
software-engineering

Transformers, Attention, and the Evolution of LLMs - Chapter 2 Part 1

Chapter 2 of Laura Funderburk’s book opens with the big picture of large language models. Where they came from, how they work inside, and where they are heading. If Chapter 1 was about pipelines, this chapter is about the models that sit at the center of those pipelines.

Feb 07, 2026
software-engineering

Data Engineering With GCP Chapter 1: What Is Data Engineering Anyway?

Chapter 1 starts with a confession most of us in the data world can relate to. Adi Wijaya says he used to think data was clean. Neatly organized, ready to go. Then he actually worked with data in real organizations and realized most of the effort goes into collecting, cleaning, and transforming it. Not the fun machine learning part. The plumbing part.

Feb 07, 2026
software-engineering

NLP Pipeline Fundamentals Part 2: Tokenization, Embeddings, LLM Roles, and the Road to Agentic Pipelines

In Part 1 we covered the agentic reliability crisis, what data pipelines are, and why classic NLP techniques are being reborn as tools for AI agents. Now let’s get into the specifics: how tokenization and embeddings actually work, what LLMs are, and the two very different roles they play in modern agentic systems.

Feb 06, 2026
software-engineering

Data Engineering With Google Cloud Platform: A Book Retelling Series

I just finished reading “Data Engineering with Google Cloud Platform” by Adi Wijaya (2nd edition, Packt Publishing, 2024) and I want to share what I learned. Not as a dry summary, but more like telling a friend what the book is about over coffee.

Feb 06, 2026
software-engineering

NLP Pipeline Fundamentals: Data Pipelines, the Agentic Reliability Crisis, and Why Classic NLP Still Matters

Chapter 1 of Laura Funderburk’s book opens with something I wish more people in the AI space would say out loud: the era of pure experimentation with LLMs is over. We’re past the “look what ChatGPT can do” stage. The real question now is: can you trust this thing in production?

Feb 05, 2026
software-engineering

Building NLP and LLM Pipelines With Haystack - Book Retelling Series

So I just finished reading Building Natural Language and LLM Pipelines by Laura Funderburk, and I wanted to share what I learned. This is one of those books that bridges the gap between “I can make a ChatGPT wrapper” and “I can build production AI systems that actually work.”

Feb 03, 2026
software-engineering

Data Engineering With AWS Chapter 5: Architecting Data Engineering Pipelines

This is post 8 in my Data Engineering with AWS retelling series.

You have learned about data engineering principles, data architectures, the AWS toolkit, and data governance. Now comes the part where it all comes together. Chapter 5 is about designing an actual data pipeline. Not writing code yet. Just thinking. Planning. Drawing on a whiteboard.

Jan 27, 2026
software-engineering

Data Engineering With AWS Chapter 4 Part 2: Data Governance in Practice

In Part 1, we covered the theory: what data security and governance mean, how catalogs prevent your lake from becoming a swamp, and the core AWS services for encryption and identity. Now it is time to put it into practice.

Jan 20, 2026
software-engineering

Data Engineering With AWS Chapter 4 Part 1: Data Cataloging and Security

You can have the fastest data pipeline on the planet. You can have the slickest dashboards, the fanciest machine learning models, the most optimized Parquet files. None of it matters if your data gets stolen, mishandled, or dumped into a lake that nobody can navigate.

Jan 13, 2026
software-engineering

Data Engineering With AWS Chapter 3 Part 2: The AWS Toolkit - Analytics and Processing

In Part 1 we covered how data gets into AWS. Now comes the good part: what do you actually do with it once it is there? This post covers the services for transforming raw data, orchestrating multi-step pipelines, and letting people query and visualize the results.

Jan 06, 2026
software-engineering

Data Engineering With AWS Chapter 3 Part 1: The AWS Toolkit - Storage and Databases

Chapter 3 is massive. It is basically a catalog of every AWS service a data engineer will touch, from getting data in to getting answers out. So I am splitting it into two posts. This first part covers how data gets into AWS – all the ingestion services, the streaming tools, and the physical devices AWS will literally ship to your door.

Dec 30, 2025
software-engineering

Data Engineering With AWS Chapter 2: Data Management Architectures for Analytics

Chapter 1 gave us the “who” and “why” of data engineering. Now it is time for the “where.” Where does all that data actually live? How do organizations store, organize, and serve billions of rows of information so that someone on the business side can pull up a dashboard and make a decision before lunch?

Dec 23, 2025
software-engineering

Data Engineering With AWS Chapter 1: What Even Is Data Engineering?

If someone told you twenty years ago that data would become more valuable than oil, you would have laughed. But here we are. The most valuable companies on the planet are not drilling for crude. They are collecting, processing, and squeezing insights out of massive piles of data. And behind every one of those companies, there is a team of data engineers making it all work.

Dec 16, 2025
software-engineering

Data Engineering With AWS: A Book Retelling Series for the Cloud-Curious

Every company today is drowning in data. Clicks, transactions, sensor readings, log files, social media posts. It just keeps coming. But raw data sitting in a pile is useless. The real magic happens when someone builds the pipes that move it, clean it, reshape it, and deliver it to the people who need it.

