Data engineering

Apr 21, 2026
software-engineering

Data Engineering With AWS Chapter 13: Enabling AI and Machine Learning

This is post 19 in my Data Engineering with AWS retelling series.

Throughout this book, we have been ingesting data, transforming data, storing data, querying data, and visualizing data. All of that is incredibly useful on its own. But there is a whole other level where data gets really powerful: when you use it to teach a machine to make predictions.

Apr 14, 2026
software-engineering

Data Engineering With AWS Chapter 12: Visualizing Data With Amazon QuickSight

This is post 18 in my Data Engineering with AWS retelling series.

We have spent eleven chapters ingesting data, transforming data, cataloging data, querying data. But here is a simple truth: nobody wants to stare at 10,000 rows in a spreadsheet. Our brains are not built for that. We process pictures way faster than text. A well-designed chart can tell you in two seconds what would take twenty minutes to figure out from raw numbers.

Apr 07, 2026
software-engineering

Data Engineering With AWS Chapter 11: Ad Hoc Queries With Amazon Athena

This is post 17 in my Data Engineering with AWS retelling series.

You have a data lake. Terabytes of files sitting in S3 across landing zones, clean zones, and transform zones. The data is there. But how do you actually ask it questions? You could spin up a database, load everything into it, and then query. But that defeats the purpose of having a data lake in the first place.

Mar 31, 2026
software-engineering

Data Engineering With AWS Chapter 10: Orchestrating the Data Pipeline

This is post 16 in my Data Engineering with AWS retelling series.

Up to this point in the book, we have been doing everything by hand. Click this button in the console. Run that Glue job manually. Trigger a crawler. Upload a file. It works fine for learning. But imagine doing that in production, every day, at 3 AM, across dozens of data sources. No thanks.

Mar 30, 2026
Big Data

Wrapping Up: Big Data on Kubernetes

We have reached the end of our deep dive into Big Data on Kubernetes by Neylson Crepalde. It has been a massive journey, moving from basic Docker containers to complex, real-time AI pipelines.

Mar 24, 2026
software-engineering

Data Engineering With AWS Chapter 9 Part 2: Bridging Data Lake and Data Warehouse

This is post 15 in my Data Engineering with AWS retelling series.

In Part 1, we looked at Redshift internals – clusters, slices, distribution styles, sort keys. All the pieces that make a data warehouse fast. But a warehouse sitting in isolation is not very useful. Data needs to flow in from your data lake, and sometimes it needs to flow back out. Part 2 of Chapter 9 covers that bridge between S3 and Redshift, including Redshift Spectrum, the COPY and UNLOAD commands, and a hands-on exercise that ties it all together.

Mar 17, 2026
software-engineering

Data Engineering With AWS Chapter 9 Part 1: Loading Data Into a Data Mart With Redshift

This is post 14 in my Data Engineering with AWS retelling series.

Your data lake is humming along. Data lands in S3, gets cleaned, transformed, cataloged. Athena lets you run SQL queries on it. So why would you need anything else?

Mar 10, 2026
software-engineering

Data Engineering With AWS Chapter 8: Who Actually Uses All This Data?

This is post 13 in my Data Engineering with AWS retelling series.

We have spent the last several chapters ingesting data, transforming data, optimizing data. Pipelines everywhere. But here is the question nobody asks often enough: who is actually going to use all of this?

Mar 07, 2026
Big Data

Building Your Own Data Images

In my last post, we talked about why containers are the bedrock of modern data engineering. But honestly, just running other people’s images only gets you so far. The real magic happens when you start packaging your own custom code.

Mar 06, 2026
Big Data

Why Containers Are a Must for Data Engineers

If you are working with data today, you can’t really ignore containers. They have become the standardized unit for how we develop, ship, and deploy software. But why do we care so much about them in the big data world?

Mar 05, 2026
Big Data

Rethinking Data Infrastructure: Big Data on Kubernetes

We are living in a world where data is basically everywhere. From your phone to social media and every single online purchase, the amount of info we generate is staggering. But here’s the thing: just having data isn’t enough. You have to be able to process it, and that’s where things get complicated.

Mar 03, 2026
software-engineering

Data Engineering With AWS Chapter 7 Part 2: Transforming Data - Optimization and Business Logic

This is post 12 in my Data Engineering with AWS retelling series.

In Part 1, we covered the generic data preparation transforms: converting to Parquet, partitioning, PII protection, and data cleansing. Those transforms work on individual datasets and do not need much business context. Now we get to the transforms that actually create business value. The ones that combine multiple datasets, add context, flatten structures, and produce the tables that analysts and dashboards consume.

Feb 28, 2026
software-engineering

Data Engineering for Beginners - Closing Thoughts on the Full Series

And that’s it. Eighteen posts. Thirteen chapters. One complete walkthrough of “Data Engineering for Beginners” by Chisom Nwokwu.

When I started this series, I said I wanted to retell the book in my own words. Not a summary, not a copy. My take on what each chapter covers and why it matters. Now that I’m at the end, let me step back and share my overall impressions.

Feb 28, 2026
software-engineering

Data Engineering With GCP: Final Thoughts and Key Takeaways

Twenty-two posts later, we are done. This was my retelling of “Data Engineering with Google Cloud Platform” by Adi Wijaya (2nd edition, Packt Publishing, 2024, ISBN 978-1-83508-011-5). Time to look back, share what stuck with me, and give an honest assessment.

Feb 28, 2026
software-engineering

Data Engineering With Python: Final Thoughts and Takeaways

That’s it. Fifteen chapters, seventeen posts, and one complete walkthrough of Paul Crickard’s Data Engineering with Python (Packt, 2020, ISBN: 978-1-83921-418-9).

Feb 27, 2026
software-engineering

Building a Career in Data Engineering - Roles, Resumes, and Interviews

This is the last technical chapter of the book. Everything before this was about skills, tools, and concepts. Chapter 13 is about what you do with all of that knowledge. How you actually get a job in data engineering.

Feb 27, 2026
software-engineering

Data Engineering With GCP Chapter 13 Part 2: GCP Certifications and Career Next Steps

In Part 1 we went through the quiz questions, extra GCP services, and how the book ties everything together. Now let’s talk about the stuff that matters after you close the book: getting certified, planning your career, and figuring out what comes next.

Feb 27, 2026
software-engineering

Real-Time Edge Data With MiNiFi and Spark - Study Notes From Data Engineering With Python Ch 15

You have NiFi running. Kafka is streaming. Spark is processing. But what about the data source? What happens when your data comes from a tiny sensor or a Raspberry Pi that can barely run a web browser?

Feb 26, 2026
software-engineering

Cloud Data Engineering - Storage, Compute, Networking, and Cost on the Cloud

Chapter 12 is the one where everything moves to the cloud. If you’ve been following along, we’ve been talking about databases, pipelines, data quality, security, governance, and big data. All of that can run on your own hardware. But most teams today don’t do that. They use cloud providers. This chapter explains why, and more importantly, how.

Feb 26, 2026
software-engineering

Data Engineering With GCP Chapter 13 Part 1: Growing Your Confidence as a Data Engineer

Chapter 13 is the last chapter in the book, and it’s different from everything that came before. No new GCP services, no hands-on exercises, no Terraform scripts. Instead, Adi Wijaya steps back and talks about the bigger picture: certifications, where data engineering is heading, and how to actually feel confident in this role.

Feb 26, 2026
software-engineering

Data Processing With Apache Spark - Study Notes From Data Engineering With Python Ch 14

You have streaming data. You have batch data. You have a lot of it. Now you need to actually process it. Fast. On more than one machine.

Feb 25, 2026
software-engineering

Big Data and Distributed Systems - Chapter 11 Retelling

At some point, your data gets too big for one machine. That’s not a hypothetical. Netflix, Google, Amazon, they all hit that wall years ago. The question is: what do you do when a single server can’t keep up?

Feb 25, 2026
software-engineering

Data Engineering With GCP Chapter 12 Part 2: Building CI/CD Pipelines on Google Cloud

In Part 1 we covered the theory behind CI/CD and ran through a basic Cloud Build exercise with a Python project. Unit tests ran automatically on every push. Broken code got caught before it reached production. Good stuff, but that was a simple calculator script. Now we need to connect this to real data engineering work.

Feb 25, 2026
software-engineering

Streaming Data With Apache Kafka - Study Notes From Data Engineering With Python Ch 13

Up to this point in the book, data pipelines have been about moving data that already exists. Query a database, read a file, process it, store it. The data sits still and you go get it.

Feb 24, 2026
software-engineering

Building a Kafka Cluster - Study Notes From Data Engineering With Python Ch 12

Up to this point in the book, everything has been batch processing. You query a database, get a full dataset, transform it, load it somewhere. The data sits still while you work on it.

Feb 24, 2026
software-engineering

Data Engineering With AWS Chapter 7 Part 1: Transforming Data - The Basics

This is post 11 in my Data Engineering with AWS retelling series.

You have data sitting in your data lake. Raw CSV files, JSON dumps, database extracts. It is all there, technically available, but trying to run analytics on it is painfully slow and expensive. This chapter is about fixing that. Transforming raw data into something optimized, clean, and ready for actual use.

Feb 24, 2026
software-engineering

Data Engineering With GCP Chapter 12 Part 1: CI/CD Basics for Data Engineers

Chapter 12 is a shift from everything we have done so far. Until now, we were learning how to build things: pipelines, data lakes, warehouses, streaming systems. Now the question is: how do you ship all that stuff to production without breaking things? The answer is CI/CD.

Feb 24, 2026
software-engineering

Data Governance Explained - Chapter 10 Retelling

Data governance sounds like something a committee of suits invented to make your life harder. But here’s the thing: without it, everything falls apart quietly.

Feb 23, 2026
software-engineering

Building a Production Data Pipeline - Study Notes From Data Engineering With Python Ch 11

You learned the individual tools. You learned the deployment strategies. Now Chapter 11 of Data Engineering with Python by Paul Crickard puts it all together. This is the chapter where you build a complete, production-grade data pipeline from start to finish.

Feb 23, 2026
software-engineering

Data Engineering With GCP Chapter 11: Keeping Google Cloud Costs Under Control

Nobody ever got promoted for building the cheapest data pipeline. But plenty of people have gotten uncomfortable phone calls from their CFO after a runaway BigQuery bill. Chapter 11 is about the money side of GCP, and I think this is one of the most practical chapters in the book.

Feb 23, 2026
software-engineering

Data Security for Data Engineers - Chapter 9 Retelling

In 2016, hackers stole personal data of 57 million Uber users and drivers. How? Someone left API credentials in a private GitHub repo. The attackers grabbed those keys, got into AWS, and downloaded everything. Uber didn’t even notice for a year. When they finally found out, they paid the hackers $100,000 to delete the data and kept quiet about it.

Feb 22, 2026
software-engineering

Data Engineering With GCP Chapter 10 Part 2: Data Quality, Security, and Compliance

In Part 1 we covered how data governance breaks into three pillars (usability, security, accountability) and went through metadata, Dataplex search, access control in BigQuery, and the Sensitive Data Protection service for finding PII. Now let’s pick up where we left off: understanding what SDP actually finds, and then moving into the accountability pillar.

Feb 22, 2026
software-engineering

Data Quality: What Bad Data Looks Like and How to Catch It

Chapter 8 of Data Engineering for Beginners opens with a statement that should be obvious but apparently is not: even the best pipelines and storage systems are meaningless if the data they deliver is garbage.

Feb 22, 2026
software-engineering

Deploying Data Pipelines - Study Notes From Data Engineering With Python Ch 10

You built your data pipelines. They work on your laptop. Now what? Chapter 10 of Data Engineering with Python by Paul Crickard covers the part everyone eventually has to face: getting your pipelines out of development and into production.

Feb 21, 2026
software-engineering

Data Engineering With GCP Chapter 10 Part 1: Data Governance Basics on Google Cloud

Data governance is one of those topics that sounds boring until you realize nobody can find anything in your data platform. Then it becomes very interesting very fast.

Feb 21, 2026
software-engineering

Monitoring Data Pipelines - Study Notes From Data Engineering With Python Ch 9

You built a data pipeline. It is idempotent, uses atomic transactions, and has version control. It is production ready. But can you tell when it breaks?

Feb 21, 2026
software-engineering

Pipeline Orchestration With Airflow, DAGs, and Data Transformations

This is Part 2 of Chapter 7, continuing from batch and streaming basics.

In Part 1, we covered how batch and streaming pipelines move data around. But here is the thing: having a pipeline is one thing. Making sure all its parts run in the right order, at the right time, without you babysitting it? That is orchestration. And this is where Chapter 7 gets really practical.

Feb 20, 2026
software-engineering

Data Engineering With GCP Chapter 9: Managing Users and Projects in Google Cloud

Chapter 9 is the one where Adi Wijaya zooms out from data pipelines and asks: okay, but who can access what, and how do we keep this whole thing organized? If the previous chapters taught you how to build things in GCP, this one teaches you how to not let those things turn into a security and management mess.

Feb 20, 2026
software-engineering

Data Pipelines: Batch vs Streaming and When to Use Each

This is Part 1 of Chapter 7. Part 2 covers orchestration and transformations.

Chapter 7 of Data Engineering for Beginners is probably where things start feeling real. You stop talking about storage and tables and start talking about how data actually moves. And the answer is: through pipelines.

Feb 20, 2026
software-engineering

NiFi Registry Version Control - Study Notes From Data Engineering With Python Ch 8

You’ve been building data pipelines for several chapters now. They work. They move data. But here’s the problem: none of them have version control. If you break something, there’s no going back. Chapter 8 of Data Engineering with Python by Paul Crickard fixes that. It introduces the NiFi Registry, a sub-project of Apache NiFi that handles version control for your data pipelines.

Feb 19, 2026
software-engineering

Data Engineering With GCP Chapter 8 Part 2: Vertex AI, AutoML, and ML Pipelines

Part 1 covered the ML basics: what supervised and unsupervised learning are, how a simple model gets trained, and why data engineers should care about ML at all. Now in Part 2, Adi Wijaya moves into the GCP tools that make ML actually work in production. This is where theory meets infrastructure.

Feb 19, 2026
software-engineering

Data Warehouses, Data Lakes, and Lakehouses - Data Engineering for Beginners (Ch.6)

Chapter 6 is where the book zooms out from “how to design one database” to “where does all this data actually live in a real company.” The answer: it depends on what you are trying to do with it.

Feb 19, 2026
software-engineering

Production Pipeline Features - Study Notes From Data Engineering With Python Ch 7

You built a pipeline. It works on your machine. It runs on a schedule. Data goes in, data comes out. Ship it, right?

Feb 18, 2026
software-engineering

Building a 311 Data Pipeline - Study Notes From Data Engineering With Python Ch 6

The previous chapters taught you the individual tools. Python, NiFi, Airflow, databases, data cleaning. Chapter 6 of Data Engineering with Python by Paul Crickard puts them all together into one real project.

Feb 18, 2026
software-engineering

Data Engineering With GCP Chapter 8 Part 1: Machine Learning Basics for Data Engineers

Chapter 8 is the one where Adi Wijaya finally brings up the topic every data engineer either loves or dreads: machine learning. And honestly, he does a good job of calming down both camps. If you are excited about ML, great. If you think it has nothing to do with your job, think again. This chapter shows why ML and data engineering are way closer than most people realize.

Feb 18, 2026
software-engineering

Normalization and Database Design - Data Engineering for Beginners (Ch.5 Part 2)

This is Part 2 of Chapter 5, continuing from data modeling basics.

If Part 1 was about drawing the blueprint, Part 2 is about keeping the building from falling apart. Normalization is one of those topics that sounds academic until you hit a real bug caused by duplicate data. Then it clicks fast.

Feb 17, 2026
software-engineering

Cleaning and Transforming Data - Study Notes From Data Engineering With Python Ch 5

You can build the best pipeline in the world. You can read files, write to databases, schedule everything with Airflow. But if the data going through that pipeline is messy, none of it matters.

Feb 17, 2026
software-engineering

Data Engineering With AWS Chapter 6 Part 2: Ingesting Streaming Data

This is post 10 in my Data Engineering with AWS retelling series.

Part 1 covered batch ingestion – pulling data from databases into S3 on a schedule. But not all data waits politely for a nightly load. IoT sensors, vehicle telemetry, live gameplay events, social media mentions – this data streams in continuously and often needs to be processed in near-real-time.

Feb 17, 2026
software-engineering

Data Engineering With GCP Chapter 7: Making Data Visual With Looker Studio

You spend weeks building pipelines, modeling data, setting up orchestration. Everything works. Data lands in BigQuery clean and on time. And then someone from the business side asks: “So… where do I see the numbers?” That is exactly where Chapter 7 picks up. All that upstream work has to end somewhere useful, and for most organizations that somewhere is a dashboard.

Feb 17, 2026
software-engineering

Data Modeling and ER Diagrams - Data Engineering for Beginners (Ch.5 Part 1)

This is Part 1 of Chapter 5. Part 2 covers normalization and design best practices.

Chapter 5 of Data Engineering for Beginners by Chisom Nwokwu is about database design. And honestly, this is where things start to feel real. The previous chapters gave us SQL and database basics. Now we are drawing blueprints.

Feb 16, 2026
software-engineering

Data Engineering With GCP Chapter 6 Part 2: Stream Processing With Dataflow

In Part 1 we covered Pub/Sub and how messages flow between publishers and subscribers. Now comes the fun part: what do you actually do with those messages once you have them? That’s where Apache Beam and Dataflow come in.

Feb 16, 2026
software-engineering

SQL Advanced Queries: JOINs, Subqueries, and Window Functions

This is Part 2 of Chapter 4, continuing from the SQL basics.

In Part 1 we covered how to pull data from one table. Filter it, sort it, count it. But real databases have many tables. Customers in one, orders in another, products in a third. The interesting stuff happens when you combine them.

Feb 16, 2026
software-engineering

Working With Databases - Study Notes From Data Engineering With Python Ch 4

Most data pipelines start with a database. Most of them end with one too. Chapter 4 of Paul Crickard’s book is about connecting Python to databases and moving data between them. If the previous chapter was about flat files, this one is where things get real.

Feb 15, 2026
software-engineering

Data Engineering With GCP Chapter 6 Part 1: Real-Time Data With Pub/Sub

Chapter 6 is where Adi Wijaya switches gears from batch to real-time. After spending Chapters 3 through 5 on batch pipelines with BigQuery, Cloud Composer, and Dataproc, now it is time to talk about streaming data. Two GCP services carry this chapter: Pub/Sub and Dataflow. This post covers the streaming concepts and Pub/Sub. Dataflow gets its own post in Part 2.

Feb 15, 2026
software-engineering

Reading and Writing Files in Python - Study Notes From Data Engineering With Python Ch 3

Chapter 3 is where Crickard moves from setup to actual work. You installed all those tools in Chapter 2. Now you use them. The chapter covers one of the most fundamental tasks in data engineering: getting data out of text files and into something useful.

Feb 15, 2026
software-engineering

SQL Basics: SELECT, WHERE, and Aggregate Functions

This is Part 1 of Chapter 4. Part 2 covers joins and advanced queries.

Chapter 4 is where Nwokwu puts SQL in your hands. No more theory. You write queries, you get results, you learn by doing. If Chapter 3 was about understanding what databases are, this chapter is about talking to them.

Feb 14, 2026
software-engineering

Building Your Data Engineering Setup - Study Notes From Data Engineering With Python Ch 2

Chapter 1 was all theory. Now it’s time to actually install stuff. Chapter 2 of Data Engineering with Python by Paul Crickard is a setup chapter. You install the tools, configure them, and make sure everything talks to each other.

Feb 14, 2026
software-engineering

Data Engineering With GCP Chapter 5 Part 2: Working With Spark on Dataproc

In Part 1 we set up a Dataproc cluster, got familiar with HDFS, and touched on what a data lake actually is. Now it is time to get into the real work: writing PySpark code, understanding RDDs, moving data between HDFS, GCS, and BigQuery, and learning how to actually submit Spark jobs to Dataproc.

Feb 14, 2026
software-engineering

Database Fundamentals: SQL, NoSQL, and ACID

Chapter 3 is where things get real. You stop talking about data in the abstract and start working with the thing that actually holds it: databases. If you plan to do any data engineering at all, this is where your daily life begins.

Feb 13, 2026
software-engineering

Data Engineering With GCP Chapter 5 Part 1: Building a Data Lake on Google Cloud

Chapter 5 is where things get interesting if you come from a traditional database background. We are leaving the nice structured world of BigQuery and entering the territory of raw files, distributed storage, and the Hadoop ecosystem. Welcome to the data lake.

Feb 13, 2026
software-engineering

Introduction to Data Engineering - The Oil Refinery, the Lifecycle, and the People

Chapter 1 was about understanding data itself. Chapter 2 answers the bigger question: what do data engineers actually do with it?

Feb 13, 2026
software-engineering

What Is Data Engineering? Study Notes From Data Engineering With Python Ch 1

Chapter 1 of Data Engineering with Python by Paul Crickard starts with the basics. What is data engineering? What do data engineers actually do? And how is it different from data science?

Feb 12, 2026
software-engineering

Data Engineering With GCP Chapter 4 Part 2: Airflow Scheduling, Idempotency, and Sensors

In the first part we got Cloud Composer running, wrote our first DAGs, and learned operators. This second part covers the stuff that separates beginner Airflow code from production-ready pipelines: variables, idempotent tasks, backfilling, sensors, and dataset-driven scheduling.

Feb 12, 2026
software-engineering

Data Engineering With Python: My Study Notes From Paul Crickard's Book

So I picked up Data Engineering with Python by Paul Crickard (Packt, 2020, ISBN: 978-1-83921-418-9) and decided to write up my study notes as I go through it. I’ve been working in IT for over 20 years, and data engineering keeps coming up everywhere. This book seemed like a good one to work through and share what I learn.

Feb 12, 2026
software-engineering

Understanding Data - Types, History, and Why It Matters

The book opens with a simple claim: data is the new oil. You’ve probably heard that phrase a hundred times. But Nwokwu doesn’t just drop the cliche and move on. She actually walks you through why that comparison holds up, starting from thousands of years ago.

Feb 11, 2026
software-engineering

Data Engineering for Beginners by Chisom Nwokwu - Book Retelling Series

I picked up “Data Engineering for Beginners” by Chisom Nwokwu (Wiley, 2026, ISBN: 9781394325412) a few weeks ago. I was looking for something that explains data engineering from scratch, without assuming you already know half the field. This book does exactly that.

Feb 11, 2026
software-engineering

Data Engineering With GCP Chapter 4 Part 1: Automating Data Workflows With Cloud Composer

Up until now in the book, we built BigQuery tables by hand, wrote queries in the console, and loaded data manually. That works for learning, but nobody does that in production. In production, you need things to run on their own, on schedule, without you babysitting them at 5 AM.

Feb 10, 2026
software-engineering

Data Engineering With AWS Chapter 6 Part 1: Ingesting Batch Data

This is post 9 in my Data Engineering with AWS retelling series.

You have your whiteboard architecture from Chapter 5. You know who your data consumers are and what they need. Now it is time to actually move data. Chapter 6 covers data ingestion – getting data from wherever it lives into your AWS data lake. This first part focuses on batch ingestion from databases and files. Part 2 covers streaming.

Feb 10, 2026
software-engineering

Data Engineering With GCP Chapter 3 Part 2: Data Modeling and BigQuery Features

In Part 1 we loaded CSV files into BigQuery and built a simple warehouse from a MySQL export. Now the book throws a second scenario at us: bike-sharing data. More tables, daily batch loading, and a real question that every data engineer has to face sooner or later. How do you actually model your data so that business people can use it without calling you every five minutes?

Feb 09, 2026
software-engineering

Data Engineering With GCP Chapter 3 Part 1: Your First BigQuery Data Warehouse

Chapter 3 is where things get real. Up to now the book was setting the stage, explaining what data engineering is, showing you around GCP. Now Adi Wijaya says: okay, let’s actually build something. And the something is a data warehouse in BigQuery.

Feb 08, 2026
software-engineering

Data Engineering With GCP Chapter 2: Getting Started With Google Cloud for Big Data

Chapter 2 is where Adi Wijaya starts showing what Google Cloud Platform actually has for data engineers. After the theory in Chapter 1, this one is about opening GCP for the first time and figuring out which services matter and which ones you can safely ignore for now.

Feb 07, 2026
software-engineering

Data Engineering With GCP Chapter 1: What Is Data Engineering Anyway?

Chapter 1 starts with a confession most of us in the data world can relate to. Adi Wijaya says he used to think data was clean. Neatly organized, ready to go. Then he actually worked with data in real organizations and realized most of the effort goes into collecting, cleaning, and transforming it. Not the fun machine learning part. The plumbing part.

Feb 06, 2026
software-engineering

Data Engineering With Google Cloud Platform: A Book Retelling Series

I just finished reading “Data Engineering with Google Cloud Platform” by Adi Wijaya (2nd edition, Packt Publishing, 2024) and I want to share what I learned. Not as a dry summary, but more like telling a friend what the book is about over coffee.

Feb 03, 2026
software-engineering

Data Engineering With AWS Chapter 5: Architecting Data Engineering Pipelines

This is post 8 in my Data Engineering with AWS retelling series.

You have learned about data engineering principles, data architectures, the AWS toolkit, and data governance. Now comes the part where it all comes together. Chapter 5 is about designing an actual data pipeline. Not writing code yet. Just thinking. Planning. Drawing on a whiteboard.

Jan 27, 2026
software-engineering

Data Engineering With AWS Chapter 4 Part 2: Data Governance in Practice

In Part 1, we covered the theory: what data security and governance mean, how catalogs prevent your lake from becoming a swamp, and the core AWS services for encryption and identity. Now it is time to put it into practice.

Jan 20, 2026
software-engineering

Data Engineering With AWS Chapter 4 Part 1: Data Cataloging and Security

You can have the fastest data pipeline on the planet. You can have the slickest dashboards, the fanciest machine learning models, the most optimized Parquet files. None of it matters if your data gets stolen, mishandled, or dumped into a lake that nobody can navigate.

Jan 13, 2026
software-engineering

Data Engineering With AWS Chapter 3 Part 2: The AWS Toolkit - Analytics and Processing

In Part 1 we covered how data gets into AWS. Now comes the good part: what do you actually do with it once it is there? This post covers the services for transforming raw data, orchestrating multi-step pipelines, and letting people query and visualize the results.

Jan 06, 2026
software-engineering

Data Engineering With AWS Chapter 3 Part 1: The AWS Toolkit - Storage and Databases

Chapter 3 is massive. It is basically a catalog of every AWS service a data engineer will touch, from getting data in to getting answers out. So I am splitting it into two posts. This first part covers how data gets into AWS – all the ingestion services, the streaming tools, and the physical devices AWS will literally ship to your door.

Dec 30, 2025
software-engineering

Data Engineering With AWS Chapter 2: Data Management Architectures for Analytics

Chapter 1 gave us the “who” and “why” of data engineering. Now it is time for the “where.” Where does all that data actually live? How do organizations store, organize, and serve billions of rows of information so that someone on the business side can pull up a dashboard and make a decision before lunch?

Dec 23, 2025
software-engineering

Data Engineering With AWS Chapter 1: What Even Is Data Engineering?

If someone told you twenty years ago that data would become more valuable than oil, you would have laughed. But here we are. The most valuable companies on the planet are not drilling for crude. They are collecting, processing, and squeezing insights out of massive piles of data. And behind every one of those companies, there is a team of data engineers making it all work.

Dec 16, 2025
software-engineering

Data Engineering With AWS: A Book Retelling Series for the Cloud-Curious

Every company today is drowning in data. Clicks, transactions, sensor readings, log files, social media posts. It just keeps coming. But raw data sitting in a pile is useless. The real magic happens when someone builds the pipes that move it, clean it, reshape it, and deliver it to the people who need it.

Data engineering

About

About BookGrill.net

Category

Tags View all tags