Data Engineering with Python: My Study Notes from Paul Crickard's Book

Feb 12, 2026
software-engineering

So I picked up Data Engineering with Python by Paul Crickard (Packt, 2020, ISBN: 978-1-83921-418-9) and decided to write up my study notes as I go through it. I’ve been working in IT for over 20 years, and data engineering keeps coming up everywhere. This book seemed like a good one to work through while sharing what I learn.

What This Book Covers

The book is about building data pipelines using Python. But it’s not just Python scripts. Crickard walks through the full stack of tools you’d actually use in a real data engineering setup:

Apache NiFi for moving data around
Apache Airflow for scheduling and orchestrating pipelines
Apache Kafka for streaming data
Apache Spark for processing big data
Databases like Elasticsearch, PostgreSQL, and MongoDB

It’s 15 chapters that take you from “what even is data engineering” all the way to building real-time data pipelines with edge computing.
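
To make “data pipeline” a bit more concrete before diving in, here is a minimal sketch of a batch extract, clean, and load step in plain Python. It is not taken from the book. The sample data, the column names, the people table, and the function names are all made up for illustration, and an in-memory SQLite database stands in for the databases listed above.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real pipeline would read a file or an API.
RAW_CSV = """name,age
Alice,34
bob,
 Carol ,29
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing age; trim and title-case the names."""
    cleaned = []
    for row in rows:
        age = (row.get("age") or "").strip()
        if not age:
            continue  # skip incomplete records
        cleaned.append((row["name"].strip().title(), int(age)))
    return cleaned


def load(records, conn):
    """Write the cleaned records into a (hypothetical) people table."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract -> transform -> load, then read the table back."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, conn))
    conn.close()
```

Tools like NiFi and Airflow take over the scheduling, retries, and monitoring around steps like these, but the extract, transform, load shape stays the same.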

Why I’m Writing This Up

I read a lot of books. And honestly, the best way to remember what you read is to explain it to someone else. So that’s what this series is. My notes, my takeaways, and sometimes my opinions on what works and what could be better.

I’m going to keep things simple. No jargon walls. If you’re new to data engineering or just curious about Python’s role in it, these notes should give you a solid overview without having to read all 300+ pages.

The Book Structure

The book breaks down into three sections:

Section 1 - Building Data Pipelines (Chapters 1-6): The basics. What data engineering is, setting up your tools, reading files, working with databases, cleaning data, and your first real pipeline project.

Section 2 - Running Pipelines in Production (Chapters 7-11): Making things production-ready. Version control, monitoring, deployment, and a full production pipeline project.

Section 3 - Beyond Batch (Chapters 12-15): Streaming and real-time processing. Kafka clusters, streaming data, Spark processing, and a final project combining everything.

What to Expect

One chapter per post. I’ll cover the main ideas, share what I found useful, and flag anything that felt outdated or could use more explanation. The book came out in 2020, so some things have changed in the ecosystem since then.

Let’s get started.

Next up: What is Data Engineering? (Ch 1)

#data-engineering-with-python #paul-crickard #book-retelling #data-engineering #python #data-pipelines
