Home » Books » Data Engineering with Python by Paul Crickard

Data Engineering with Python by Paul Crickard

Paul Crickard's hands-on guide to building data pipelines with Python, covering ETL, NiFi, Airflow, Kafka, Spark, and production deployment.

Data Engineering with Python walks you through the full data engineering stack using Python as the glue language. The book starts with fundamentals like reading files and working with databases, then progresses to building complete data pipelines with Apache NiFi and Apache Airflow. It covers data cleaning with pandas, monitoring with Elasticsearch and Kibana, and deployment strategies for production environments.

The second half shifts to streaming and big data. You set up a Kafka cluster, build streaming pipelines, process data with Apache Spark, and finish with a real-time edge computing project using MiNiFi. Three dedicated project chapters tie everything together with practical, end-to-end pipelines.

This book is for Python developers who want to understand data engineering from the ground up. It’s especially useful for beginners who want breadth across the tooling landscape rather than deep expertise in any single tool. The hands-on approach means you’re building real pipelines, not just reading about theory.

Published by Packt in 2020, some tooling details have aged (manual installs vs Docker, older Airflow patterns, ZooKeeper-based Kafka), but the core patterns of ETL, staging, validation, idempotency, and monitoring remain relevant and well-taught.

Feb 12, 2026
software-engineering

Data Engineering With Python: My Study Notes From Paul Crickard's Book

So I picked up Data Engineering with Python by Paul Crickard (Packt, 2020, ISBN: 978-1-83921-418-9) and decided to write up my study notes as I go through it. I’ve been working in IT for over 20 years, and data engineering keeps coming up everywhere. This book seemed like a good one to work through and share what I learn.

Feb 13, 2026
software-engineering

What Is Data Engineering? Study Notes From Data Engineering With Python Ch 1

Chapter 1 of Data Engineering with Python by Paul Crickard starts with the basics. What is data engineering? What do data engineers actually do? And how is it different from data science?

Feb 14, 2026
software-engineering

Building Your Data Engineering Setup - Study Notes From Data Engineering With Python Ch 2

Chapter 1 was all theory. Now it’s time to actually install stuff. Chapter 2 of Data Engineering with Python by Paul Crickard is a setup chapter. You install the tools, configure them, and make sure everything talks to each other.

Feb 15, 2026
software-engineering

Reading and Writing Files in Python - Study Notes From Data Engineering With Python Ch 3

Chapter 3 is where Crickard moves from setup to actual work. You installed all those tools in Chapter 2. Now you use them. The chapter covers one of the most fundamental tasks in data engineering: getting data out of text files and into something useful.

Feb 16, 2026
software-engineering

Working With Databases - Study Notes From Data Engineering With Python Ch 4

Most data pipelines start with a database. Most of them end with one too. Chapter 4 of Paul Crickard’s book is about connecting Python to databases and moving data between them. If the previous chapter was about flat files, this one is where things get real.

Feb 17, 2026
software-engineering

Cleaning and Transforming Data - Study Notes From Data Engineering With Python Ch 5

You can build the best pipeline in the world. You can read files, write to databases, schedule everything with Airflow. But if the data going through that pipeline is messy, none of it matters.

Feb 18, 2026
software-engineering

Building a 311 Data Pipeline - Study Notes From Data Engineering With Python Ch 6

The previous chapters taught you the individual tools. Python, NiFi, Airflow, databases, data cleaning. Chapter 6 of Data Engineering with Python by Paul Crickard puts them all together into one real project.

Feb 19, 2026
software-engineering

Production Pipeline Features - Study Notes From Data Engineering With Python Ch 7

You built a pipeline. It works on your machine. It runs on a schedule. Data goes in, data comes out. Ship it, right?

Feb 20, 2026
software-engineering

NiFi Registry Version Control - Study Notes From Data Engineering With Python Ch 8

You’ve been building data pipelines for several chapters now. They work. They move data. But here’s the problem: none of them have version control. If you break something, there’s no going back. Chapter 8 of Data Engineering with Python by Paul Crickard fixes that. It introduces the NiFi Registry, a sub-project of Apache NiFi that handles version control for your data pipelines.

Feb 21, 2026
software-engineering

Monitoring Data Pipelines - Study Notes From Data Engineering With Python Ch 9

You built a data pipeline. It is idempotent, uses atomic transactions, and has version control. It is production ready. But can you tell when it breaks?

Feb 22, 2026
software-engineering

Deploying Data Pipelines - Study Notes From Data Engineering With Python Ch 10

You built your data pipelines. They work on your laptop. Now what? Chapter 10 of Data Engineering with Python by Paul Crickard covers the part everyone eventually has to face: getting your pipelines out of development and into production.

Feb 23, 2026
software-engineering

Building a Production Data Pipeline - Study Notes From Data Engineering With Python Ch 11

You learned the individual tools. You learned the deployment strategies. Now Chapter 11 of Data Engineering with Python by Paul Crickard puts it all together. This is the chapter where you build a complete, production-grade data pipeline from start to finish.

Feb 24, 2026
software-engineering

Building a Kafka Cluster - Study Notes From Data Engineering With Python Ch 12

Up to this point in the book, everything has been batch processing. You query a database, get a full dataset, transform it, load it somewhere. The data sits still while you work on it.

Feb 25, 2026
software-engineering

Streaming Data With Apache Kafka - Study Notes From Data Engineering With Python Ch 13

Up to this point in the book, data pipelines have been about moving data that already exists. Query a database, read a file, process it, store it. The data sits still and you go get it.

Feb 26, 2026
software-engineering

Data Processing With Apache Spark - Study Notes From Data Engineering With Python Ch 14

You have streaming data. You have batch data. You have a lot of it. Now you need to actually process it. Fast. On more than one machine.

Feb 27, 2026
software-engineering

Real-Time Edge Data With MiNiFi and Spark - Study Notes From Data Engineering With Python Ch 15

You have NiFi running. Kafka is streaming. Spark is processing. But what about the data source? What happens when your data comes from a tiny sensor or a Raspberry Pi that can barely run a web browser?

Feb 28, 2026
software-engineering

Data Engineering With Python: Final Thoughts and Takeaways

That’s it. Fifteen chapters, seventeen posts, and one complete walkthrough of Paul Crickard’s Data Engineering with Python (Packt, 2020, ISBN: 978-1-83921-418-9).