Data Engineering With Python: Final Thoughts and Takeaways
That’s it. Fifteen chapters, seventeen posts, and one complete walkthrough of Paul Crickard’s Data Engineering with Python (Packt, 2020, ISBN: 978-1-83921-418-9).
That’s it. Fifteen chapters, seventeen posts, and one complete walkthrough of Paul Crickard’s Data Engineering with Python (Packt, 2020, ISBN: 978-1-83921-418-9).
In Part 1 we covered the theory behind CI/CD and ran through a basic Cloud Build exercise with a Python project. Unit tests ran automatically on every push. Broken code got caught before it reached production. Good stuff, but that was a simple calculator script. Now we need to connect this to real data engineering work.
You learned the individual tools. You learned the deployment strategies. Now Chapter 11 of Data Engineering with Python by Paul Crickard puts it all together. This is the chapter where you build a complete, production-grade data pipeline from start to finish.
You built a data pipeline. It is idempotent, uses atomic transactions, and has version control. It is production ready. But can you tell when it breaks?
This is Part 1 of Chapter 7. Part 2 covers orchestration and transformations.
Chapter 7 of Data Engineering for Beginners is probably where things start feeling real. You stop talking about storage and tables and start talking about how data actually moves. And the answer is: through pipelines.
You built a pipeline. It works on your machine. It runs on a schedule. Data goes in, data comes out. Ship it, right?
The previous chapters taught you the individual tools. Python, NiFi, Airflow, databases, data cleaning. Chapter 6 of Data Engineering with Python by Paul Crickard puts them all together into one real project.
Chapter 1 of Data Engineering with Python by Paul Crickard starts with the basics. What is data engineering? What do data engineers actually do? And how is it different from data science?
So I picked up Data Engineering with Python by Paul Crickard (Packt, 2020, ISBN: 978-1-83921-418-9) and decided to write up my study notes as I go through it. I’ve been working in IT for over 20 years, and data engineering keeps coming up everywhere. This book seemed like a good one to work through and share what I learn.
Up until now in the book, we built BigQuery tables by hand, wrote queries in the console, and loaded data manually. That works for learning, but nobody does that in production. In production, you need things to run on their own, on schedule, without you babysitting them at 5 AM.
This is post 8 in my Data Engineering with AWS retelling series.
You have learned about data engineering principles, data architectures, the AWS toolkit, and data governance. Now comes the part where it all comes together. Chapter 5 is about designing an actual data pipeline. Not writing code yet. Just thinking. Planning. Drawing on a whiteboard.