Building an End-to-End Big Data Pipeline - Part 2
In our last post, we checked the infrastructure. Now, let’s build the actual pipeline. Neylson Crepalde uses the IMDB dataset to demonstrate a professional batch workflow.
In my last post, we got Spark running natively on Kubernetes. Now, it’s time to bring in the conductor (Airflow) and the nervous system (Kafka). This is where your cluster starts to feel like a real data platform.
In the last post, we got Airflow running. Now, let’s talk about how to actually use it. The heart of Airflow is the DAG—the Directed Acyclic Graph.
If Spark is the engine, then Apache Airflow is the conductor. In a modern data stack, you rarely have just one job running in isolation. You have ingestion, cleaning, processing, and delivery—and they all have to happen in a specific order.
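That ordering requirement is exactly what the "acyclic" in DAG buys you: if tasks never form a cycle, there is always a valid order to run them in. A minimal pure-Python sketch of the idea (no Airflow needed; the stage names mirror the ones above and the graph itself is made up):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical pipeline: each task maps to the tasks it depends on.
pipeline = {
    "cleaning":   {"ingestion"},              # cleaning runs after ingestion
    "processing": {"cleaning"},
    "delivery":   {"processing", "cleaning"},
}

# Because the graph has no cycles, a topological sort yields a run order
# that respects every dependency. This is what a scheduler computes.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # e.g. ['ingestion', 'cleaning', 'processing', 'delivery']
```

Airflow does the same thing at scale: you declare the edges, and the scheduler figures out what can run and when.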
We’ve talked about the architecture, but what about the actual tools? To build a modern data lakehouse on Kubernetes, you need a specific set of tools that can handle scale, automation, and speed.
That’s it. Fifteen chapters, seventeen posts, and one complete walkthrough of Paul Crickard’s Data Engineering with Python (Packt, 2020, ISBN: 978-1-83921-418-9).
You built your data pipelines. They work on your laptop. Now what? Chapter 10 of Data Engineering with Python by Paul Crickard covers the part everyone eventually has to face: getting your pipelines out of development and into production.
This is Part 2 of Chapter 7, continuing from batch and streaming basics.
In Part 1, we covered how batch and streaming pipelines move data around. But building a pipeline is only half the job. Making sure all its parts run in the right order, at the right time, without you babysitting them? That is orchestration. And this is where Chapter 7 gets really practical.
The previous chapters taught you the individual tools. Python, NiFi, Airflow, databases, data cleaning. Chapter 6 of Data Engineering with Python by Paul Crickard puts them all together into one real project.
Chapter 1 was all theory. Now it’s time to actually install stuff. Chapter 2 of Data Engineering with Python by Paul Crickard is a setup chapter. You install the tools, configure them, and make sure everything talks to each other.
In the first part, we got Cloud Composer running, wrote our first DAGs, and learned operators. This second part covers the stuff that separates beginner Airflow code from production-ready pipelines: variables, idempotent tasks, backfilling, sensors, and dataset-driven scheduling.
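Idempotency is the one on that list worth internalizing first, because backfilling depends on it: re-running a task for a past date must produce the same result, not duplicates. A common pattern is to have each run own one date partition and overwrite it. A minimal sketch (the `load_partition` helper and paths are hypothetical; in Airflow the logical date would arrive via the `ds` template variable rather than being passed by hand):

```python
import tempfile
from datetime import date
from pathlib import Path

def load_partition(run_date: date, rows: list[str], base: Path) -> Path:
    """Idempotent load: each run writes (and overwrites) its own date
    partition, so re-running a backfill never duplicates data."""
    target = base / f"ds={run_date.isoformat()}" / "data.csv"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(rows))  # overwrite, never append
    return target

base = Path(tempfile.mkdtemp())
first = load_partition(date(2024, 1, 1), ["a,1", "b,2"], base)
second = load_partition(date(2024, 1, 1), ["a,1", "b,2"], base)  # re-run: same file, same content
```

An append-based load would fail this test: running the backfill twice would double the rows. Partition-and-overwrite makes reruns safe by construction.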
Up until now in the book, we built BigQuery tables by hand, wrote queries in the console, and loaded data manually. That works for learning, but nobody does that in production. In production, you need things to run on their own, on schedule, without you babysitting them at 5 AM.