Real-Time Streaming with Apache Kafka - Part 2
Architecture is great, but let’s actually run some code. In the second half of Chapter 7, Neylson Crepalde walks us through setting up a multi-node Kafka cluster right on our local machine using Docker Compose.
If you’ve ever tried to install Kafka manually, you know it can be a pain (shoutout to ZooKeeper configuration). Docker Compose makes it a breeze.
Spinning up the Cluster
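The book provides a docker-compose.yaml that spins up three Kafka brokers and three ZooKeeper nodes. It all runs on one machine, but the multi-broker layout mirrors a production cluster closely enough to experiment with partitions and replication.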
To get it running, you just need one command:
docker-compose up -d
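Before jumping in, it can be worth checking that every container actually started. Here is a minimal sketch, assuming the compose file is named docker-compose.yaml and sits in the current directory. The COMPOSE_FILE and PS_CMD variable names are only illustrative, and the guard makes the script a no-op when Docker Compose or the file is missing:

```shell
# Assumed location of the compose file from the book's repository.
COMPOSE_FILE="docker-compose.yaml"

# Assemble the status command as a string so it can be reviewed first.
PS_CMD="docker-compose -f $COMPOSE_FILE ps"
echo "$PS_CMD"

# Only run it when docker-compose is installed and the file is present.
if command -v docker-compose >/dev/null 2>&1 && [ -f "$COMPOSE_FILE" ]; then
  $PS_CMD
fi
```

On newer Docker installs the same subcommand is also available as docker compose ps (with a space).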
Once the containers are up, you can jump into one of the brokers to start managing your cluster:
docker exec -it multinode-kafka-1-1 bash
Creating Your First Topic
Inside the container, we use the Kafka CLI tools. First, we create a topic named “mytopic” with 3 partitions and a replication factor of 3 (for high availability):
kafka-topics --create \
  --bootstrap-server localhost:19092 \
  --replication-factor 3 \
  --partitions 3 \
  --topic mytopic
You can verify it by listing the topics:
kafka-topics --list --bootstrap-server localhost:19092
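Listing only proves the topic exists. To see whether the partition count and replication factor took effect, the same tool can describe the topic. This is a small sketch, assuming the broker address and topic name used above; the variable names are illustrative, and the guard skips execution anywhere the Kafka CLI is not on the PATH:

```shell
# Values carried over from the commands above.
BOOTSTRAP="localhost:19092"
TOPIC="mytopic"

# Assemble the describe command as a string so it can be reviewed first.
DESCRIBE_CMD="kafka-topics --describe --bootstrap-server $BOOTSTRAP --topic $TOPIC"
echo "$DESCRIBE_CMD"

# Only execute where the Kafka CLI is installed (inside the broker container).
if command -v kafka-topics >/dev/null 2>&1; then
  $DESCRIBE_CMD
fi
```

If the topic was created as shown, the output should list three partitions, each with a leader, a set of replicas, and the in-sync replicas (Isr).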
Producing and Consuming
Now for the fun part. You can open two terminal windows to see Kafka in action:
- The Producer: Use kafka-console-producer to start typing messages into the topic.
- The Consumer: Use kafka-console-consumer to watch those messages appear in real time.
It’s like a distributed, persistent chat room for your data.
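If you would rather not juggle two terminals, the same round trip can be run non-interactively. This is a rough sketch, assuming the broker address and topic from earlier and a Kafka version whose console producer accepts --bootstrap-server; the two sample messages and the variable names are made up for illustration:

```shell
# Values carried over from the commands above.
BOOTSTRAP="localhost:19092"
TOPIC="mytopic"

# Assemble both commands as strings so they can be reviewed first.
PRODUCE_CMD="kafka-console-producer --bootstrap-server $BOOTSTRAP --topic $TOPIC"
CONSUME_CMD="kafka-console-consumer --bootstrap-server $BOOTSTRAP --topic $TOPIC --from-beginning --max-messages 2"
echo "$PRODUCE_CMD"
echo "$CONSUME_CMD"

# Only execute where the Kafka CLI is installed (inside the broker container).
if command -v kafka-console-producer >/dev/null 2>&1; then
  # Send two messages, one per line, instead of typing them interactively.
  printf 'hello\nkafka\n' | $PRODUCE_CMD
  # Read from the start of the topic and exit after two messages.
  $CONSUME_CMD
fi
```

Because the topic has three partitions, messages may not come back in the order they were sent; Kafka only guarantees ordering within a single partition.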
Why this is huge for your stack
This hands-on exercise shows how Kafka becomes the “nervous system” of your data platform. You can have hundreds of different apps (producers) dumping events into Kafka, and hundreds of other apps (consumers) like Spark or custom Python scripts reading that data at their own pace.
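The "at their own pace" part comes from consumer groups: each group keeps its own committed offsets for the topic. This is a minimal sketch of how you might observe that on the local cluster, assuming the broker address and topic from earlier. The group names analytics-app and audit-app and the helper function names are hypothetical:

```shell
# Values carried over from the commands above.
BOOTSTRAP="localhost:19092"
TOPIC="mytopic"

# Hypothetical consumer group names, one per downstream application.
CONSUMER_GROUPS="analytics-app audit-app"

# Print the command that reads the topic as a member of the given group.
consume_cmd() {
  echo "kafka-console-consumer --bootstrap-server $BOOTSTRAP --topic $TOPIC --group $1 --from-beginning --timeout-ms 5000"
}

# Print the command that reports the group's committed offsets and lag.
describe_cmd() {
  echo "kafka-consumer-groups --bootstrap-server $BOOTSTRAP --describe --group $1"
}

for group in $CONSUMER_GROUPS; do
  consume_cmd "$group"
  describe_cmd "$group"
  # Only execute where the Kafka CLI is installed (inside the broker container).
  if command -v kafka-console-consumer >/dev/null 2>&1; then
    # The consumer may exit non-zero once it times out, so ignore its status.
    $(consume_cmd "$group") || true
    $(describe_cmd "$group")
  fi
done
```

The describe output reports CURRENT-OFFSET, LOG-END-OFFSET, and LAG per partition for each group, which is where you would see one application falling behind without affecting the other.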
Now that we’ve mastered Spark, Airflow, and Kafka individually, it’s time for the ultimate challenge: deploying the entire stack together on Kubernetes.
Book Details:
- Title: Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions
- Author: Neylson Crepalde
- ISBN: 978-1-83546-214-0