Real-Time Streaming with Apache Kafka - Part 2

Architecture is great, but let’s actually run some code. In the second half of Chapter 7, Neylson Crepalde walks us through setting up a multi-node Kafka cluster right on our local machine using Docker Compose.

If you’ve ever tried to install Kafka manually, you know it can be a pain (shoutout to ZooKeeper configuration). Docker Compose reduces the whole setup to a single file and a single command.

Spinning up the Cluster

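The book provides a docker-compose.yaml that spins up three Kafka brokers and three ZooKeeper nodes. Everything still runs on one machine, but the multi-node layout approximates a production cluster closely enough to experiment with replication and failover.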

To get it running, you just need one command:

docker-compose up -d
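Before moving on, it helps to confirm the containers actually came up. Here is a minimal sketch using two small helper functions; the names cluster_status and cluster_logs are made up for illustration, and both assume you run them from the directory that holds docker-compose.yaml:

```shell
# Print the state of every service in the Compose project.
cluster_status() {
  docker-compose ps
}

# Show the last 20 log lines per service, handy if a broker exits early.
cluster_logs() {
  docker-compose logs --tail=20
}
```

Calling cluster_status should list the broker and ZooKeeper containers. If one of them keeps restarting, cluster_logs is usually the quickest way to see why.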

Once the containers are up, you can jump into one of the brokers to start managing your cluster:

docker exec -it multinode-kafka-1-1 bash

Creating Your First Topic

Inside the container, we use the Kafka CLI tools. First, we create a topic named “mytopic” with 3 partitions and a replication factor of 3 (for high availability):

kafka-topics --create \
    --bootstrap-server localhost:19092 \
    --replication-factor 3 \
    --partitions 3 \
    --topic mytopic

You can verify it by listing the topics:

kafka-topics --list --bootstrap-server localhost:19092
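Listing only proves the topic exists. To see how its partitions are spread across the brokers, you can also describe it. A small sketch, assuming the same localhost:19092 listener as above; describe_topic is a hypothetical helper name, not something from the book:

```shell
BOOTSTRAP_SERVER="localhost:19092"   # same listener as the commands above

# Show the leader, replicas and in-sync replicas (ISR) for each partition.
describe_topic() {
  kafka-topics --describe \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic "$1"
}
```

Running describe_topic mytopic inside the broker container should print one line per partition. With a replication factor of 3, each partition should list three replicas.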

Producing and Consuming

Now for the fun part. You can open two terminal windows to see Kafka in action:

  1. The Producer: Use kafka-console-producer to start typing messages into the topic.
  2. The Consumer: Use kafka-console-consumer to watch those messages appear in real-time.
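The post doesn't spell out the exact invocations, so here is a hedged sketch of what the two terminals might run. It assumes the same localhost:19092 listener and the mytopic topic from earlier, plus a reasonably recent Kafka (older releases expect --broker-list on the producer instead of --bootstrap-server). The function names are made up for illustration:

```shell
BOOTSTRAP_SERVER="localhost:19092"
TOPIC="mytopic"

# Terminal 1: every line typed on stdin becomes one message. Ctrl+C to stop.
start_producer() {
  kafka-console-producer \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic "$TOPIC"
}

# Terminal 2: print messages as they arrive. --from-beginning also replays
# anything already stored in the topic.
start_consumer() {
  kafka-console-consumer \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic "$TOPIC" \
    --from-beginning
}
```

Each terminal needs its own docker exec session into a broker container. Start the consumer first if you want to watch messages land the moment you type them.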

It’s like a distributed, persistent chat room for your data.

Why this is huge for your stack

This hands-on exercise shows how Kafka becomes the “nervous system” of your data platform. You can have hundreds of different apps (producers) dumping events into Kafka, and hundreds of other apps (consumers) like Spark or custom Python scripts reading that data at their own pace.
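The “at their own pace” part comes from consumer groups: Kafka tracks committed offsets per group, so one slow reader never holds back another. A rough sketch of how you could see that from the CLI, assuming the same localhost:19092 listener; the helper names and any group names you pass in (say, analytics or audit) are hypothetical:

```shell
BOOTSTRAP_SERVER="localhost:19092"

# Consume as a member of a named group. Kafka stores this group's offsets
# separately from every other group's.
consume_as_group() {
  kafka-console-consumer \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic "$1" \
    --group "$2"
}

# Show current offset, log end offset and lag per partition for a group.
group_lag() {
  kafka-consumer-groups \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --describe \
    --group "$1"
}
```

For example, run consume_as_group mytopic analytics in one terminal, stop it, produce a few more messages, then call group_lag analytics. The lag column should show how far that group has fallen behind.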

Now that we’ve mastered Spark, Airflow, and Kafka individually, it’s time for the ultimate challenge: deploying the entire stack together on Kubernetes.


Book Details:

  • Title: Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions
  • Author: Neylson Crepalde
  • ISBN: 978-1-83546-214-0
