Real-Time Visualization with Elasticsearch and Kibana
Trino is great for querying your historical data on S3, but real-time streams and text-heavy search call for a different tool. In the second half of Chapter 9, Neylson Crepalde introduces a widely used stack for real-time analytics: Elasticsearch and Kibana.
Why Elasticsearch?
While a SQL engine like Trino is well suited to structured joins, Elasticsearch is a search engine built on top of Apache Lucene. It’s designed to index massive amounts of semi-structured data (like JSON logs) and make it searchable in near real time.
In a big data pipeline, we often stream data from Kafka into Elasticsearch, typically through a connector or a small consumer service. This lets us see what’s happening in our system almost as it happens.
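To make the indexing step concrete, here is a minimal sketch of how a batch of JSON events could be serialized into the newline-delimited body that Elasticsearch’s `_bulk` API accepts. It uses only the Python standard library and does not contact a cluster. The index name `web-logs`, the event fields, and the helper `build_bulk_body` are hypothetical and are not taken from the book.

```python
import json
from typing import Iterable


def build_bulk_body(index_name: str, events: Iterable[dict]) -> str:
    """Serialize events into the newline-delimited JSON body used by the _bulk API."""
    lines = []
    for event in events:
        # Action line: asks Elasticsearch to index the following line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(event))
    # Every line, including the last one, must be terminated by a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical events, shaped like records that might arrive from a Kafka topic.
sample_events = [
    {"@timestamp": "2024-01-01T12:00:00Z", "path": "/home", "status": 200},
    {"@timestamp": "2024-01-01T12:00:01Z", "path": "/login", "status": 401},
]

bulk_body = build_bulk_body("web-logs", sample_events)
print(bulk_body)
```

In a real pipeline this body would be sent to the cluster’s `_bulk` endpoint, or a client library or connector would do the batching for you.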
Deploying the Stack on Kubernetes
Running Elasticsearch on Kubernetes can be complex because it’s a stateful application: each node needs persistent storage that survives pod restarts. The book recommends using Helm charts to manage the deployment. You need two components:
- Elasticsearch: The engine that indexes and stores your data.
- Kibana: The “window” into your data. It’s a powerful UI for building dashboards.
One of the best things about running this on Kubernetes is that you can scale your Elasticsearch nodes independently. If you need more indexing power, you can add more data nodes, and Elasticsearch rebalances shards across them.
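Adding data nodes only helps while there are enough shards to spread across them. As a rough back-of-the-envelope sketch (the helper name and the shard counts below are illustrative assumptions, not figures from the book), the arithmetic looks like this:

```python
import math


def max_shards_per_node(primary_shards: int, replicas: int, data_nodes: int) -> int:
    """Upper bound on shard copies per data node if shards are spread evenly."""
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    # Each primary shard has `replicas` additional copies.
    total_shard_copies = primary_shards * (1 + replicas)
    return math.ceil(total_shard_copies / data_nodes)


# Hypothetical index: 6 primary shards with 1 replica each = 12 shard copies.
for nodes in (3, 6, 24):
    print(nodes, "data nodes ->", max_shards_per_node(6, 1, nodes), "shard copies per node")
```

With 12 shard copies, going from 3 to 6 data nodes halves the load per node, while going to 24 nodes would leave some nodes holding no shards of this index at all.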
Visualizing with Kibana
Once your data is indexed, Kibana allows you to build incredible visualizations without writing a single line of code. You can create:
- Time-series charts to monitor traffic spikes.
- Pie charts to see the distribution of user agents.
- Maps to visualize where your requests are coming from.
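Kibana builds these charts for you, but it can help to see roughly what kind of request sits behind them. The sketch below assembles an Elasticsearch aggregation body with a `date_histogram` (the shape behind a time-series chart) and a `terms` aggregation (the shape behind a pie chart). The field names `@timestamp` and `user_agent.keyword`, the aggregation names, and the helper `build_dashboard_query` are assumptions for illustration; your index mapping may differ, and the map visualization is left out.

```python
import json


def build_dashboard_query(
    timestamp_field: str = "@timestamp",
    agent_field: str = "user_agent.keyword",
    interval: str = "1m",
    top_n: int = 5,
) -> dict:
    """Build a search body that returns only aggregations, not individual hits."""
    return {
        "size": 0,  # skip the matching documents; only buckets are needed
        "aggs": {
            # Requests bucketed by time: the data behind a traffic chart.
            "requests_over_time": {
                "date_histogram": {
                    "field": timestamp_field,
                    "fixed_interval": interval,
                }
            },
            # Most frequent user agents: the data behind a pie chart.
            "top_user_agents": {
                "terms": {"field": agent_field, "size": top_n}
            },
        },
    }


query = build_dashboard_query()
print(json.dumps(query, indent=2))
```

Sending this body to an index’s `_search` endpoint would return one bucket per minute and one bucket per user agent, which is the kind of data a dashboard panel renders.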
The combination of Elasticsearch and Kibana provides that “wow” factor for business stakeholders. It turns raw, messy event streams into beautiful, actionable dashboards.
The Unified Consumption Layer
By the end of Chapter 9, you have a complete consumption layer:
- Trino for deep, historical SQL analysis on your data lake.
- Elasticsearch + Kibana for fast, real-time search and visualization.
You’ve built the plumbing, the engine, and the dashboard. In the next chapter, we’re going to pull everything we’ve learned together to build a complete, end-to-end big data pipeline.
Book Details:
- Title: Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions
- Author: Neylson Crepalde
- ISBN: 978-1-83546-214-0