Scaling to the Cloud with Amazon EKS
Testing things locally with Kind is great, but big data usually needs big iron. In this part of the hands-on journey, Neylson Crepalde shows us how to scale up to a managed cloud environment.
While the book touches on Google Cloud (GKE) and Azure (AKS), the main focus is on Amazon EKS (Elastic Kubernetes Service). Here is how you get a professional-grade cluster running without spending days on configuration.
The Cloud Setup
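First, you need an AWS account and credentials configured locally (for example with aws configure). Once you have that, the secret weapon is eksctl, the official command-line tool for EKS. It's basically a “one-command wonder”: it generates the CloudFormation stacks that do all the heavy lifting of creating VPCs, subnets, and IAM roles for you.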
To spin up a cluster named “studycluster” in the us-east-1 (N. Virginia) region, you'd run something like:
eksctl create cluster \
  --name=studycluster \
  --region=us-east-1 \
  --instance-types=m6i.xlarge \
  --nodes-min=2 --nodes-max=4 \
  --managed
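It takes about 15 minutes, but when it's done, you have a managed Kubernetes control plane plus a managed node group of m6i.xlarge workers, and eksctl writes the cluster's credentials to your kubeconfig so kubectl can talk to it right away.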
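If you prefer to keep the cluster definition in version control next to your manifests, eksctl also accepts a config file. The sketch below is a declarative version of the command above, assuming eksctl's ClusterConfig schema (apiVersion eksctl.io/v1alpha5); the node group name studycluster-ng and the desired size of 2 are illustrative assumptions, not values from the book.

```yaml
# cluster.yaml: sketch of a declarative equivalent of the eksctl command above
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: studycluster
  region: us-east-1

managedNodeGroups:
  - name: studycluster-ng   # hypothetical node group name
    instanceType: m6i.xlarge
    minSize: 2
    maxSize: 4
    desiredCapacity: 2      # assumption: start at the minimum size
```

You would create the cluster with eksctl create cluster -f cluster.yaml. Note that minSize and maxSize only set the bounds of the node group's Auto Scaling group; scaling between them automatically requires a separately installed autoscaler such as the Cluster Autoscaler or Karpenter.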
Deploying Your First API
Remember that Joke API we containerized earlier? Now we can push it to Docker Hub and deploy it to our new EKS cluster.
We create a Deployment to run two replicas of the API and a Service of type LoadBalancer. Within minutes, AWS provisions an Elastic Load Balancer (ELB), and you get a public DNS name. Paste that into your browser, add /joke, and you’ve just shipped your first cloud-native service.
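A minimal sketch of what those two objects could look like, assuming the API image was pushed as <your-username>/jokeapi:v1 and listens on port 8000 inside the container; the names jokeapi and app: jokeapi and both port numbers are hypothetical, so adjust them to match your own image.

```yaml
# Deployment: keeps two replicas of the Joke API running
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jokeapi                      # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: jokeapi
  template:
    metadata:
      labels:
        app: jokeapi                 # must match the selector above
    spec:
      containers:
        - name: jokeapi
          image: <your-username>/jokeapi:v1   # hypothetical image name
          ports:
            - containerPort: 8000    # assumption: port the API listens on
---
# Service of type LoadBalancer: on EKS this asks AWS for a public load balancer
apiVersion: v1
kind: Service
metadata:
  name: jokeapi
spec:
  type: LoadBalancer
  selector:
    app: jokeapi                     # routes traffic to the Deployment's pods
  ports:
    - port: 80                       # public port on the load balancer
      targetPort: 8000               # forwarded to containerPort above
```

Exposing port 80 on the load balancer means the DNS name works in a browser without an explicit port. Once AWS has provisioned it, kubectl get service jokeapi shows the DNS name in the EXTERNAL-IP column.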
Running the Data Job
But we are here for the data, right? Deploying the batch processing job is even simpler because it doesn’t need a public URL. You just define a Kubernetes Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: dataprocessingjob
spec:
  template:
    spec:
      containers:
      - name: dataprocessingjob
        image: <your-username>/dataprocessingjob:v1
      restartPolicy: Never
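After you apply this manifest in a datajob namespace (create it first with kubectl create namespace datajob), you can watch the magic happen. The second command shows the name of the pod the Job created:
kubectl get jobs -n datajob
kubectl get pods -n datajob
kubectl logs <pod-name> -n datajob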
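The -n datajob flag assumes a datajob namespace already exists, and the Job above relies on Kubernetes defaults: a failed pod is retried up to six times (backoffLimit defaults to 6), and the finished Job stays in the cluster until you delete it. A sketch that declares the namespace in the same file and makes both behaviors explicit follows; the values 2 and 3600 are illustrative choices, not settings from the book.

```yaml
# Namespace declared alongside the Job so a single apply creates both
apiVersion: v1
kind: Namespace
metadata:
  name: datajob
---
apiVersion: batch/v1
kind: Job
metadata:
  name: dataprocessingjob
  namespace: datajob                # no need to pass -n when applying
spec:
  backoffLimit: 2                   # illustrative: retry a failed pod at most twice
  ttlSecondsAfterFinished: 3600     # illustrative: clean up the finished Job and its pods after an hour
  template:
    spec:
      containers:
        - name: dataprocessingjob
          image: <your-username>/dataprocessingjob:v1
      restartPolicy: Never
```

Keep ttlSecondsAfterFinished long enough to read the output: once the Job is deleted, its pods and their logs go with it. With this version you can also skip the pod lookup and run kubectl logs job/dataprocessingjob -n datajob, which picks one of the Job's pods for you.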
The Big Win
The coolest part? Whether you ran this on Kind or EKS, the Kubernetes manifests (the YAML files) stay exactly the same. That is the power of Kubernetes: it abstracts away the infrastructure so you can focus on the data logic.
Now that we have the plumbing sorted out, it’s time to talk about the actual data stack. In the next post, we’ll start exploring the Modern Data Stack architecture.
Book Details:
- Title: Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions
- Author: Neylson Crepalde
- ISBN: 978-1-83546-214-0