Scaling to the Cloud with Amazon EKS

Testing things locally with Kind is great, but big data usually needs big iron. In this part of the hands-on journey, Neylson Crepalde shows us how to scale up to a managed cloud environment.

While the book touches on Google Cloud (GKE) and Azure (AKS), the main focus is on Amazon EKS (Elastic Kubernetes Service). Here is how you get a professional-grade cluster running without spending days on configuration.

The Cloud Setup

First, you need an AWS account. Once you have that, the secret weapon is a tool called eksctl. It’s basically a “one-command wonder” that handles all the heavy lifting of VPCs, subnets, and IAM roles for you.

To spin up a cluster named “studycluster” in Virginia, you’d run something like:

eksctl create cluster \
    --name=studycluster \
    --region=us-east-1 \
    --instance-types=m6i.xlarge \
    --nodes-min=2 --nodes-max=4 \
    --managed

It takes about 15 minutes. When it’s done, you have a managed Kubernetes cluster: AWS runs the control plane, and a managed node group scales between two and four m6i.xlarge worker nodes.
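
If you prefer a file you can commit to Git over a long command line, eksctl also accepts a config file. The sketch below writes one that mirrors the flags used in the command. The file name cluster.yaml, the node group name ng-studycluster, and the starting size of two nodes are assumptions for illustration, not something the book prescribes.

```shell
#!/bin/sh
# Sketch: an eksctl ClusterConfig roughly equivalent to the CLI flags.
# File name, node group name, and desiredCapacity are illustrative assumptions.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: studycluster
  region: us-east-1

managedNodeGroups:
  - name: ng-studycluster      # hypothetical node group name
    instanceType: m6i.xlarge
    minSize: 2
    maxSize: 4
    desiredCapacity: 2         # assumed: start at the minimum size
EOF

echo "Wrote cluster.yaml"

# Next steps, run by hand once eksctl and AWS credentials are set up:
#   eksctl create cluster -f cluster.yaml
#   kubectl get nodes                        # worker nodes should report Ready
#   eksctl delete cluster -f cluster.yaml    # tear down when finished to stop billing
```

The commented commands at the end are left for you to run manually, since they create and delete billable AWS resources.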

Deploying Your First API

Remember that Joke API we containerized earlier? Now we can push it to Docker Hub and deploy it to our new EKS cluster.

We create a Deployment to run two replicas of the API and a Service of type LoadBalancer. Within minutes, AWS provisions an Elastic Load Balancer (ELB), and you get a public DNS name. Paste that into your browser, add /joke, and you’ve just shipped your first cloud-native service.
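
To make that concrete, here is a minimal sketch of what those two objects might look like. The image name jokeapi, the app label, the file name jokeapi.yaml, and the container port 8000 are all assumptions; swap in whatever your own image actually uses.

```shell
#!/bin/sh
# Sketch: generate a Deployment (2 replicas) plus a LoadBalancer Service.
# IMAGE and PORT are hypothetical defaults; override them to match your build.
IMAGE="${IMAGE:-<your-username>/jokeapi:v1}"   # assumed Docker Hub image name
PORT="${PORT:-8000}"                           # assumed port the API listens on

cat > jokeapi.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jokeapi
spec:
  replicas: 2
  selector:
    matchLabels:
      app: jokeapi
  template:
    metadata:
      labels:
        app: jokeapi
    spec:
      containers:
      - name: jokeapi
        image: ${IMAGE}
        ports:
        - containerPort: ${PORT}
---
apiVersion: v1
kind: Service
metadata:
  name: jokeapi
spec:
  type: LoadBalancer        # on EKS this triggers an AWS load balancer
  selector:
    app: jokeapi            # must match the pod labels above
  ports:
  - port: 80
    targetPort: ${PORT}
EOF

echo "Wrote jokeapi.yaml"

# Then, against the cluster:
#   kubectl apply -f jokeapi.yaml
#   kubectl get service jokeapi    # EXTERNAL-IP shows the load balancer DNS name
```

The Service listens on port 80 and forwards to the container port, so the public DNS name works in a browser without a port suffix.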

Running the Data Job

But we are here for the data, right? Deploying the batch processing job is even simpler because it doesn’t need a public URL. You just define a Kubernetes Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: dataprocessingjob
spec:
  template:
    spec:
      containers:
      - name: dataprocessingjob
        image: <your-username>/dataprocessingjob:v1
      restartPolicy: Never

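After you apply this to the datajob namespace (the manifest doesn’t set one itself, so pass -n datajob to kubectl apply), you can watch the magic happen: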

kubectl get jobs -n datajob
kubectl logs <pod-name> -n datajob
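
If you’d rather script the whole run than copy pod names around, here is a minimal sketch. It assumes the Job manifest is saved as job.yaml (a hypothetical file name), that the datajob namespace already exists, and that ten minutes is a generous enough timeout for your job.

```shell
#!/bin/sh
# Sketch: apply the Job, block until it completes, then print its logs.
# Override KUBECTL (e.g. KUBECTL="echo kubectl") to preview the commands
# without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="datajob"
JOB="dataprocessingjob"

run_job() {
  manifest="$1"   # path to the Job manifest, e.g. job.yaml (assumed name)
  # $KUBECTL is deliberately unquoted so a two-word override still works.
  $KUBECTL apply -f "$manifest" -n "$NAMESPACE" &&
  # Waits for the Complete condition; a failed Job will hit the timeout instead.
  $KUBECTL wait --for=condition=complete "job/$JOB" -n "$NAMESPACE" --timeout=600s &&
  # Addressing the Job directly avoids looking up the generated pod name.
  $KUBECTL logs "job/$JOB" -n "$NAMESPACE"
}

# Usage (uncomment to run against your cluster):
# run_job job.yaml
```

Because each step is chained with &&, the logs are only fetched once the job has actually finished.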

The Big Win

The coolest part? Whether you ran this on Kind or EKS, the Kubernetes manifests (the YAML files) remained exactly the same. That is the power of Kubernetes—it abstracts away the infrastructure so you can focus on the data logic.

Now that we have the plumbing sorted out, it’s time to talk about the actual data stack. In the next post, we’ll start exploring the Modern Data Stack architecture.

Book Details:

  • Title: Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions
  • Author: Neylson Crepalde
  • ISBN: 978-1-83546-214-0

About BookGrill.net

BookGrill.net is a technology book review site for developers, engineers, and anyone who builds things with code. We cover books on software engineering, AI and machine learning, cybersecurity, systems design, and the culture of technology.
