Cloud Computing for Big Data: An Introduction

Previous: Visualizing Big Data: Turning Numbers into Insight

We’ve spent this entire series talking about how to set up and run your own Hadoop cluster. But let’s be real: managing hardware is a pain. You have to buy servers, set up networking, worry about power outages, and pray that your hard drives don’t fail.

That’s why Chapter 11 of Sridhar Alla’s book is so important. It introduces Cloud Computing: the art of letting someone else (like Amazon or Google) deal with the hardware so you can focus on the data.

What is the Cloud, Really?

In simple terms, “The Cloud” is just someone else’s computer that you access over the internet. But for big data, it’s more than that. It’s a distinct IT environment designed to give you scalable, measured resources on demand.

The book breaks down a few key terms you need to know:

  • On-premise: This means the servers are sitting in your own office or data center.
  • Cloud Provider: The company that owns the hardware (e.g., AWS).
  • Cloud Consumer: That’s you, the person or organization using the resources.

Scaling: Horizontal vs. Vertical

The cloud makes scaling easy. There are two ways to do it:

  1. Vertical Scaling: Making a single server “bigger” (adding more RAM or a faster CPU). It’s easy, but you eventually hit a hardware ceiling: there’s only so big one machine can get.
  2. Horizontal Scaling: Adding more servers to your cluster. This is the Hadoop way, and the cloud makes it as simple as clicking a button.
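To make the difference concrete, here’s a toy model of the two strategies. All the numbers (instance tiers, per-node throughput) are made up for illustration; real capacity planning is messier.

```python
# Vertical scaling: you upgrade one server through a fixed menu of sizes.
# The menu eventually runs out -- that's the hard ceiling.
VERTICAL_TIERS = [100, 200, 400, 800]  # requests/sec per tier (hypothetical)

def vertical_capacity(upgrades):
    """Capacity after some number of upgrades; capped at the biggest size."""
    return VERTICAL_TIERS[min(upgrades, len(VERTICAL_TIERS) - 1)]

# Horizontal scaling: add identical commodity nodes; capacity grows linearly
# (ignoring coordination overhead, which Hadoop is designed to keep small).
def horizontal_capacity(node_count, per_node=100):
    """Total capacity of a cluster of identical nodes."""
    return node_count * per_node

# After four upgrades, vertical scaling is stuck at the largest machine;
# horizontal scaling just keeps adding nodes.
print(vertical_capacity(10))    # capped at 800
print(horizontal_capacity(10))  # 1000, and still growing
```

That linear growth is why Hadoop (and the cloud) bet on horizontal scaling: ten cheap machines beat one machine you can no longer upgrade.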

The Three Delivery Models

You’ve probably heard these acronyms before, but here’s what they actually mean for data:

  • IaaS (Infrastructure as a Service): You rent the raw “bricks”: virtual servers, storage, and networks. You’re responsible for the OS and everything on top of it.
  • PaaS (Platform as a Service): You get a pre-built environment to run your apps. You don’t worry about the OS; you just bring your code.
  • SaaS (Software as a Service): You just use the software over the web (like Gmail or Salesforce). You don’t manage anything technical.
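One way to keep the three models straight is to ask, layer by layer, “who manages this?” Here’s a small sketch of that responsibility split; the four-layer breakdown is a simplification of the fuller stack the book describes.

```python
# Who manages each layer under each delivery model (simplified).
RESPONSIBILITY = {
    "IaaS": {"hardware": "provider", "os": "you", "runtime": "you", "app": "you"},
    "PaaS": {"hardware": "provider", "os": "provider", "runtime": "provider", "app": "you"},
    "SaaS": {"hardware": "provider", "os": "provider", "runtime": "provider", "app": "provider"},
}

def you_manage(model):
    """List the layers the cloud consumer is still responsible for."""
    return [layer for layer, owner in RESPONSIBILITY[model].items() if owner == "you"]

print(you_manage("IaaS"))  # ['os', 'runtime', 'app']
print(you_manage("PaaS"))  # ['app']
print(you_manage("SaaS"))  # []
```

Reading the output top to bottom: as you move from IaaS to SaaS, your to-do list shrinks to nothing.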

Why Use the Cloud for Big Data?

The biggest benefit is elasticity. If you have a massive job that needs 1,000 servers for just two hours, you can spin them up, run the job, and shut them down. You only pay for what you use. Try doing that with on-premise hardware!
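The back-of-the-envelope math makes the point. The hourly rate and hardware price below are hypothetical round numbers, not real AWS pricing, but the shape of the comparison holds.

```python
# Elasticity math: rent 1,000 servers for 2 hours vs. buying them outright.
HOURLY_RATE = 0.10   # $/server-hour (hypothetical on-demand price)
SERVERS = 1_000
JOB_HOURS = 2

cloud_cost = SERVERS * JOB_HOURS * HOURLY_RATE
print(f"Cloud: ${cloud_cost:,.2f} for the whole job")  # Cloud: $200.00 for the whole job

# The on-premise alternative: buy 1,000 servers that sit idle afterward.
PRICE_PER_SERVER = 2_000  # hypothetical purchase price per machine
onprem_cost = SERVERS * PRICE_PER_SERVER
print(f"On-premise: ${onprem_cost:,} up front")  # On-premise: $2,000,000 up front
```

Even if the real numbers differ by an order of magnitude, a two-hour burst job is exactly the workload elasticity was built for.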

In the next post, we’ll compare the “Big Three” cloud providers and see which one is best for your big data projects.

Next: Comparing the Giants: AWS, Azure, and Google Cloud


About BookGrill.net

BookGrill.net is a technology book review site for developers, engineers, and anyone who builds things with code. We cover books on software engineering, AI and machine learning, cybersecurity, systems design, and the culture of technology.
