Data Science Foundations Chapter 1: What Is Data Science Really About?
You have probably seen Moneyball. Brad Pitt, baseball, a small team beating the rich guys using numbers. Good movie. But what the movie really shows is something bigger. It shows what happens when you take data seriously. And that is basically what data science is about.
Stephen Mariadas and Ian Huke open their book Data Science Foundations with this exact example. Smart choice. Because most people think data science is some abstract thing for nerds in hoodies. But it is everywhere. Your Netflix recommendations? Data science. The fact that certain shows even get made? Also data science. Spotify suggesting that random playlist you actually liked? Same thing.
So What Is Data Science Anyway?
Here’s the thing. Even the experts cannot agree on one definition. The authors admit this right away, which I respect. No pretending this is a clean, simple field.
But they land on something useful. Data science sits at the intersection of a few things: math and statistics, technology, and domain knowledge. You need all three. Having great math skills means nothing if you do not understand the business problem. And knowing the business perfectly does not help if you cannot work with the data.
The authors give their own definition: “using analytical methods to gain insight from diverse data.” They even admit this sounds boring. So they offer a better one: “the art of finding patterns in data.”
I like this version more. And they call it art for a reason. Data science is not just running formulas. It involves creative thinking about what questions to ask, how to prepare data, how to show results. That creative part is something textbooks often skip.
Three Types of Analytics
The book breaks down data analysis into three categories. This is a classic framework, and it is useful if you are new to the field.
Descriptive analytics is about what already happened. How many goals did a player score last season? How many users signed up this month? You are just describing reality with numbers.
Predictive analytics is about what might happen next. Based on past seasons, how many goals will this player score? Based on growth trends, how many users will sign up next quarter? You are making educated guesses.
Prescriptive analytics goes further. It tells you what to do about it. Pick this player for the team because the data says he will score more. Change your marketing strategy because the model says this approach works better.
The authors use football (soccer, for my American friends) to explain all three. Simple and clear. I have seen other books make this way more complicated than it needs to be.
But they also add an important point. Real problems do not always fit neatly into one category. The world is messy. Data science cuts across all three types, and sometimes your analysis lives somewhere in between.
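To make the three categories concrete, here is a minimal Python sketch of my own. It is not from the book, which stays away from specific tools. The player names and goal counts are made up, and the "model" is just a straight line fitted through past seasons, which is about the simplest prediction you could pick.

```python
from statistics import mean

# Hypothetical goals per season for two made-up players (not from the book).
goals = {
    "Player A": [8, 10, 12, 14],
    "Player B": [15, 13, 12, 10],
}

def describe(history):
    """Descriptive: summarize what already happened."""
    return {"total": sum(history), "average": mean(history)}

def predict_next(history):
    """Predictive: fit a straight line through past seasons, extend it one season."""
    n = len(history)
    xs = list(range(n))
    x_mean = mean(xs)
    y_mean = mean(history)
    # Ordinary least-squares slope and intercept for a single variable.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * n

def prescribe(all_goals):
    """Prescriptive: recommend the player with the highest predicted goals."""
    return max(all_goals, key=lambda name: predict_next(all_goals[name]))

for name, history in goals.items():
    print(name, describe(history), round(predict_next(history), 1))
print("Pick:", prescribe(goals))
```

Notice that Player B has the better past record, but the trend points the other way, so the recommendation goes to Player A. Whether a straight-line trend is a sensible assumption is exactly the kind of judgment call the chapter is talking about.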
The Iceberg Problem
One thing I really liked in this chapter is the iceberg comparison. Most people only see the tip of data science. They see the nice charts, the predictions, the recommendations. But underneath? There is a huge amount of work.
Data cleaning. Data preparation. Testing different models. Debugging. Starting over because your first approach was wrong. For every pretty visualization that reaches a stakeholder, dozens of steps happened behind the scenes. If you want to get into data science, this is something to understand early.
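Here is a tiny taste of the part under the waterline. This is my own sketch with made-up records, not an example from the book, and the cleaning rules (trim names, drop unusable values, drop duplicates) are just assumptions about what "clean" means for this toy data.

```python
# Hypothetical raw records with typical problems: stray spaces,
# inconsistent casing, duplicates, and values that are not numbers.
raw = [
    {"name": " Alice ", "goals": "12"},
    {"name": "bob", "goals": "7"},
    {"name": "Alice", "goals": "12"},
    {"name": "Carla", "goals": "n/a"},
]

def clean(records):
    """Normalize names, convert goals to integers, skip bad rows and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["name"].strip().title()
        try:
            goals = int(rec["goals"])
        except ValueError:
            continue  # drop rows whose goal count cannot be parsed
        if name in seen:
            continue  # keep only the first record per name
        seen.add(name)
        cleaned.append({"name": name, "goals": goals})
    return cleaned

print(clean(raw))
```

Four rows go in, two come out. And every rule here is a decision someone has to make and defend: dropping Carla instead of asking where her number went is a choice, not a fact.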
Ethics Show Up Early
The authors do something smart here. They bring up ethics in the very first chapter. They mention the tobacco industry. Back in the 1950s, the data already pointed to a strong link between smoking and cancer. But the tobacco companies used lawyers and PR to bury this. Data can be powerful, but it can also be misused.
This sets the tone for the whole book. Data science is not just about technique. It is about responsibility. Who sees the results? Who benefits? Who gets hurt? These are questions that matter.
Tools Are Not the Focus
Another thing I appreciate: the authors say upfront that they will not focus on specific software tools. No “use this Python library” or “install R.” Instead, they teach methods. You pick the tools.
This makes the book age better. Tools change every few years. Methods and thinking frameworks last much longer. Whether you end up using Python, Excel, or some specialized platform, the foundations stay the same.
My Take
For an introductory chapter, this one does its job well. It is short. It sets expectations. It does not overpromise.
The authors position data science as a multidisciplinary field, which it is. They are honest about the fact that definitions are fuzzy. And they make it clear that this book is about foundations, not about becoming an expert overnight.
If you are picking up this book, you are probably new to data science or want a structured overview. This chapter tells you exactly what you are getting into. And the practical tips at the end are simple but honest: you will always be learning, you need many different skills, and when someone talks about data science, make sure you both mean the same thing.
That last point matters more than you think.