Data Science Foundations Chapter 6: Understanding Data Properties and Types

Chapter 6 of Data Science Foundations by Stephen Mariadas and Ian Huke is about something that sounds boring but really is not. Properties of data. What kind of data are you working with? And why does it matter so much?

Here’s the thing. Most people think data means numbers. Spreadsheets full of percentages. KPIs. Targets. But the authors make a strong point early on. Words and feelings can be just as valuable as numbers. Sometimes more.

Numbers Are Not Everything

The chapter opens with Sears. The American retail giant. For years their financial numbers looked solid. Revenue was up. Market presence was strong. All the KPIs said things were fine.

But customers were unhappy. Bad service, old stores, products not available. The numbers in the boardroom told one story. The words from real customers told a completely different one.

In October 2018 Sears filed for bankruptcy. Their spreadsheets could not save them, because they ignored what customers were actually saying.

I have seen this pattern many times in IT. Management loves dashboards with green numbers. But sometimes the most important signal is a frustrated user typing an angry support ticket.

Two Big Categories: Quantitative and Qualitative

The book breaks data into two main types.

Quantitative data is about amounts. Numbers. How many, how much, what percentage. This is what most people are trained to work with. We love turning everything into numbers because it makes things feel precise and clean.

Qualitative data is about descriptions and feelings. Which cake tastes better? How do you feel about your internet provider? The answers come as words, not digits. And that makes them harder to work with. But not less useful.

From my own experience, when you get qualitative feedback, the first instinct is to turn it into numbers. “75% of users said good.” But you lose something in that conversion. The raw words often tell you more than the count.

NOIR: Four Types You Should Know

This was the part of the chapter that felt most like a textbook. But it is worth knowing. The authors introduce four data types called NOIR:

  • Nominal - categories with no order. Like colors or country names. Red is not “more” than blue.
  • Ordinal - categories with order but no exact distance between them. Like first place, second place, third place. You know the ranking but not how close the race was.
  • Interval - numbers with equal spacing but no true zero. Temperature in Celsius is a classic example. The difference between 10 and 20 degrees is the same as between 30 and 40. But zero does not mean “no temperature.”
  • Ratio - numbers with equal spacing and a true zero. Weight, height, money. Zero really means none of the thing being measured, so 10 kg is twice as heavy as 5 kg.

Nominal and ordinal are qualitative. Interval and ratio are quantitative. Knowing which type you have changes what math you can do with it. You cannot average zip codes even though they look like numbers.
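Here is a small Python sketch of the rule that the type decides the math. It assumes the common textbook convention: a mode works for any type, a median needs at least ordinal data, a mean needs at least interval data, and "twice as much" style ratios need ratio data. The names `Level`, `REQUIRED_LEVEL` and `summarize` are made up for illustration. They are not from the book.

```python
from enum import IntEnum
from statistics import mean, median_low, mode


class Level(IntEnum):
    # Ordered so each level supports everything the levels below it support
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Minimum level each summary needs (a common textbook convention)
REQUIRED_LEVEL = {
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "mean": Level.INTERVAL,
    "ratio": Level.RATIO,  # "twice as much" style comparisons
}


def summarize(values, level, stat):
    """Compute stat for values, refusing stats the data's level cannot support."""
    if level < REQUIRED_LEVEL[stat]:
        raise ValueError(f"{stat} is not meaningful for {level.name.lower()} data")
    if stat == "mode":
        return mode(values)
    if stat == "median":
        # median_low returns an actual observed value, which suits ordinal codes
        return median_low(values)
    if stat == "mean":
        return mean(values)
    # Ratio of largest to smallest value; only safe when zero means "none"
    return max(values) / min(values)


zip_codes = [10001, 90210, 10001]
print(summarize(zip_codes, Level.NOMINAL, "mode"))  # 10001
try:
    summarize(zip_codes, Level.NOMINAL, "mean")
except ValueError as err:
    print(err)  # mean is not meaningful for nominal data
```

The same check stops you from claiming 20 °C is twice as warm as 10 °C. Celsius is only interval, so `summarize([10, 20], Level.INTERVAL, "ratio")` raises an error.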

Bias and Skew: When Data Lies to You

This section hit close to home. The authors quote Mark Twain: “There are three kinds of lies: lies, damned lies, and statistics.” And then they give some marketing examples that everyone has seen.

“Nine out of ten doctors recommend Tylenol.” Sounds impressive. But how many doctors did they ask? Where? When? What were the other options? The details behind a number matter more than the number itself.
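One way to answer "how many doctors did they ask?" is to put a confidence interval around the headline. The sketch below uses the Wilson score interval, a standard method for proportions. That choice is mine, not the book's, and both survey sizes are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Same "9 out of 10" headline, two hypothetical survey sizes
for yes, asked in [(9, 10), (900, 1000)]:
    low, high = wilson_interval(yes, asked)
    print(f"{yes}/{asked}: plausible range {low:.2f} to {high:.2f}")
```

With 10 doctors the plausible range runs from roughly 0.60 to 0.98. With 1,000 it narrows to roughly 0.88 to 0.92. Same slogan, very different evidence.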

The chapter explains a few important terms.

Population is everyone who could be included. All your customers. All people in a country.

Sample is the smaller group you actually study. Maybe 100 customers out of 100,000.

Sampling is how you pick that group. And this is where bias sneaks in.

If you ask young men whether they want more football on TV, most will say yes. That does not mean everyone wants more football. You just asked the wrong crowd.

The book covers three sampling approaches. Random sampling gives everyone an equal chance. Stratified sampling makes sure different groups are all represented. Convenience sampling means you just ask whoever is easy to reach, like your friends. Guess which one gives the worst results.
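The three approaches can be sketched in a few lines of Python on a made-up population. The group sizes and "wants more football" rates are pure assumptions chosen to echo the football example. The helper names are hypothetical, and this is not code from the book.

```python
import random

POPULATION_SIZE = 1000
YOUNG_MEN = 200  # hypothetical: the first 200 records are young men


def make_population(seed=0):
    """Hypothetical population. Assumed rates: 80% of young men and 30% of
    everyone else want more football on TV (made-up numbers)."""
    rng = random.Random(seed)
    people = []
    for i in range(POPULATION_SIZE):
        group = "young_men" if i < YOUNG_MEN else "others"
        rate = 0.8 if group == "young_men" else 0.3
        people.append({"group": group, "wants_football": rng.random() < rate})
    return people


def random_sample(people, k, rng):
    # Every person has the same chance of being picked
    return rng.sample(people, k)


def stratified_sample(people, k, rng):
    # Split by group, then sample each group in proportion to its size
    groups = {}
    for person in people:
        groups.setdefault(person["group"], []).append(person)
    sample = []
    for members in groups.values():
        quota = round(k * len(members) / len(people))  # rounding may drift from k
        sample.extend(rng.sample(members, quota))
    return sample


def convenience_sample(people, k):
    # Whoever is easiest to reach: here, simply the first k records
    return people[:k]


def share_yes(sample):
    return sum(p["wants_football"] for p in sample) / len(sample)


people = make_population()
rng = random.Random(42)
print("whole population:", round(share_yes(people), 2))
print("random:          ", round(share_yes(random_sample(people, 100, rng)), 2))
print("stratified:      ", round(share_yes(stratified_sample(people, 100, rng)), 2))
print("convenience:     ", round(share_yes(convenience_sample(people, 100)), 2))
```

Under these assumed rates, about 40% of the whole population wants more football. The random and stratified estimates should land close to that. The convenience sample of the first 100 records contains only young men, so it should land near 80%.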

Skew means data leans to one side. If a company has mostly young employees, the age distribution is skewed. Bias is when your method of collecting data produces results that do not represent reality.
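"Mostly young employees" means ages bunch at the low end, with a long tail of older staff. That is right (positive) skew, and it pulls the mean above the median. The sketch below uses the standard moment-based skewness formula, which is my choice rather than the book's, on a made-up list of ages.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Population (Fisher-Pearson) skewness: the average cubed z-score.
    Positive means a long tail to the right (toward larger values)."""
    m = mean(values)
    s = pstdev(values)
    return sum(((x - m) / s) ** 3 for x in values) / len(values)


# Hypothetical ages at a company with mostly young staff
ages = [23, 24, 24, 25, 26, 26, 27, 28, 29, 31, 45, 58]
print("mean:", mean(ages), "median:", median(ages))  # mean: 30.5 median: 26.5
print("skewness:", round(skewness(ages), 2))  # positive
```

Skew shows up in the numbers themselves. Bias usually does not: no formula on the data alone reveals that you asked the wrong crowd.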

Both can lead you to wrong conclusions. And both are more common than people think.

Doing Research Right

The last section gives practical advice. For quantitative work: collect existing data, run basic statistics, design surveys with clear questions and big enough samples. For qualitative work: do interviews, run focus groups, observe how things actually work instead of just asking.
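"Big enough samples" can be made concrete with the usual formula for estimating a proportion: n = z^2 * p * (1 - p) / e^2, where e is the margin of error. The book does not give this formula, so treat the sketch as my assumption. It assumes a random sample, and a bigger sample does nothing to fix a biased one.

```python
import math


def sample_size(margin, z=1.96, p=0.5, population=None):
    """Respondents needed to estimate a proportion to within +/- margin.

    z=1.96 targets roughly 95% confidence. p=0.5 is the most cautious guess
    because it maximises p * (1 - p). If population is given, apply the
    finite population correction. Assumes a random sample.
    """
    n0 = z**2 * p * (1 - p) / margin**2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.05))                      # 385
print(sample_size(0.05, population=100_000))  # 383
```

Run the formula backwards and the earlier example of 100 customers out of 100,000 gives a margin of roughly plus or minus 10 percentage points.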

The best results come from mixing both approaches. Use interviews to explain patterns you found in numbers. Let qualitative findings guide your survey questions.

My Take

After 20 years in IT I can tell you most bad decisions come from looking at the wrong type of data. Or looking at the right data in the wrong way.

The NOIR classification seems academic. But once you get it, you notice when people average things that should not be averaged. It happens all the time.

And the part about bias is worth reading twice. Your sample decides your answer. If the sample is bad, no fancy analysis will save you.


This post is part of a chapter-by-chapter retelling of “Data Science Foundations: Navigating Digital Insight” by Stephen Mariadas and Ian Huke. My thoughts and interpretations, not a copy of the book.
