Data Science Foundations Chapter 10 Part 2: Time Series, Classification, and Clustering Models
This is Part 2 of 2 for Chapter 10. In Part 1 we covered how to pick the right model and looked at regression. Now we get into the rest: time series, classification, clustering, and association analysis.
After working with data in IT for over two decades, I can tell you that these four areas cover maybe 80% of real business problems. So let us walk through them.
Time Series: Predicting What Comes Next
Time series analysis is about forecasting. You have data over time and you want to guess what comes next. Stock prices, website traffic, monthly sales.
Real-world data is messy. It has trends (things going up or down). It has seasonality (ice cream sells more in summer, every summer). And it has noise, random stuff you cannot explain.
The book focuses on ARIMA. Autoregressive Integrated Moving Average. Sounds scary. But break it into three pieces and it makes sense.
AR (Autoregressive) means using past values to predict future ones. If your sales went up by 30 last week, maybe they go up by something similar this week.
I (Integrated) means looking at changes between values instead of raw numbers. Instead of “we sold 630 phones,” you look at “+30 from last week.” This removes the trend and makes the data easier to work with.
MA (Moving Average) means adjusting for past mistakes. If the model was off by 10 phones last time, it corrects itself.
The book walks through a phone sales example. Seven weeks of data, sales going from 500 to 630. The model differences the data, adjusts for errors, and predicts Week 8 at 650 phones.
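The three pieces above can be sketched in a few lines of plain Python. This is not a real ARIMA fit, just an illustration of the differencing and error-correction ideas; the weekly figures, the last forecast error, and the correction weight are all invented, since the book only gives the endpoints (500 to 630 over seven weeks) and a Week 8 forecast of 650.

```python
# Hypothetical weekly phone sales; only the endpoints come from the book.
sales = [500, 520, 545, 560, 580, 600, 630]

# "I" (Integrated): difference the series to remove the trend.
diffs = [b - a for a, b in zip(sales, sales[1:])]

# "AR"-style step: assume next week's change resembles the average past change.
avg_change = sum(diffs) / len(diffs)
naive_forecast = sales[-1] + avg_change

# "MA"-style step: nudge the forecast by a fraction of the last error.
# Suppose last week's forecast was off by +10 phones (assumed, for illustration).
last_error = 10
theta = 0.5  # error-correction weight, chosen arbitrarily
forecast = naive_forecast + theta * last_error

print(round(naive_forecast, 1))  # forecast from average change alone
print(round(forecast, 1))        # forecast after error correction
```

A real ARIMA model estimates the AR and MA weights from the data rather than hard-coding them, but the mechanics are the same: difference, extrapolate, correct.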
One thing the authors flag: context matters. The COVID-19 pandemic broke many time series models because the underlying patterns changed completely. If you blindly feed data into ARIMA without thinking about real-world events, your predictions will be garbage.
Classification: Sorting Things Into Boxes
Classification is supervised learning. You give the model labeled data and it learns to sort new items into categories. Spam or not spam. Fraud or legitimate. Buy or not buy.
The book lists several algorithms: logistic regression, decision trees, random forests, support vector machines. Each has trade-offs. But the authors spend the most time on decision trees, and for good reason. They are the most intuitive model. You can draw one on a whiteboard and a non-technical person will understand it.
A decision tree splits data at each step to make groups as “pure” as possible. The math uses entropy (measuring uncertainty) and information gain (how much a split reduces that uncertainty).
The phone purchase example is good. Customer data includes budget, brand preference, screen size, camera quality. The tree first splits on budget because that separates buyers from non-buyers best. High budget? Almost always buy. Low budget? Almost never. Medium budget? Split again on brand preference. The result is a flowchart anyone can follow.
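Here is a small sketch of the entropy and information gain calculations a tree uses to pick splits like that. The labels (buy or no-buy by budget) are hypothetical, not the book's exact dataset; a perfectly clean split is used to make the numbers easy to check.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy reduction from splitting `parent` into `children` groups."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Hypothetical customers, before and after a split on budget.
parent = ["buy", "buy", "buy", "no", "no", "no"]
high   = ["buy", "buy", "buy"]   # high-budget group: all buy
low    = ["no", "no", "no"]      # low-budget group: none buy

print(entropy(parent))                        # 1.0 bit: maximum uncertainty
print(information_gain(parent, [high, low]))  # 1.0: a perfect split
```

Real splits are rarely this clean; the tree simply picks the feature whose split yields the highest gain at each step.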
But here is the thing. Decision trees learn from training data. If your data has bias, your tree has bias. Garbage in, garbage out.
Real-world uses: spam filters, fraud detection, product recommendations. Classification is everywhere.
Clustering: Finding Groups Nobody Told You About
Clustering is unsupervised learning. No labels. No right answers. You throw data at the algorithm and it finds groups of similar items.
The book covers K-means, hierarchical clustering, and DBSCAN. K-means gets the most attention.
K-means is straightforward. Pick a number K (how many groups). The algorithm randomly places K center points called centroids. It assigns every data point to the nearest centroid, then recalculates each centroid as the average position of the points in its group. Repeat until nothing moves.
The student example works well. Six students with height and weight data, K=2. After two rounds of assigning and recalculating, three shorter/lighter students land in one group, three taller/heavier in the other. Convergence.
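The loop above is short enough to write out by hand. This sketch runs it on six hypothetical (height, weight) pairs; the exact numbers and starting centroids are invented, since the book only says three shorter/lighter students end up in one group and three taller/heavier in the other.

```python
import math

def kmeans(points, k, centroids, rounds=10):
    """Assign points to the nearest centroid, recompute centroids, repeat."""
    groups = [[] for _ in range(k)]
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        new = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
        if new == centroids:  # convergence: nothing moved
            break
        centroids = new
    return groups, centroids

# Hypothetical (height cm, weight kg) for six students, with K=2.
students = [(155, 48), (158, 50), (160, 52), (175, 70), (178, 74), (180, 76)]
groups, centers = kmeans(students, k=2, centroids=[(155, 48), (180, 76)])
print(groups[0])  # shorter/lighter cluster
print(groups[1])  # taller/heavier cluster
```

Production K-means implementations add random restarts and handle empty clusters; this version leans on well-separated data and sensible starting centroids.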
But choosing K is tricky. The math might say five clusters is optimal. Can your business actually work with five customer segments? Maybe three makes more sense. Domain knowledge beats pure math.
Business uses: customer segmentation, document grouping, anomaly detection. Social media platforms use clustering to recommend friends and groups.
Association: What Gets Bought Together
Association rule mining is about finding patterns in transactions. The classic example is market basket analysis. People who buy bread often buy milk. People who buy chips often buy salsa.
The tool here is the Apriori algorithm. It uses three key measures:
Support is how often an item set appears. Battery packs in 4 out of 5 transactions? Support is 80%.
Confidence is probability. Someone buys a battery pack, what is the chance they also buy a case? In the book’s example, 75%.
Lift tells you whether the association is real. Above 1 means the items appear together more often than chance would predict, so they complement each other. Below 1 means less often than chance, suggesting they substitute for each other. Exactly 1 means no relationship.
The book’s phone shop example is practical. Five transactions with battery packs, cases, and screen protectors. You calculate support, confidence, and lift for each pair. Then you use those numbers to decide product placement in your store.
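All three measures reduce to counting baskets. The sketch below uses five invented transactions chosen to reproduce the book's headline numbers (battery-pack support of 80%, battery-to-case confidence of 75%); the exact basket contents are assumptions.

```python
# Hypothetical phone-shop baskets.
transactions = [
    {"battery", "case"},
    {"battery", "case", "protector"},
    {"battery", "protector"},
    {"battery", "case"},
    {"protector"},
]
n = len(transactions)

def support(*items):
    """Fraction of transactions containing all the given items."""
    return sum(set(items) <= t for t in transactions) / n

def confidence(a, b):
    """P(b in basket | a in basket)."""
    return support(a, b) / support(a)

def lift(a, b):
    """Confidence divided by b's baseline support; 1 means independence."""
    return confidence(a, b) / support(b)

print(support("battery"))                       # 0.8: 4 of 5 baskets
print(round(confidence("battery", "case"), 2))  # 0.75: 3 of 4 battery baskets
print(round(lift("battery", "case"), 2))        # 1.25: above 1, complements
```

The full Apriori algorithm also prunes the search space: if an itemset falls below the support threshold, none of its supersets can qualify, so they are skipped.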
One smart insight from the authors: if everyone already buys a case with a phone, that rule is obvious and not useful. Focus on patterns you did not know about. Domain expertise helps you filter out the noise and find genuinely useful associations.
Key Takeaways
- ARIMA handles time-based predictions by combining past values, trend removal, and error correction
- Always consider real-world context with time series; unusual events can break your model
- Decision trees are the most explainable classification method but watch for bias in training data
- K-means clustering is simple but choosing the right number of clusters requires business judgment
- Association rules find hidden buying patterns, but domain knowledge is essential to separate useful insights from obvious ones
Four model families. Four different problems they solve. The book does a good job of showing not just the math but the thinking behind when and why you pick each one.
Previous: Chapter 10 Part 1: Picking the Right Model
Next: Chapter 11: Visualisations