Data Science Foundations Chapter 3: How to Actually Deliver a Data Science Project
Chapter 2 was about stakeholders. Now Chapter 3 asks a very practical question: how do you actually get a data science project done?
And I like this chapter because it does not pretend everything is smooth. Data science projects are messy. They go back and forth. They change direction. The authors admit this openly, and then give you a framework to deal with it.
The Data Analysis Lifecycle
Every data science project, big or small, follows the same basic steps. Whether you are predicting profitability for a small business in one afternoon or running medical trials that take years with a full team, the activities are the same.
Mariadas and Huke lay out a lifecycle with these steps:
Discover - Figure out what problem you are solving. What resources do you need? Is the project even possible?
Source - Find the data. What is available? Where does it live?
Prepare - Clean the data. Remove what you do not need. Fix errors. Get everything into the right format. And here is the thing: this step usually takes the most time. If you have worked with real data, you know this already.
Explore - Pick your tools and methods. Try different approaches. Maybe build some quick prototypes to see what works.
Create - Build the actual model.
Analyse - Look at what the model gives you. Does it answer the question? Is it reliable?
Communicate - Share your findings with the people who asked the question.
Operationalise - If the project needs to run on an ongoing basis, embed it into the organization’s regular operations. But many data science projects are one-off investigations, so this step does not always happen.
The important part is the loop. This is not a straight line from start to finish: you can go back to any step at any time. And when you answer one question, new questions pop up, so the flow runs from Communicate right back to Discover.
When Things Go Sideways
The authors use a phone fault prediction example to show how messy this gets in practice. You start sourcing data and discover it only exists for some phones. Back to the sponsor to confirm a smaller scope. You prepare the data and realize phone details live in a different dataset. Back to sourcing. Your chosen method needs phone makes stored as separate fields, not crammed into one column. Back to preparation. The model is weak. Back to exploring. You present results and someone asks a follow-up question. Back to analysis.
Five iterations in one simple project. I have worked on projects in IT where we went through this loop dozens of times.
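To make the back-and-forth concrete, here is a minimal Python sketch that writes the phone example down as a path through the lifecycle and counts how often it steps backward. The `Step` enum, the `count_returns` helper and the exact path are my own assumptions for illustration (for instance, I assume the weak model is noticed during Analyse); the book does not present it this way.

```python
from enum import IntEnum


class Step(IntEnum):
    """The eight lifecycle steps, numbered in their nominal order."""
    DISCOVER = 1
    SOURCE = 2
    PREPARE = 3
    EXPLORE = 4
    CREATE = 5
    ANALYSE = 6
    COMMUNICATE = 7
    OPERATIONALISE = 8


def count_returns(path):
    """Count the moves in `path` that go back to an earlier step."""
    return sum(1 for prev, cur in zip(path, path[1:]) if cur < prev)


# One possible reading of the phone fault example (the path is an assumption).
phone_project = [
    Step.DISCOVER,
    Step.SOURCE,       # data only exists for some phones
    Step.DISCOVER,     # back to the sponsor to confirm a smaller scope
    Step.SOURCE,
    Step.PREPARE,      # phone details live in a different dataset
    Step.SOURCE,       # back to sourcing
    Step.PREPARE,
    Step.EXPLORE,      # method needs phone makes as separate fields
    Step.PREPARE,      # back to preparation
    Step.EXPLORE,
    Step.CREATE,
    Step.ANALYSE,      # the model is weak
    Step.EXPLORE,      # back to exploring
    Step.CREATE,
    Step.ANALYSE,
    Step.COMMUNICATE,  # someone asks a follow-up question
    Step.ANALYSE,      # back to analysis
    Step.COMMUNICATE,
]

print(count_returns(phone_project))  # 5
```

Eighteen entries to get through seven steps, and that is for a small project. Logging the path like this is also a cheap way to show a sponsor where the time actually went.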
The book also mentions “gates” between steps. Before moving forward, you pause and check if the current step is done enough. For small projects, a quick chat. For big ones, a formal review. I have seen too many projects where people rush into modeling because it is the fun part. Then the model fails because the data was not ready.
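A gate can be as light as a checklist. The sketch below is one way to write that down in Python; the `review_gate` function and the check names are hypothetical and only illustrate the idea of pausing before the next step, not a procedure from the book.

```python
def review_gate(checks):
    """Return (passed, open_items) for a dict of check name -> done flag."""
    open_items = [name for name, done in checks.items() if not done]
    return (not open_items, open_items)


# Hypothetical checks before leaving Prepare; the names are illustrative only.
prepare_gate = {
    "required fields present": True,
    "known errors fixed": True,
    "format matches what the method expects": False,
}

passed, open_items = review_gate(prepare_gate)
print(passed)      # False
print(open_items)  # ['format matches what the method expects']
```

For a small project the "review" is reading that list out loud in a quick chat. For a big one, the same list becomes the agenda of the formal review.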
Managing the Whole Thing
Here is where project management comes in. The people paying for the project want to know three things: what will be delivered, when, and at what cost. And these three factors are connected. Change one, the others change too.
Data science adds an extra problem on top. Even when you deliver an answer, it might come with uncertainty. Or the answer might not be what the sponsor wanted to hear. That is just the nature of data work.
The chapter then walks through three ways to manage a project:
Waterfall - Plan everything upfront. Scope, timeline, budget all decided before you start. Changes go through formal change control. Works when the problem is well understood from day one.
Agile (Iterative) - Work in sprints of one or two weeks. Keep a backlog of work items. Each sprint picks from the backlog. Unfinished items go back. Flexible because the backlog can change as the project moves forward.
Hybrid - And this is what most real projects use. A high-level linear plan with detail managed through sprints. From my experience, hybrid is the most honest approach. You need a roadmap for stakeholders, but daily work needs flexibility because data projects are unpredictable.
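The sprint mechanics described under Agile (pick from the backlog, return what is unfinished) can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the function names, the work items, the day estimates and the 10-day capacity are all made up, and real teams usually do this in a tracking tool rather than in code.

```python
def plan_sprint(backlog, capacity_days):
    """Pick items in backlog order while they fit; return (sprint, remaining)."""
    sprint, remaining, used = [], [], 0
    for name, days in backlog:
        if used + days <= capacity_days:
            sprint.append((name, days))
            used += days
        else:
            remaining.append((name, days))
    return sprint, remaining


def close_sprint(sprint, finished, remaining):
    """Put unfinished sprint items back at the front of the backlog."""
    unfinished = [item for item in sprint if item[0] not in finished]
    return unfinished + remaining


# Hypothetical backlog of (work item, estimate in days).
backlog = [
    ("source phone fault logs", 3),
    ("join phone details", 4),
    ("prototype model", 5),
    ("draft findings", 2),
]

sprint, remaining = plan_sprint(backlog, capacity_days=10)
print([name for name, _ in sprint])
# ['source phone fault logs', 'join phone details', 'draft findings']

# Suppose the join turned out harder than expected and was not finished.
backlog = close_sprint(
    sprint,
    finished={"source phone fault logs", "draft findings"},
    remaining=remaining,
)
print([name for name, _ in backlog])
# ['join phone details', 'prototype model']
```

In a hybrid setup, the high-level plan decides roughly which lifecycle step the team should be in, and a backlog like this one handles the detail inside that step.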
Data Science Does Not Live Alone
One thing that is easy to forget: data science is often just one part of a bigger project. The authors give an example of building a CV-writing app. You need data science to understand what a good CV looks like. But you also need developers for the interface and a marketing team to launch it. Your project sits inside a bigger picture.
Practical Tips from the Chapter
The authors close with four simple tips:
- Plan your work. The data analysis lifecycle helps with this.
- Allow time to go back to previous steps. It will happen.
- Use planning to manage stakeholder expectations. If they understand the process is iterative, they will not panic when you revisit earlier steps.
- Think about how your project fits with other initiatives.
Simple advice. But I have seen enough projects fail because people ignored exactly these four points.
Good chapter. Short and honest about how data science projects really work.