Python vs R Workflows - Machine Learning, Visualization, and More

Chapter 5 is where Boyan Angelov gets practical about the question everyone dances around: which language should you actually use for which job?

Here's the thing. A data scientist who builds ML models and one who builds dashboards might share a job title but live in completely different worlds. One is still wondering what D3.js even is; the other has never deployed an API. Both feel like impostors sometimes. That's not a personal failing. It's just data science being absurdly wide as a field.

What Even Is a Workflow

Angelov defines a workflow as the complete set of tools and frameworks needed to get a specific job done. Not just one package. The whole chain from data to output.

He lays out a table mapping workflows to packages. Data munging: pandas vs dplyr. EDA: matplotlib/seaborn vs ggplot2. Machine learning: scikit-learn vs tidymodels/caret/mlr. Deep learning: Keras/TensorFlow/PyTorch in both. Data engineering: Flask/FastAPI vs plumber. Reporting: Jupyter vs R Markdown/Shiny.

A good workflow has three traits. Wide adoption. Open source backing. And enough modularity to plug into different tech stacks.

Visualization: R Wins This Round

For exploratory data analysis, the book gives R a clear edge. The reason has a name: ggplot2.

ggplot2 is built on the grammar of graphics, a framework by Leland Wilkinson that treats plots as stacks of layers. Axes and grids form one layer. Points and lines form another. You compose them together with +. The result is that a few lines of R code can produce publication-quality plots with titles, themes, trend lines, and color coding.

Python has matplotlib and seaborn, and both are capable. But the book is honest: they lag behind ggplot2 in ease and elegance. Attempts to port ggplot2 to Python (like the ggplot package) haven’t really caught on.
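To make the comparison concrete, here is a minimal matplotlib sketch of the kind of plot ggplot2 composes in a couple of layered lines: a scatter with a fitted trend line, a title, and a legend. The dataset and styling are illustrative, not from the book; in ggplot2 the trend layer would be a single `geom_smooth()` call, while here the fit is done by hand with `np.polyfit`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Illustrative data: a noisy linear relationship
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + rng.normal(0, 2, 50)

fig, ax = plt.subplots()
ax.scatter(x, y, label="observations")

# Fit and overlay a least-squares trend line
slope, intercept = np.polyfit(x, y, 1)
xs = np.linspace(x.min(), x.max(), 100)
ax.plot(xs, slope * xs + intercept, color="red", label="trend")

ax.set_title("Noisy linear data with fitted trend")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("trend.png")
```

Perfectly doable, but notice how much of it is plumbing; that gap in ceremony is the book's point.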

For interactive visualizations, R has Leaflet for maps and plotly for charts. There’s even a trick where you pass a ggplot2 plot to ggplotly() and it becomes interactive automatically. That’s just nice.

Machine Learning: Python Wins This Round

Here’s where Python takes the crown. And you can trace it back to one package: scikit-learn.

What makes scikit-learn so good? Consistency. Every algorithm follows the same pattern: import, create, fit(), predict(). Random forest, decision tree, linear regression. Same API shape every time. The documentation is also excellent, with real tutorials, not just reference pages.

The chapter shows a full ML workflow in a few lines. Split data with train_test_split. Fit a RandomForestClassifier. Predict. Check performance with the metrics module. Reads almost like pseudocode.
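A minimal sketch of that workflow, using the built-in iris dataset as a stand-in for the chapter's data. The loop also illustrates the consistency point above: a decision tree drops into the exact same import / create / fit() / predict() shape as the random forest.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative data; the chapter uses its own dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The same create / fit / predict shape works for any estimator
results = {}
for model in (RandomForestClassifier(random_state=0),
              DecisionTreeClassifier(random_state=0)):
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    results[type(model).__name__] = accuracy_score(y_test, preds)

print(results)
```

Swapping algorithms means changing one import and one constructor call; everything downstream stays the same.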

R has mlr and tidymodels covering the same ground. But here's the problem: they're not as widely used, and when it's time to hand your model to a data engineering team, showing up with an mlr model is going to get you some looks.

For deep learning specifically, both languages have access to TensorFlow and Keras. But PyTorch is native to Python, while R’s torch package wraps the C++ backend. The book recommends sticking with Python for deep learning unless you have an existing R codebase.

Data Engineering and Deployment: Python Again

Most ML projects fail not because the model is bad, but because there’s no infrastructure to put it into production. Python dominates here.

The chapter walks through building a prediction API with Flask. Train a model, serialize it with pickle, load it in a Flask app, expose an endpoint, query it with Postman. Done. Working ML API.
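A condensed sketch of those steps, with a toy model standing in for the chapter's real one. The training data, route name, and JSON shape are illustrative assumptions; the pattern (train, pickle, load, expose an endpoint) is the book's.

```python
import pickle
from flask import Flask, request, jsonify
from sklearn.linear_model import LogisticRegression

# 1. Train a toy model (illustrative data: small x -> 0, large x -> 1)
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# 2. Serialize it with pickle
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# 3. Load it in a Flask app and expose a prediction endpoint
app = Flask(__name__)
with open("model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[2.5]]}
    features = request.get_json()["features"]
    preds = loaded_model.predict(features).tolist()
    return jsonify({"prediction": preds})
```

Run it with `app.run()` and you can query the endpoint from Postman, curl, or anything that speaks HTTP, which is the whole point.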

Python’s glue language nature is the key. It talks to web frameworks, databases, cloud services, container systems. R has plumber, but it’s not close in ecosystem support. The book also mentions FastAPI and BentoML as newer tools built specifically for ML deployment.

Reporting: R Takes It Back

For communicating results, R pulls ahead again. R Markdown lets you mix code, text, and output in one document. Write prose, drop in a code chunk, and results render inline. Knit it to PDF or HTML and you have a polished report. Python has Jupyter Notebooks, which work fine, but the chapter argues R Markdown in RStudio is a better experience for literate programming.

For interactive reporting, R has Shiny. The chapter builds a complete Shiny app where users filter Star Wars character data by height and weight, and a ggplot2 visualization updates in real time. No code touching required by the end user.

Python has Streamlit as a newer alternative, but the book notes it hadn’t yet gained the maturity of Shiny at the time of writing.

The Scorecard

The chapter’s conclusion is clean. R is king for EDA and reporting. Packages like ggplot2 are unmatched, and Shiny enables genuinely interactive data presentations. Python is king for machine learning and data engineering. scikit-learn’s consistent API and Python’s ability to glue together different software components make it the practical choice for production work.

Neither language wins everything. That’s the whole point of this book. Know both, use both, pick the right one for the task in front of you.

Previous: Chapter 4 - Data Format Context | Next: Chapter 6 - Using Both Languages Together
