Using Python and R Together - Tools for Bilingual Data Science
Chapter 6 is where the book finally delivers on its promise. All that talk about using both languages together? This is where it actually happens. Rick Scavetta walks through the nuts and bolts of making Python and R talk to each other in the same project.
The Quick and Dirty Way: Faux Operability
Before getting to the real tools, the chapter starts with what Scavetta calls “faux operability.” It’s basically cross-talk between languages using files as the middle layer.
The idea is simple. Do some work in R, export a CSV, call a Python script from R using system() to process that file.
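The pattern is easy to sketch. Here is a minimal Python-only mock-up of the file-based handoff, with both the "upstream" step and the called script in Python for brevity (in the book's telling, the caller is R using system()); all file and script names are hypothetical:

```python
# Faux operability sketch: two processes talk through a CSV on disk.
# All file and script names here are illustrative, not from the book.
import csv
import subprocess
import sys
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
infile = workdir / "measurements.csv"
outfile = workdir / "doubled.csv"

# Step 1: the "upstream" tool writes its results to a flat file.
with infile.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["value"])
    for v in [1, 2, 3]:
        writer.writerow([v])

# Step 2: a separate script processes the file, launched as a child
# process -- the moral equivalent of R's system() call.
script = workdir / "process.py"
script.write_text(
    "import csv, sys\n"
    "rows = list(csv.DictReader(open(sys.argv[1])))\n"
    "w = csv.writer(open(sys.argv[2], 'w', newline=''))\n"
    "w.writerow(['value'])\n"
    "for r in rows:\n"
    "    w.writerow([int(r['value']) * 2])\n"
)
subprocess.run([sys.executable, str(script), str(infile), str(outfile)], check=True)

# Step 3: read the results back. Every path here is a hardcoded moving part.
results = [int(r["value"]) for r in csv.DictReader(outfile.open())]
print(results)  # [2, 4, 6]
```

Note how much of the sketch is plumbing: paths, file formats, and a child-process call, each a potential point of failure.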
Here’s the thing. It works. Scavetta tells a story from his research lab days where he chained proprietary software, a Perl script, and an R script this way. One mouse click, half a second, done. Cell cultures were literally dying while waiting for results, so “elegant” was not the priority.
But here’s the problem. Or rather, three problems:
- You might not need two languages at all. Sometimes one language handles the whole pipeline.
- Too many moving parts. Multiple scripts, intermediate files, hardcoded paths. Each one a chance for breakage.
- Data format limitations. CSVs work for data frames, but complex objects like fitted models or nested lists don’t survive a round trip through flat files.
For simple projects, faux operability is fine. For anything serious, you want real interoperability.
Real Interoperability: reticulate and rpy2
The chapter lays out two main tools, and which one you use depends on your starting language:
- reticulate (R package): You’re working in R and want to call Python. Developed by RStudio, well integrated into the IDE.
- rpy2 (Python module): You’re working in Python and want to call R.
The core idea is the same. Instead of writing files back and forth, you share objects directly in memory. An R data frame becomes a pandas DataFrame. A NumPy array becomes an R matrix. Conversion happens automatically.
Here’s what access looks like with reticulate:
| What you want | How you get it |
|---|---|
| Python object in R | py$objectName |
| R object in Python | r.objectName |
| Python function in R | pd <- import("pandas"); pd$read_csv() |
And with rpy2, going the other direction:
| What you want | How you get it |
|---|---|
| R package in Python | r_cluster = importr('cluster') |
| R object in Python | foo_py = robjects.r['foo_r'] |
| R functions in Python | import rpy2.robjects.lib.ggplot2 as ggplot2 |
The chapter focuses mostly on reticulate since this section was written by Scavetta, an R user. But the symmetry is clear. Same concept, different direction.
Setting Up reticulate
A good chunk of the chapter covers setup. Install reticulate, point R at your Python build, create a virtual environment with virtualenv_create(), install packages with virtualenv_install(), then restart R and activate with use_virtualenv(). Windows users need conda instead of virtualenv.
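In R, those steps look roughly like this. A sketch only, mirroring the sequence above; the environment name "bilingual" and the package choice are illustrative:

```r
# Sketch of the reticulate setup steps described above.
# The environment name "bilingual" is just an illustration.
install.packages("reticulate")
library(reticulate)

virtualenv_create("bilingual")                   # create a virtual environment
virtualenv_install("bilingual", "scikit-learn")  # install Python packages into it

# Restart R, then activate the environment:
library(reticulate)
use_virtualenv("bilingual", required = TRUE)
```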
Scavetta is upfront that reticulate can be finicky. Version mismatches between RStudio, reticulate, and Python cause confusing errors. His advice: keep everything updated and restart R before activating environments. Not glamorous, but practical.
Passing Objects and Calling Functions
Once the setup is done, the chapter walks through several patterns for using Python from R:
Passing objects in R Markdown. R chunks and Python chunks live in the same document. Define a = 3.14 in Python, access it as py$a in R. Define b <- 42 in R, access it as r.b in Python.
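As a sketch, the document structure looks like this (a minimal R Markdown fragment built from the examples above):

````markdown
```{python}
a = 3.14          # defined in a Python chunk
```

```{r}
py$a              # read the Python object from an R chunk
b <- 42
```

```{python}
r.b               # read the R object back from a Python chunk
```
````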
Calling Python functions from R. Train a scikit-learn SVM in a Python chunk, then call py$clf$predict() from R. Python’s dot notation (iris.data) maps to R’s dollar notation (py$iris$data) automatically.
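The Python side of that pattern might look like the following sketch, which assumes scikit-learn is installed; the R-facing access paths are shown in the comments:

```python
# Python chunk sketch: a scikit-learn SVM that R can later call
# through reticulate as py$clf$predict().
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
clf = SVC(kernel="linear")
clf.fit(iris.data, iris.target)

# Python's iris.data is reachable from R as py$iris$data.
preds = clf.predict(iris.data[:3])
print(list(preds))
```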
Sourcing scripts. Use source_python("script.py") to run a whole Python file and pull its objects into R. But watch out: Python objects mask R objects with the same name.
Shiny integration. The chapter’s best example combines a scikit-learn model with Shiny sliders. Users adjust feature values, Python predicts the classification in real time. R handles the UI, Python handles the model. Each language doing what it does best.
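The shape of that app can be sketched in a few lines of R. This is an illustrative skeleton, not the book's code: it assumes a fitted scikit-learn model clf already exists on the Python side, and the slider range and the fixed values for the other three iris features are hypothetical:

```r
# Minimal Shiny + reticulate skeleton (illustrative only).
library(shiny)
library(reticulate)

ui <- fluidPage(
  sliderInput("petal", "Petal length", min = 1, max = 7, value = 4),
  textOutput("pred")
)

server <- function(input, output) {
  output$pred <- renderText({
    # Assumes a fitted scikit-learn model `clf` lives in Python.
    # The three fixed feature values below are placeholders.
    features <- matrix(c(5.1, 3.5, input$petal, 0.2), nrow = 1)
    as.character(py$clf$predict(features))
  })
}

shinyApp(ui, server)
```

R owns the reactive UI; every slider change triggers a fresh call into the Python model.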
My Take
This chapter is the payoff for the whole book. After five chapters of context and history, you finally get the practical tools.
The faux operability section is more useful than it sounds. I’ve seen plenty of production pipelines that work exactly this way. Scripts calling scripts through shell commands. Not pretty, but knowing when “good enough” is truly good enough is a valuable skill.
The reticulate progression from object passing to a Shiny app with a Python ML model is well structured. You can see the pieces clicking together.
What’s missing is more coverage of rpy2. If you’re Python-first and want to call R, you’ll need to look elsewhere for hands-on details. Also missing: Apache Arrow and Feather as shared data formats. The book came out in 2021, so Arrow was still gaining traction, but it would have fit naturally here.
Still, Chapter 6 delivers on the core promise. Stop arguing about which language is better. Start using both.
Previous: Chapter 5 - Workflow Context | Next: Chapter 7 - Bilingual Case Study