Python for R Users - Versions, Virtual Environments, and Pandas

Chapter 2 showed Pythonistas how to pick up R. Chapter 3 flips the script. Now it’s the R user’s turn to step into Python territory. Rick Scavetta writes this one, and he does a good job easing R folks into a world that feels messier at first glance.

Here’s the thing: Python has more choices to make before you even write your first line of code. For someone used to “just install R from r-project.org,” that can feel overwhelming.

Which Python Do You Even Install?

In R, there’s basically one distribution. You go to r-project.org, download it, done. Python? You’ve got at least four options.

Your computer probably already has a system Python installed. Don’t touch it. The OS needs it for its own stuff.

Then there’s vanilla Python from python.org. At the time the book was written, version 3.8 or 3.9 was the move. There’s also Anaconda (and its smaller sibling Miniconda), which bundles Python with a pile of data science packages. And finally, you can skip local installs entirely and use Google Colab Notebooks in your browser.

Scavetta recommends vanilla Python for learning. Anaconda is popular, but the extra bells and whistles can distract you when you’re just getting started. Fair enough.

Virtual Environments: The Thing R Users Never Had

This is one of the biggest differences between R and Python workflows. Most R users have a single global installation of packages. You update a package, and suddenly an old script breaks because a function got deprecated or defaults changed. Sound familiar?

Pythonistas figured this out early. They use virtual environments. A virtual environment is a folder inside your project directory, by convention a hidden one called .venv, that holds the exact package versions your project needs. Each project gets its own isolated setup. Far fewer “it works on my machine” problems.

Creating one is simple:

python3 -m venv .venv

Then activate it (source .venv/bin/activate on macOS and Linux, .venv\Scripts\activate on Windows), install packages with pip, and you’re set. The key trick is pip freeze > requirements.txt, which saves every installed package and its version to a file. Someone else can then recreate the same set of package versions with pip install -r requirements.txt.
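To make the pip freeze idea less magical, here is a minimal sketch that produces similar name==version lines using only the standard library. It is not how pip itself is implemented, and the helper names in_virtual_environment and pinned_requirements are made up for this illustration.

```python
import sys
from importlib.metadata import distributions


def in_virtual_environment():
    """Return True when the interpreter runs from a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def pinned_requirements():
    """Return sorted 'name==version' strings for installed distributions."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


print("Inside a virtual environment:", in_virtual_environment())
for line in pinned_requirements():
    print(line)
```

Run it once with the environment activated and once without, and the list changes. That difference is the whole point of isolating each project.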

R eventually got renv to do something similar. But Python has had this baked into its culture for much longer, and the venv module ships with the standard library.

VS Code, Notebooks, and Where to Write Code

R users have RStudio. Everyone uses it. Python never had that one dominant IDE. Text editors like VS Code became the preferred tool. Scavetta recommends VS Code and walks through the setup.

If local setup gives you trouble, there are Jupyter Notebooks. They’re JSON-based documents that mix code, Markdown, and output. Think R Markdown, but interactive by default. Google Colab gives you free Jupyter Notebooks in the browser with zero setup.

The book uses #%% markers in VS Code scripts to create executable chunks, which is similar to R Markdown chunks but simpler.
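A script with cell markers might look like the sketch below. The data and variable names are invented for illustration. As far as I know, VS Code's Python extension treats both #%% and # %% as cell boundaries, and each cell can be run on its own in the interactive window.

```python
# %% Load some data (made-up values)
distances = [584, 1054, 653]

# %% Summarize it
total = sum(distances)
mean_distance = total / len(distances)

# %% Report the result
print(f"Total: {total}, mean: {mean_distance:.1f}")
```

Because the markers are ordinary comments, the file still runs top to bottom as a normal Python script.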

Python the Language: Methods, Attributes, and That Dot Notation

So here’s what happened when I first moved from R to Python. The dot notation tripped me up. In R, you call summary(data). In Python, you call data.describe(). The function belongs to the object. That’s OOP.

Scavetta introduces this distinction through importing packages. Python has keywords like import, as, and from:

import pandas as pd
import numpy as np
from scipy import stats

After those imports, a call like pd.read_csv() means “go to the pandas package and use its read_csv function.” Same idea as readr::read_csv() in R, just different syntax.
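The three import styles can be tried without installing anything. This small sketch uses standard-library modules as stand-ins for pandas and scipy; the numbers are made up.

```python
import math                    # plain import: use as math.<name>
import statistics as st        # import with an alias, like "import pandas as pd"
from math import sqrt          # pull a single name into your script

values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = st.mean(values)   # module.function, like readr::read_csv() in R
spread = st.pstdev(values)     # population standard deviation
root = sqrt(16)                # no prefix needed after "from ... import"

print(mean_value, spread, root, math.pi)
```

The alias form is by far the most common in data science code, because it keeps the package prefix visible but short.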

Once you load a dataset into a pandas DataFrame, you examine it with methods like .info(), .describe(), and .head(). And there are attributes too, like .shape and .columns, which don’t need parentheses because they’re properties, not function calls.
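Here is a minimal sketch of the methods-versus-attributes distinction. It assumes pandas is installed and builds a tiny DataFrame by hand instead of reading a file; the values are invented for illustration.

```python
import pandas as pd

# A small hand-made DataFrame (made-up values)
plants = pd.DataFrame({
    "weight": [4.17, 5.58, 4.81, 6.31],
    "group": ["ctrl", "ctrl", "trt1", "trt2"],
})

# Methods do work, so they need parentheses
first_rows = plants.head(2)    # first two rows
summary = plants.describe()    # summary statistics for numeric columns
plants.info()                  # prints column types and non-null counts

# Attributes just report stored facts, so no parentheses
n_rows, n_cols = plants.shape
column_names = list(plants.columns)

print(first_rows)
print(summary)
print(n_rows, n_cols, column_names)
```

Writing plants.shape() with parentheses raises a TypeError, which is a quick way to find out you have hit an attribute.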

Data Structures: Lists, Dicts, Arrays, DataFrames

Python’s basic structures take adjustment for R users. Lists use [], dicts use {}, tuples use (). A pandas DataFrame is the Python equivalent of an R tibble. Each column is a Series, like how R data frame columns are vectors.
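The three bracket styles are easy to try side by side. A short sketch with made-up values:

```python
# List: ordered and mutable, may mix types (closer to an R list than a vector)
cities = ["Munich", "Paris", "Zurich"]
cities.append("Vienna")

# Dict: key-value pairs, looked up by key
distances = {"Munich": 584, "Paris": 1054}
distances["Zurich"] = 653

# Tuple: ordered but immutable
point = (48.1, 11.6)
try:
    point[0] = 0.0
except TypeError as err:
    tuple_error = str(err)     # tuples refuse item assignment

print(cities)
print(distances)
print(tuple_error)
```

The immutability of tuples is why they show up for fixed things like coordinates or a DataFrame's shape.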

But here’s the problem with plain Python lists: they don’t support vectorization. Multiply a list by 2 and it just repeats the list. NumPy arrays fix this. They behave like R vectors:

import numpy as np

dist = [584, 1054, 653]
dist * 2  # repeats the list: [584, 1054, 653, 584, 1054, 653]

dist_array = np.array(dist)
dist_array * 2  # element-wise: array([1168, 2108, 1306])

That’s one of those moments where R users go “why isn’t this the default?” Welcome to Python.

Indexing Starts at Zero

This trips up every R user. Python indexes from 0. The first element is [0], not [1]. And when you slice with [start:end], the start is inclusive but the end is exclusive. So [:2] gives you elements at index 0 and 1.
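A quick sketch of those rules on a plain list, with made-up values:

```python
letters = ["a", "b", "c", "d", "e"]

first = letters[0]        # "a": counting starts at 0
first_two = letters[:2]   # ["a", "b"]: index 2 is excluded
middle = letters[1:4]     # ["b", "c", "d"]
last = letters[-1]        # "e": negative indices count from the end

print(first, first_two, middle, last)
```

Watch out for the last line. In R, a negative index drops elements; in Python, it counts backwards from the end.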

For DataFrames, pandas gives you .iloc for integer-position indexing and .loc for label-based indexing. The .loc indexer also accepts logical (boolean) expressions for filtering rows. It’s more explicit than R’s [row, column] syntax, but it does the same job.
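The two indexers are easiest to compare on the same small table. This sketch assumes pandas is installed; the DataFrame and its row labels p1 to p4 are invented for illustration.

```python
import pandas as pd

plants = pd.DataFrame(
    {
        "weight": [4.17, 5.58, 4.81, 6.31],
        "group": ["ctrl", "ctrl", "trt1", "trt2"],
    },
    index=["p1", "p2", "p3", "p4"],   # made-up row labels
)

# .iloc: integer positions, end of the slice is exclusive
by_position = plants.iloc[0:2, 0]

# .loc: labels, and label slices include both ends
by_label = plants.loc["p1":"p2", "weight"]

# .loc with a boolean expression, like plants[plants$weight > 5, ] in R
heavy = plants.loc[plants["weight"] > 5, ["group", "weight"]]

print(by_position)
print(by_label)
print(heavy)
```

Note the asymmetry: the .iloc slice stops before position 2, while the .loc slice includes the "p2" label.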

Plotting and Linear Models

Scavetta shows seaborn for quick plots and statsmodels for linear models. The modeling workflow splits into two steps: specify the model, then fit it.

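from statsmodels.formula.api import ols

# plant_growth is assumed to be a DataFrame with weight and group columns
model = ols("weight ~ group", data=plant_growth)
results = model.fit()
results.summary()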

R users will recognize that formula syntax. The ~ works the same way. You can even run ANOVA and Tukey’s HSD, just from different packages.

The Takeaway

Python has more setup friction than R. More choices, more configuration. But virtual environments are a better practice than what most R users do. The OOP style takes getting used to, but it’s consistent.

The chapter ends with an honest note: R converged on common workflows around 2016. Python still has more diversity. That’s not a weakness. It’s just a reflection of Python being used in more contexts.

Next up, Part III, where things get practical.

