R for Python Developers - Getting Started with RStudio and Tibbles
Chapter 2 is where the book gets hands-on. Rick Scavetta takes the wheel and walks Python developers through R. Not from scratch, but with the assumption you already know how to code. The chapter is big, so I split it into two posts. This is the first half.
Setting Up R and RStudio
You have two options. RStudio Cloud in the browser, or install R and RStudio Desktop locally. Either works. Fun fact: every R release gets a name from Peanuts. Charlie Brown and Snoopy, the whole crew.
Here’s the thing about RStudio that Python developers will notice right away. It shows your data objects in a panel. You can click to inspect data, even get an Excel-like viewer. Coming from VS Code, that feels almost too friendly.
But here’s the problem. That GUI convenience can lead to bad habits. You can click “Import Dataset” instead of writing the import command. The file loads, but the command never appears in your script. Your project stops being reproducible. Scavetta warns about this, and rightly so.
Worth noting: RStudio is not R. R is the language and its interpreter; RStudio is an IDE built around it. You can just as well run R from a terminal, Emacs, or any other editor.
R Projects and the Working Directory
When you create an R project, RStudio adds a .Rproj file to the project directory. Opening that project sets your working directory to the project folder, so when you write read.csv("data/diamonds.csv"), R knows where to look.
If you skip the project setup, your working directory defaults to your home folder. Then you end up hardcoding full paths, which is terrible for the same reasons it’s terrible in Python. The book says: do not use setwd() or getwd(). They show up in outdated tutorials. Avoid them.
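A minimal sketch of the project-relative approach, assuming the script runs inside an RStudio project that has a data/ folder (diamonds.csv is just the example file named above). The file.exists() check is only there so the sketch runs even where that file is missing.

```r
# Build a path relative to the working directory, which an RStudio project
# sets to the project folder. file.path() joins the parts with "/".
csv_path <- file.path("data", "diamonds.csv")

# Avoid machine-specific paths like these; they break on every other computer:
# setwd("/Users/someone/projects/diamonds")
# diamonds <- read.csv("/Users/someone/projects/diamonds/data/diamonds.csv")

# Only read the file if it is actually there, so the sketch runs anywhere.
if (file.exists(csv_path)) {
  diamonds <- read.csv(csv_path)
} else {
  message("File not found: ", csv_path)
}
```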
Packages: CRAN and the Tidyverse
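Run install.packages("tidyverse") in the R console. That is R’s counterpart to pip install, except that it runs inside R rather than as a separate command-line tool. Packages come from CRAN (the Comprehensive R Archive Network), the official R package repository, which applies quality checks and is served from mirrors worldwide.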
One difference from Python: R users typically install packages system-wide, not inside virtual environments. The renv package exists for project-specific libraries, but it’s not as common as Python’s venv or conda.
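After installing, you load a package with library(readr). That’s roughly import readr in Python, except that all of readr’s exported functions become available without a prefix, which is closer to from readr import *. You also won’t see aliasing like import pandas as pd; base R has no equivalent.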
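You can also call a single function without attaching its package by using the :: operator, as in readr::read_csv(). That plays a role similar to from pandas import read_csv: you get one function without bringing the rest of the package into scope. Use library(), not require(). The latter returns TRUE or FALSE instead of throwing an error, so it is meant for checking whether a package is available, not for everyday loading.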
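A short sketch of the three loading styles. It sticks to packages that ship with R (utils and tools) so it runs without installing anything; notARealPackage is a deliberately made-up name, used only to show what require() returns when loading fails.

```r
# One-time install from CRAN; run it in the console, not in every script.
# install.packages("tidyverse")

# library() attaches a package, so its exported functions work unprefixed.
library(utils)
first_three <- head(letters, 3)               # "a" "b" "c"

# pkg::fun calls one function without attaching its package first.
# tools ships with R but is not attached by default.
ext <- tools::file_ext("data/diamonds.csv")   # "csv"

# require() returns TRUE or FALSE instead of stopping with an error.
# "notARealPackage" is a hypothetical, nonexistent package name.
has_pkg <- suppressWarnings(require("notARealPackage", quietly = TRUE))
has_pkg                                       # FALSE
```

That FALSE return value is why require() belongs in "is this installed?" checks, while library() fails loudly, which is what you want at the top of an analysis script.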
Tibbles: The Better Data Frame
So here’s what happened. When you read a CSV with base R’s read.csv(), you get a data.frame. When you read it with the Tidyverse’s read_csv(), you get a tibble.
They look very different when printed. A regular data frame dumps everything to the console. Thousands of rows, no mercy. For larger data, a tibble shows only the first 10 rows, along with column names, data types in angle brackets, and a note about how much you’re not seeing. It also adjusts to your console width.
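To see the printing difference yourself, here is a sketch with 1,000 rows of made-up data. It assumes the tibble package (installed with the Tidyverse) and skips the tibble part if it is missing; the exact footer wording varies between tibble versions.

```r
# 1,000 rows of made-up data.
many_rows <- data.frame(id = 1:1000, root = sqrt(1:1000))

# print(many_rows) would write all 1,000 rows to the console.

# The data stay the same as a tibble; what changes is how print() shows them:
# about 10 rows, column types such as <int> and <dbl>, and a footer
# noting the rows that were not shown.
if (requireNamespace("tibble", quietly = TRUE)) {
  print(tibble::as_tibble(many_rows))
}
```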
Under the hood, a tibble is still a data frame, with extra classes (tbl_df and tbl) layered on top for better defaults. This is R’s S3 object system working in the background: print() looks at the object’s class and picks the matching print method.
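The class mechanism can be sketched in base R with a toy class. preview_df and its print method are hypothetical, written only to mimic the idea of layering a class on top of data.frame; this is not how tibble itself is implemented.

```r
df <- data.frame(x = 1:30, y = (1:30) / 2)
class(df)                          # "data.frame"

# Hypothetical toy class stacked on top of data.frame.
preview <- df
class(preview) <- c("preview_df", "data.frame")

# S3 method: print() finds this because "preview_df" comes first in class().
print.preview_df <- function(x, n = 5, ...) {
  cat("# A preview_df:", nrow(x), "rows x", ncol(x), "columns\n")
  shown <- utils::head(x, n)
  class(shown) <- "data.frame"     # fall back to the plain data.frame method
  print(shown, ...)
  cat("# ...", max(nrow(x) - n, 0), "more rows\n")
  invisible(x)
}

print(preview)                     # dispatches to print.preview_df
inherits(preview, "data.frame")    # TRUE: still a data frame underneath

# The real thing, if tibble is installed:
if (requireNamespace("tibble", quietly = TRUE)) {
  print(class(tibble::as_tibble(df)))   # "tbl_df" "tbl" "data.frame"
}
```

Because dispatch walks the class vector from left to right, the toy object prints as a short preview, while every function that only knows about data.frame still accepts it.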
One thing that will surprise Python developers: R indexing starts at 1. Not 0. It will trip you up more than once.
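A quick sketch of the off-by-one trap on a plain numeric vector. Note the two cases that behave very differently from Python: index 0 and negative indices.

```r
x <- c(10, 20, 30)

x[1]          # 10: the first element (x[0] in Python)
x[length(x)]  # 30: the last element (x[-1] in Python)
x[0]          # numeric(0): index 0 silently selects nothing, no error
x[-1]         # 20 30: a negative index drops that element
```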
Types and Exploring Data
R has five common data structures: vectors, lists, data frames, matrices, and arrays. A data frame is a collection of same-length vectors, where each vector is a column and all elements in it share one type.
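A minimal sketch of the five structures with made-up values, plus a check that a data frame really is a list of equal-length column vectors.

```r
# Atomic vector: every element shares one type.
vec <- c(1.5, 2.5, 3.5)
# List: elements can have different types.
lst <- list(1.5, "a", TRUE)
# Matrix: a vector with two dimensions (here 2 rows x 3 columns).
mat <- matrix(1:6, nrow = 2)
# Array: the same idea in more dimensions (2 x 3 x 4).
arr <- array(1:24, dim = c(2, 3, 4))
# Data frame: equal-length column vectors, each with its own type.
df <- data.frame(weight = c(12, 15), ok = c(TRUE, FALSE))

is.list(df)                        # TRUE
lengths(df)                        # weight 2, ok 2
vapply(df, typeof, character(1))   # weight "double", ok "logical"
```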
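The four main atomic types are logical (TRUE/FALSE), integer, double (floating point), and character (strings). When you combine values with c(), R coerces them to the least general type that can hold all of them, in the order logical < integer < double < character. Plain numbers are doubles by default, so c(1, 2) gives a double vector, not an integer one. Want an integer? Append L, like 1L.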
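typeof() makes both the defaults and the coercion order visible. A minimal sketch:

```r
typeof(TRUE)        # "logical"
typeof(1L)          # "integer"
typeof(c(1, 2))     # "double": plain numbers are doubles
typeof(c(1L, 2L))   # "integer"
typeof("a")         # "character"

# Mixing types coerces everything to the more general one.
typeof(c(TRUE, 2L)) # "integer"
typeof(c(1L, 2.5))  # "double"
typeof(c(1, "a"))   # "character"
```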
One confusing overlap: Python’s str type is called character in R. But str() in R is a function that shows the structure of an object. Completely different thing.
To inspect data, use str() from base R or glimpse() from dplyr. Both show column names and types. glimpse() gives cleaner output, a pattern you’ll see throughout the Tidyverse.
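A sketch comparing the two on a small made-up data frame. str() is base R; the glimpse() call assumes dplyr is installed and is skipped otherwise.

```r
# Made-up values, just to get a mix of column types.
df <- data.frame(
  carat = c(0.3, 0.5, 0.7),
  cut = c("Good", "Ideal", "Fair"),
  price = c(400L, 900L, 1500L),
  stringsAsFactors = FALSE  # the default since R 4.0; explicit for older versions
)

# Base R: one line per column with its type and first values.
str(df)

# Tidyverse: the same information in a tidier layout.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(df)
}
```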
Naming Conventions
Modern R uses snake_case. But that’s a recent trend. Older code is all over the place, and there is no strict equivalent to Python’s PEP 8. Hadley Wickham’s “Advanced R” offers style guidance, but the community isn’t strict about enforcement.
Here’s where it gets interesting with column names. Base R’s read.csv() and data.frame() replace illegal characters (spaces, parentheses, operators) with dots. A column named “Weight (g)” becomes Weight..g., with a trailing dot. A name starting with a number gets an X prefix, so “5-day check” becomes X5.day.check.
Tibbles keep illegal characters but require backticks to reference them, like `Weight (g)`. More readable, but it means base R and Tidyverse code behave differently with the same data. Scavetta flags this as a real concern when mixing styles.
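Base R’s renaming comes from make.names(), which read.csv() applies by default through its check.names argument. A sketch with a made-up two-column CSV, read from a string so no file is needed:

```r
# Base R's renaming rule on its own.
make.names(c("Weight (g)", "5-day check"))   # "Weight..g."  "X5.day.check"

csv_text <- "Weight (g),5-day check\n12,TRUE\n15,FALSE"

# read.csv() runs the names through make.names() (check.names = TRUE by default).
base_df <- read.csv(text = csv_text)
names(base_df)          # "Weight..g."  "X5.day.check"

# With check.names = FALSE the original names survive, as in a tibble,
# and then need backticks to be referenced.
raw_df <- read.csv(text = csv_text, check.names = FALSE)
names(raw_df)           # "Weight (g)"  "5-day check"
raw_df$`Weight (g)`     # 12 15
```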
Even default names for headerless columns differ. Base R uses V1, V2, V3. Tidyverse uses X1, X2, X3. Small thing, but it matters when you inherit someone else’s script.
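A sketch of the headerless case, writing a tiny made-up CSV to a temporary file. The readr part assumes the package is installed and is skipped otherwise.

```r
# A tiny made-up CSV with no header row.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2", "3,4"), tmp)

# Base R invents V1, V2, ...
base_names <- names(read.csv(tmp, header = FALSE))
base_names            # "V1" "V2"

# readr invents X1, X2, ...
if (requireNamespace("readr", quietly = TRUE)) {
  tidy_names <- names(suppressMessages(readr::read_csv(tmp, col_names = FALSE)))
  print(tidy_names)   # "X1" "X2"
}
```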
What’s Next
The first half of Chapter 2 sets the foundation: setting up R, installing packages, reading data, and understanding the type system. Next post covers the second half: data wrangling with the pipe operator and functions.