The Origin Stories of Python and R - Chapter 1 Retelling

Chapter 1 is titled “In the Beginning” and is written by Rick Scavetta. He opens with a tongue-in-cheek Dickens reference: for data science, it is simply the best of times. But to understand where we are, we need to look at where Python and R came from. Their origin stories explain why they feel so different today.

R Started as a Tool by Statisticians, for Statisticians

Scavetta compares R to the 90s streetwear brand FUBU: For Us, By Us. That comparison is surprisingly perfect. R was built by statisticians who wanted a language that did exactly what they needed. No more, no less.

Here’s the thing. R didn’t appear out of nowhere. It traces directly back to the S language, which started at Bell Laboratories in 1976. John Chambers led the development of S, and his colleagues, including John Tukey and William Cleveland, published foundational books on computational statistics and data visualization. These weren’t random academics. Tukey invented the box plot. Cleveland developed the LOESS smoothing method. They cared about exploring data and communicating results clearly.

S was originally free and ran on Unix. Then it got licensed as S-PLUS, which cost money. So in 1991, Ross Ihaka and Robert Gentleman at the University of Auckland began work on an open source alternative. They called it R, after their first initials and as a play on the name S. The first stable release, R v1.0.0, came out on February 29, 2000.

Two things happened in between that mattered a lot. CRAN (the Comprehensive R Archive Network) was set up to host R packages. And the R Core Team was formed, a group of volunteers who maintain the language itself. Some original members, including Chambers, Ihaka, and Gentleman, are still involved.

Python Was Built to Be Easy and General Purpose

Meanwhile, in 1991 (same year Ihaka and Gentleman started on R), Guido van Rossum released Python. But here’s the problem with comparing them directly: Python was never about data analysis. Not at first.

Python was created as a high-level alternative to C and C++. Van Rossum wanted a language that was easy to learn, easy to read, and useful for all kinds of programming. He held the title “benevolent dictator for life” (BDFL) until he stepped down from that role in 2018, after which an elected Steering Council took over governance of the language.

If R is FUBU, Scavetta says Python is a Swiss Army knife. It shows up everywhere: web development, system administration, gaming, desktop apps. Data science was just one more thing Python could do, and it came to it relatively late.

Python’s path into data science went through a few key packages. NumPy arrived in 2005, giving Python proper array handling. SciPy provided core algorithms for scientific computing. Then pandas came along in 2009 with data frames and data manipulation tools. This stack, sometimes called the PyData stack, is what made Python a real contender in the data science world.

Scavetta makes an interesting point here. Python got into data science partly because it was already everywhere else. If your colleagues in web development and system administration already used Python, sharing scripts with them was easy. That kind of existing adoption gave Python a head start that a niche language like R could never match.

The Language War Heats Up

The early 2000s set the stage for what people started calling the “language wars.” Four milestones stand out.

First, BioConductor launched in 2002 as an R package repository for biological data. It became so important to bioinformatics that it largely replaced Perl in that field.

Second, IPython was released in 2006, introducing interactive notebook-style programming to Python. This eventually became the Jupyter Project in 2014. Fun fact that many people forget: Jupyter stands for Julia, Python, and R. But it’s very Python-centric in practice.

Third, in 2007 Hadley Wickham published his PhD thesis, which included two R packages that would reshape the entire R ecosystem. One was reshape, which eventually led to the Tidyverse. The other was ggplot2, an implementation of “The Grammar of Graphics” that made plotting in R dramatically simpler.

Fourth, Python v3 came out in 2008. It was backward-incompatible with v2, and that caused years of confusion about which version to use. Python 2 was finally retired in 2020, though you could still buy a MacBook with it preinstalled after that date.
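
To make the incompatibility concrete, here is a small sketch, written for Python 3, of a few well-known behavior changes from Python 2. The function name `py3_behavior` and the dictionary keys are illustrative names only, not something from the book.

```python
def py3_behavior():
    """Return a few values whose meaning changed between Python 2 and 3."""
    return {
        # In Python 2, 7 / 2 on two ints gave 3; Python 3 does true division.
        "true_division": 7 / 2,
        # Floor division is spelled // in Python 3 (and in late Python 2).
        "floor_division": 7 // 2,
        # In Python 3, str is Unicode text and bytes is a separate type,
        # so they never compare equal. In Python 2 this was True.
        "text_equals_bytes": "R" == b"R",
    }


# print is a function in Python 3; the Python 2 statement form
# (print "hello") is a syntax error here.
print(py3_behavior())
```

Code that relied on the old integer division or on mixing text and bytes kept running under Python 2 but gave different results, or failed outright, under Python 3. That is a large part of why the migration dragged on.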

The Battle for Dominance

By the early 2010s, both languages had serious data science capabilities. Python got scikit-learn in 2011 for machine learning, then TensorFlow and Keras in 2015 for deep learning. Python’s strength as a high-level language sitting on top of high-performance backends (C++, GPU computing) made it the go-to choice for model deployment.

R got RStudio IDE in 2011, which became so central to the R experience that using R basically meant using RStudio. And the Tidyverse kept growing. What started as “the Hadleyverse” (named after Wickham) got officially rebranded at the useR! 2016 conference at Stanford. The Tidyverse standardized how R functions work together, creating a consistent, pipe-based workflow.
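
For readers coming from Python, the pipe idea can be sketched roughly as follows. This is only an analogy in plain Python, not how the Tidyverse is implemented; the `pipe` helper and the three step functions are hypothetical names made up for illustration.

```python
from functools import reduce


def pipe(value, *steps):
    """Pass value through each step in order, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


def drop_missing(xs):
    """Remove None entries from a list."""
    return [x for x in xs if x is not None]


def square(xs):
    """Square every element of a list."""
    return [x * x for x in xs]


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


# Reads in the order the work happens: clean, transform, summarize.
result = pipe([1, None, 2, 3], drop_missing, square, mean)
print(result)
```

The nested equivalent, `mean(square(drop_missing([1, None, 2, 3])))`, does the same work but has to be read inside out. The appeal of a pipe-based style is that every step takes data in and hands data on, so steps compose in reading order.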

Scavetta points out something important here. R effectively has multiple “dialects”: base R, the Tidyverse, and BioConductor. Some R users only learn the Tidyverse way, which causes confusion when they encounter the massive amount of base R code that’s still in active use. Python has a similar split between vanilla Python and the PyData stack, but it causes less friction in practice.

From War to Cooperation

So here’s what happened. For a while, people expected one language to win and the other to disappear. Some data scientists actually wanted that. Others tried to make the languages mimic each other, porting workflows so the choice wouldn’t matter.

Neither of those things happened. Instead, the community realized something simple: both languages have unique strengths, and trying to make one copy the other misses the point.

Today, the Python and R communities have largely converged on cooperation. The goal is bilingual data scientists who can use the right tool for each job. That’s what this book is about, and the rest of it will show you how to get there.

One last fun detail from Scavetta. Python users call themselves Pythonistas, which is a great name. R users call themselves useRs. The official annual conference is called useR! (exclamation obligatory). It’s a little less cool, but that’s what happens when your language is a single letter.
