Welcome to Efficient R Programming {-}

knitr::include_graphics("figures/f0_web.png")

This is the online version of the O'Reilly book: Efficient R programming. Pull requests and general comments are welcome.

Get a hard copy from: Amazon (UK), Amazon (USA), O'Reilly

Authors {-}

Colin Gillespie is Senior Lecturer (Associate Professor) at Newcastle University, UK. He is an Executive Editor of the R Journal, with research interests including high performance statistical computing and Bayesian statistics. Colin founded the Jumping Rivers consultancy and has been teaching R since 2005 at all levels, from beginning to advanced programming.

Robin Lovelace is Associate Professor at the Institute for Transport Studies (ITS) and Leeds Institute for Data Analytics (LIDA), University of Leeds, UK. His research focuses on geocomputation and reproducible data science for evidence-based policy-making. Decarbonising the global economy while improving health and environmental outcomes is a major problem solving challenge. Robin's research supports solutions by generating evidence and tools enabling evidence-based investment in efficient and healthy modes of transport at local, city and national scales. Robin is the Lead Developer of the award-winning Propensity to Cycle Tool (publicly available at www.pct.bike), conveyor of the Transport Data Science module and workshop series, and co-author of popular packages, papers, and books, including Geocomputation with R.

Preface {-}

Efficient R Programming is about increasing the amount of work you can do with R in a given amount of time. It's about both computational and programmer efficiency. There are many excellent R resources about topic areas such as visualisation [e.g. @chang2012r], data science [e.g. @grolemund_r_2016] and package development [e.g. @Wickham_2015]. There are even more resources on how to use R in particular domains, including Bayesian Statistics, Machine Learning and Geographic Information Systems. However, there are very few unified resources on how to simply make R work effectively. Hints, tips and decades of community knowledge on the subject are scattered across hundreds of internet pages, email threads and discussion forums, making it challenging for R users to understand how to write efficient code.

In our teaching we have found that this issue applies to beginners and experienced users alike. Whether it's a question of understanding how to use R's vector objects to avoid for loops, knowing how to set-up your .Rprofile and .Renviron files or the ability to harness R's excellent C++ interface to do the 'heavy lifting', the concept of efficiency is key. The book aims to distill tips, warnings and 'tricks of the trade' down into a single, cohesive whole that will provide a useful resource to R programmers of all stripes for years to come.

The content of the book reflects the questions that our students, from a range of disciplines, skill levels and industries, have asked over the years to make their R work faster. How to set-up my system optimally for R programming work? How can one apply general principles from Computer Science (such as do not repeat yourself, DRY) to the specifics of an R script? How can R code be incorporated into an efficient workflow, including project inception, collaboration and write-up? And how can one learn quickly how to use new packages and functions?

The book answers each of these questions, and more, in 10 self-contained chapters. Each chapter starts simple and gets progressively more advanced, so there is something for everyone in each. While the more advanced topics such as parallel programming and C++ may not be immediately relevant to R beginners, the book helps to navigate R's famously steep learning curve with a commitment to starting slow and building on strong foundations. Thus even experienced R users are likely to find previously hidden gems of advice in the early parts of the chapters. "Why did no one tell me that before?" is a common exclamation we have heard while teaching this material.

Efficient programming should not be seen as an optional extra and the importance of efficiency grows with the size of projects and datasets. In fact, this book was devised while we were teaching a course on 'R for Big Data': it quickly became apparent that if you want to work with large datasets, your code must work efficiently. Even if you work with small datasets, efficient code, that is both fast to write and run is a vital component of successful R projects. We found that the concept of efficient programming is important to all branches of the R community. Whether you are a sporadic user of R (e.g. for its unbeatable range of statistical packages), looking to develop a package, or working on a large collaborative project in which efficiency is mission-critical, code efficiency will have a major impact on your productivity.

Ultimately efficiency is about getting more output for less work input. To take the analogy of a car, would you rather drive 1000 km on a single tank (or a single charge of your batteries) or refuel a heavy, clunky and ugly car every 50 km? In the same way, efficient R code is better than inefficient R code in almost every way: it is easier to read, write, run, share and maintain. This book cannot provide all the answers about how to produce such code but it certainly can provide ideas, example code and tips to make a start in the right direction of travel.



csgillespie/efficientR documentation built on Aug. 20, 2023, 2:30 a.m.