"When you write a program, think of it primarily as a work of literature. You're trying to write something that human beings are going to read. Don't think of it primarily as something a computer is going to follow. The more effective you are at making your program readable, the more effective it's going to be: You'll understand it today, you'll understand it next week, and your successors who are going to maintain and modify it will understand it."
-- Donald Knuth
Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. When I answer questions, I first check whether I think I can answer the question; second, I check the coding style of the question, and if the code is too difficult to read, I just move on. Please make your code readable by following, e.g., this coding style (most examples below come from this guide).
"To become significantly more reliable, code must become more transparent. In particular, nested conditions and loops must be viewed with great suspicion. Complicated control flows confuse programmers. Messy code often hides bugs."
-- Bjarne Stroustrup
In code, use comments to explain the “why”, not the “what” or “how”. Each line of a comment should begin with the comment symbol and a single space: `# `.
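As a minimal sketch of a comment that explains the "why" (the data and variable names here are hypothetical, not taken from the guide):

```r
# Hypothetical data: heights recorded in inches.
heights_in <- c(60, 66, 72)

# Convert to centimetres because the downstream model expects metric units.
heights_cm <- heights_in * 2.54
```

A comment such as `# multiply by 2.54` would only restate what the code already says.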
```{block2, type='rmdtip'}
Use commented lines of `-` to break up your file into easily readable chunks and to create a code outline in RStudio.
```

### Naming

> There are only two hard things in Computer Science: cache invalidation and naming things.
>
> -- Phil Karlton

Names are not limited to 8 characters as in some other languages; however, they are case sensitive. Be smart with your naming; be descriptive yet concise. Think about how your names will show up in autocomplete. Throughout the course we will point out some standard naming conventions that are used in R (and other languages), e.g., `i` and `j` as row and column indices.

```r
# Good
average_height <- mean((feet / 12) + inches)
plot(mtcars$disp, mtcars$mpg)

# Bad
ah<-mean(x/12+y)
plot(mtcars[, 3], mtcars[, 1])
```
Put a space before and after `=` when naming arguments in function calls. Most infix operators (`==`, `+`, `-`, `<-`, etc.) are also surrounded by spaces, except those with relatively high precedence: `^`, `:`, `::`, and `:::`. Always put a space after a comma, and never before (just like in regular English).
```r
# Good
average <- mean((feet / 12) + inches, na.rm = TRUE)
sqrt(x^2 + y^2)
x <- 1:10
base::sum

# Bad
average<-mean(feet/12+inches,na.rm=TRUE)
sqrt(x ^ 2 + y ^ 2)
x <- 1 : 10
base :: sum
```
Curly braces, `{}`, define the most important hierarchy of R code. To make this hierarchy easy to see, always indent the code inside `{}` by two spaces.
```r
# Good
if (y < 0 && debug) {
  message("y is negative")
}

if (y == 0) {
  if (x > 0) {
    log(x)
  } else {
    message("x is negative or zero")
  }
} else {
  y^x
}

# Bad
if (y < 0 && debug)
message("Y is negative")

if (y == 0)
{
    if (x > 0) {
      log(x)
    } else {
  message("x is negative or zero")
    }
} else { y ^ x }
```
Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work into a separate function.
If a function call is too long to fit on a single line, use one line each for the function name, each argument, and the closing `)`. This makes the code easier to read and to change later.
```r
# Good
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

# Bad
do_something_very_complicated("that", requires, many, arguments,
                              "some of which may be long"
                              )
```
Use `<-`, not `=`, for assignment. Keep `=` for parameters.

```r
# Good
x <- 5
system.time(
  x <- rnorm(1e6)
)

# Bad
x = 5
system.time(
  x = rnorm(1e6)
)
```
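One reason for this rule can be sketched as follows: inside a function call, `=` names an argument rather than assigning, so `system.time(x = ...)` looks for a parameter called `x`. The snippet below is only an illustration; the variable names `timing` and `result` are assumptions, and the exact wording of the error message may vary by locale.

```r
# With `<-`, the assignment is evaluated (and timed) in the calling
# environment, so `x` exists afterwards.
timing <- system.time(
  x <- rnorm(10)
)
length(x)

# With `=`, R treats `x` as an argument name. system.time() has no such
# parameter, so the call fails; tryCatch() captures the error message.
result <- tryCatch(
  system.time(x = rnorm(10)),
  error = function(e) conditionMessage(e)
)
result
```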
Don't put `;` at the end of a line, and don't use `;` to put multiple commands on one line.
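A short illustration with hypothetical variable names; both forms run, but the second is harder to scan:

```r
# Good
width <- 3
height <- 4

# Bad
w <- 3; h <- 4;
```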
Only use `return()` for early returns. Otherwise rely on R to return the result of the last evaluated expression.

```r
# Good
add_two <- function(x, y) {
  x + y
}

# Bad
add_two <- function(x, y) {
  return(x + y)
}
```
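For contrast, here is a minimal sketch of a legitimate early return. The helper `safe_log()` is hypothetical and exists only to show the pattern:

```r
# Hypothetical helper illustrating an early return.
safe_log <- function(x) {
  if (!is.numeric(x)) {
    # Early exit: there is nothing sensible to compute for this input.
    return(NA_real_)
  }
  # The last evaluated expression is returned implicitly.
  log(x)
}

safe_log("a")
safe_log(1)
```

The explicit `return()` marks the exceptional path, while the normal result still falls out of the last expression.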
"
, not `'`, for quoting text. The only exception is when the text already contains double quotes and no single quotes.

```r
# Good
"Text"
'Text with "quotes"'
'<a href="http://style.tidyverse.org">A link</a>'

# Bad
'Text'
'Text with "double" and \'single\' quotes'
```
Create variables for values that are likely to change.
Try not to copy code, or copy then modify the code, more than twice.
The Rule of Three[^DRY] applies to look-up tables and the like as well. The key question is: if something changes, how many touch points will there be? If there are three or more, it is time to abstract the code a bit.
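As a rough sketch of this idea, suppose the same rescaling is needed for three columns. Instead of copying the expression three times, the logic lives in one function, so a change has a single touch point. The function name `rescale01()` and the data frame are hypothetical:

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up data with three columns that all need the same treatment.
df <- data.frame(a = c(1, 5, 9), b = c(10, 20, 40), c = c(-2, 0, 2))

# Apply the single definition to every column instead of copy-pasting.
df[] <- lapply(df, rescale01)
df
```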
[^DRY]: This is sometimes called the DRY principle, or Don't Repeat Yourself.
It is better to use relative path names than hard-coded ones. If you must read from (or write to) paths that are not in your project directory structure, create a variable holding the path at the highest level you can (always end it with `/`) and then use relative paths below it.
DO NOT EVER USE `setwd()`.
```r
# Good
raw_data <- read.csv("./data/mydatafile.csv")

input_file <- "./data/mydatafile.csv"
raw_data <- read.csv(input_file)

input_path <- "C:/Path/To/Some/other/project/directory/"
input_file <- paste0(input_path, "data/mydatafile.csv")
raw_data <- read.csv(input_file)

# Bad
setwd("C:/Path/To/Some/other/project/directory/data/")
raw_data <- read.csv("mydatafile.csv")
setwd("C:/Path/back/to/my/project/")
```
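As an alternative sketch, base R's `file.path()` can join the path variable and the relative part, so the separator is not typed by hand. Here `tempdir()` is only a stand-in for an external directory so that the example runs anywhere, and the file is written first purely so that there is something to read:

```r
# Assumption: `input_path` stands in for a directory outside the project.
input_path <- tempdir()
input_file <- file.path(input_path, "mydatafile.csv")

# Create a small file so that the read below has something to load.
write.csv(head(mtcars), input_file, row.names = FALSE)

raw_data <- read.csv(input_file)
dim(raw_data)
```

Only `input_path` needs to change if the external directory moves.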
Download the latest version of RStudio (> 1.1) and use it!
Learn more about new features of RStudio v1.1 there.
RStudio features:
Use code diagnostics:
R Projects:
> The only two things that make \@JennyBryan 😤😠🤯. Instead use projects + here::here() #rstats pic.twitter.com/GwxnHePL4n
>
> -- Hadley Wickham (\@hadleywickham), December 11, 2017
Read more at https://www.tidyverse.org/articles/2017/12/workflow-vs-script/ and also see chapter Efficient set-up of book Efficient R programming.
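A minimal sketch of the `here::here()` approach mentioned in the tweet, assuming the here package is installed and that the project has a `data/mydatafile.csv` file (a hypothetical path reused from the earlier example). The guard makes the snippet a no-op when the package is missing:

```r
if (requireNamespace("here", quietly = TRUE)) {
  # Build a path relative to the project root, not the working directory.
  input_file <- here::here("data", "mydatafile.csv")
  print(input_file)
}
```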
A basic solution is to print everything, but this usually does not work well on complex problems. A more convenient way to see the state of all the variables in your code is to place a call to `browser()` wherever you want to inspect them.
Learn more with this book chapter, this other book chapter, this webinar and this RStudio article.
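A small sketch of where a `browser()` call might go. The function `scale_to_unit()` is hypothetical, and the `interactive()` guard is an addition so that the script also runs unattended:

```r
# Hypothetical function used only to illustrate where browser() can go.
scale_to_unit <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # Pause here, in interactive sessions only, to inspect `x` and `rng`.
  if (interactive()) browser()
  (x - rng[1]) / (rng[2] - rng[1])
}

scaled <- scale_to_unit(c(2, 4, 6))
scaled
```

In the browser prompt, typing a variable name prints its value, `n` runs the next line, `c` continues, and `Q` quits.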
Can't remember useful functions? Use cheat sheets.
You can search for R-specific material on https://rseek.org/. You should also read the documentation carefully. If you're using a package, look for vignettes and a GitHub repository.
You can also use Stack Overflow. Most commonly, you have an error or a question, you Google it, and most of the time the first links are Q/As on Stack Overflow.
You can ask questions on Stack Overflow (using the tag `r`). You need to make a great R reproducible example if you want your question to be answered. Most of the time, while making this reproducible example, you will find the answer to your problem.
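A reproducible example usually needs a small data set that others can recreate by pasting code. One common approach, sketched here with the built-in `mtcars` data, is `dput()`:

```r
# Take a tiny slice of a built-in data set so the example is self-contained.
df <- head(mtcars[, c("mpg", "cyl")], 3)

# dput() prints R code that recreates `df`; paste that output in the question.
dput(df)
```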
Join the R-help mailing list. Sign up to get the daily digest and scan it for questions that interest you.
With over 10,000 packages on CRAN, it is hard to keep up with the constantly changing landscape. R-Bloggers is an R-focused blog aggregation site with dozens of posts per day. Check it out.
`file` and scroll through the various functions which appear in the pop-up.