R at DHSC {#r_at_dhsc}

The following are the DHSC sensible defaults for R:

R Version & IDE

The dominant IDE for R is RStudio, which comes packaged with R. For a new project, use the latest version of RStudio available from the software portal.

General

Default to packages from the Tidyverse. These have been carefully designed to work together as part of a modern data analysis workflow. For more information, see R for Data Science by Hadley Wickham and Garrett Grolemund.

For example:
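
As a minimal sketch of what a Tidyverse-style workflow can look like, the snippet below summarises R's built-in `mtcars` dataset with dplyr. The dataset, column choices, and the name `mpg_by_cyl` are assumptions made for illustration and are not taken from the original guidance.

```r
library(dplyr)

# Summarise fuel efficiency by number of cylinders, using the built-in
# mtcars data as a stand-in for a real analysis dataset.
mpg_by_cyl <- mtcars %>%
  as_tibble(rownames = "model") %>%  # keep the car names as a column
  filter(!is.na(mpg)) %>%            # drop rows with missing mpg
  group_by(cyl) %>%
  summarise(
    n_cars   = n(),
    mean_mpg = mean(mpg)
  ) %>%
  arrange(desc(mean_mpg))            # most efficient group first

print(mpg_by_cyl)
```

Each step takes a data frame and returns a data frame, which is what lets the verbs chain together with the pipe.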

Packages {#r_default_packages}

Recommended Packages:

Project Workflow {#r_projects}

Always work in a project. See the guide to Using Projects.

The Projects functionality is broken in DHSC's packaged version of RStudio; see the fix here.

Packaging Your Code {#r_package}

Packages are the fundamental unit of reproducible R code. Therefore, if possible, build an R Package to share and document your code.

Hadley Wickham's book R Packages is an effective guide to producing a package.

The usethis package has lots of useful shortcuts for package builders.
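
A rough sketch of how some of these shortcuts fit together is shown below. It builds a throwaway package skeleton in a temporary directory so that it is safe to run. The package name `dhscexample` and the file name `clean_data` are hypothetical, and the sketch assumes usethis and testthat are already installed.

```r
library(usethis)

# Hypothetical package, created in a temporary directory for illustration.
pkg_path <- file.path(tempdir(), "dhscexample")
create_package(pkg_path, open = FALSE)

# Make the new package the active project for the usethis calls that follow.
proj_set(pkg_path)

use_r("clean_data")     # creates R/clean_data.R for your functions
use_testthat()          # sets up tests/testthat/ and updates DESCRIPTION
use_test("clean_data")  # creates tests/testthat/test-clean_data.R
```

For a real package you would pass the path where the package should live, and in an interactive session usethis can open the new files for editing.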

Managing Dependencies {#r_dependencies}

There are two key competing ways of managing dependencies for an R Project:

See also:

Using old versions of packages {#r_checkpoint}

You may come across code which doesn't work because it depends on a different version of a package to the one you have.

Fortunately, Microsoft keep daily snapshots of CRAN and store them on the Microsoft R Application Network.

The checkpoint package from Microsoft lets you use these snapshots to install packages as if it were any day since 2017-07-01.

Simply start your script with:

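library(checkpoint)

# Use CRAN as it was on the snapshot date; the checkpoint library is
# created under the current working directory.
checkpoint(snapshotDate = "2018-01-15",
           checkpointLocation = getwd())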

This will scan your project for the packages it uses, download them as they existed on the given date, and install them to a separate library under checkpointLocation (here, the current working directory).

Notes:

Error Handling {#r_errorhandling}

Base R includes the try() and tryCatch() functions for handling errors. You can find an example of basic use of these on r-bloggers.
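
As a small sketch of both functions in use, the example below wraps a failing call with `try()` and then handles warnings and errors separately with `tryCatch()`. The helper `parse_number_or_na()` is a hypothetical function written for this illustration and expects a single string.

```r
# try(): carry on after an error and inspect the result afterwards.
result <- try(stop("something went wrong"), silent = TRUE)
failed <- inherits(result, "try-error")  # TRUE when the call errored

# tryCatch(): choose what to return for each kind of condition.
# Hypothetical helper: parse one string as a number, returning NA on failure.
parse_number_or_na <- function(x) {
  tryCatch(
    {
      value <- as.numeric(x)  # warns if x is not a valid number
      if (is.na(value)) stop("could not parse input: ", x)
      value
    },
    warning = function(w) {
      message("Warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("Error caught: ", conditionMessage(e))
      NA_real_
    }
  )
}

parse_number_or_na("3.5")  # parses cleanly
parse_number_or_na("abc")  # the coercion warning is caught
```

The value returned by whichever handler runs becomes the value of the whole `tryCatch()` call, which is why both handlers return `NA_real_` here.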

Effective error handling in R requires understanding the condition system. There is a good chapter on this in Hadley Wickham's Advanced R book.

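If you are iterating over many inputs, it is recommended that you wrap your function with safely(), or a related function from purrr. safely() returns a version of your function that captures any error and returns it in a list alongside the result, so that one failure does not stop the iteration and errors can be handled at a later stage.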
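
The sketch below shows one way this can look in practice, assuming purrr is installed. The inputs and the choice of `log()` as the function being wrapped are made up for illustration.

```r
library(purrr)

# Illustrative inputs: the second element will make log() fail.
inputs <- list(10, "a", 100)

# safely() returns a wrapped function that never throws; each call returns
# a list with two elements, `result` and `error` (one of them is NULL).
safe_log <- safely(log)
results <- map(inputs, safe_log)

# Separate successes from failures after the loop has finished.
errors <- map(results, "error")
failed <- map_lgl(errors, Negate(is.null))
values <- map_dbl(results[!failed], "result")

# possibly() is a simpler relative: it substitutes a default value on error.
with_default <- map_dbl(inputs, possibly(log, otherwise = NA_real_))
```

Keeping the captured errors means you can report which inputs failed, and why, once the whole run has completed.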

Unit Testing {#r_tests}

Use the testthat package for performing unit tests. For details see the 'tests' chapter of R Packages.
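
As a brief sketch, a testthat test groups related expectations under a description. The function `rescale01()` below is a hypothetical function under test, defined inline so the example is self-contained.

```r
library(testthat)

# Hypothetical function under test: rescale a numeric vector onto [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("rescale01 maps values onto [0, 1]", {
  expect_equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))
  expect_equal(range(rescale01(c(-3, 2, 7))), c(0, 1))
})

test_that("rescale01 preserves missing values", {
  expect_equal(rescale01(c(1, NA, 3)), c(0, NA, 1))
})
```

In a package, the function would live under `R/` and the `test_that()` calls in a file such as `tests/testthat/test-rescale01.R`.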



DataS-DHSC/coding_principles_book documentation built on March 11, 2020, 4:13 a.m.