Except for binary machine code, all computer code is intended to be read by humans.
Within a CIDA project, code should follow CIDA’s standard organizational structure and be consistently tracked in a Git repository on CIDA’s GitHub server. Strive to use best practices for coding conventions.
Files should be organized to fit within the general CIDA file structure. The project root should contain the following folders:
Within Code/*, code files should be given descriptive names and ordered, for example:
At the conclusion of a project, any files labelled as such should be the last version used and, when run together or in sequence, should reproduce the final project results.
Names should follow one of the following naming conventions consistently within a script (with the exception of pre-existing column titles from other databases), and be concise and meaningful. Avoid giving objects names that are the same as, or similar to, those of standard functions.
This section describes best practices for R, R Markdown, and general coding. They are not meant to be “enforced”, but if followed, they will make your (and your CIDA collaborators’) lives easier later on.
Do not use attach(); use with() or another alternative instead.
Use <-, not =, for assignment. Avoid right-hand assignment (e.g., 1 -> x).
It is recommended to avoid loading packages with library(), and to instead use the :: operator when calling functions from their specific libraries, e.g., CIDAtools::pvalr.
Put spaces around all operators (+, -, =, <-) and after all commas (not before).
If needed, call setwd() only once at the very top of your script.
Use pipes (%>%), for reasons described here, but keep pipes under 10 steps. Break up large sequences with intermediate objects with meaningful names.
End reports with sessionInfo(). If you need to reproduce your report in its entirety and with the same versions of packages, you will then know which packages and versions you need and can reinstall them from CRAN.
Quote from R for Data Science:
One day you will need to quit R, go do something else and return to your analysis the next day. One day you will be working on multiple analyses simultaneously that all use R and you want to keep them separate. One day you will need to bring data from the outside world into R and send numerical results and figures from R back out into the world. To handle these real life situations, you need to make two decisions: 1) What about your analysis is “real”, i.e. what will you save as your lasting record of what happened? 2) Where does your analysis “live”?
Initially, you may consider your analysis to live in your R environment (e.g. the objects listed in the environment pane). However, it’s much easier to recreate this environment from an R script than it is to recreate an R script from the environment! Your analysis therefore lives in your code. So, if you haven’t already, you should instruct RStudio to never preserve your workspace between sessions to foster this attitude and to make your life easier in the long-term.
knitr::include_graphics("figures/coding_guidelines/coding_1.png")
After changing this, you will notice when you restart RStudio that it will not remember the results of the code you ran last time, because remember – your analysis lives in your code.
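Returning to the style recommendations listed earlier in this section, the following is a minimal sketch of several of them in base R. It assumes only the built-in mtcars dataset; the object names (mean_mpg, heavy_cars, median_wt) are illustrative and not part of any CIDA template.

```r
# Base R only; mtcars ships with R's built-in datasets package.

# with() instead of attach(): columns are visible only inside this call
mean_mpg <- with(mtcars, mean(mpg))

# <- for assignment, with spaces around operators and after commas
heavy_cars <- mtcars[mtcars$wt > 3, c("mpg", "wt")]

# :: instead of library(): the package a function comes from is explicit
median_wt <- stats::median(heavy_cars$wt)

# Record the R version and package versions at the end of a report
sessionInfo()
```

Because nothing here depends on objects left over from an earlier session, the script gives the same result when run in a fresh R session, which is the point of keeping the analysis in the code.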
If you are working within an RStudio Project, you will not usually have to worry about absolute vs. relative file paths, as files will automatically be saved to and loaded from the location of the main project (or its subdirectories). However, if you are knitting an R Markdown file in the Reports/* subdirectory and/or loading files from elsewhere, consider wrapping file paths inside the here() function (from the here package) to ensure all file paths work as anticipated. If your data live on an external server, you may need to point to that server when reading in data (this would be an absolute file path) rather than saving the data in DataRaw/*. If this is the case, please indicate in the project readme file (or the one in DataRaw/*) where the data are stored, along with a contact email address for the owners/maintainers of the server.
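As a rough sketch of the here() approach, the snippet below builds a path relative to the project root rather than to the folder containing the R Markdown file. It assumes the here package is installed; the file name example_data.csv is hypothetical and used only for illustration.

```r
# Assumption: the 'here' package is installed.
# "example_data.csv" is a hypothetical file name, not a real CIDA file.
raw_path <- here::here("DataRaw", "example_data.csv")

# Show the full path that here() resolved
print(raw_path)

# Read the file only if it actually exists at that location
if (file.exists(raw_path)) {
  raw_data <- utils::read.csv(raw_path)
}
```

The same call should resolve to the same file whether the code is run interactively from the project root or knitted from a subdirectory such as Reports/*, provided here() can identify the project root.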
Data files (anything in DataRaw/ or DataProcessed/) should be backed up to the CIDA drive at least weekly; see this committee’s data storage guidelines and CIDAtools::BackupProject().
Each project should have a .gitignore file at its main level that tells Git not to track any data-related files or file types. See below for a sample .gitignore file that ignores any CSV and XLSX files, as well as any files stored in the DataProcessed or DataRaw subdirectories.
Sample .gitignore file, stored at the project-level directory (the top level of your Git repository):
knitr::include_graphics("figures/coding_guidelines/coding_2.png")
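For readers who cannot view the image, a text version consistent with the description above can be sketched in R. The exact patterns are an assumption based on that description (CSV and XLSX files, plus the DataRaw and DataProcessed subdirectories) and are not copied from the figure. The file is written to a temporary directory here; in a real project it would go in the project root.

```r
# Assumed ignore patterns, inferred from the description in the text.
ignore_rules <- c(
  "# Data file types",
  "*.csv",
  "*.xlsx",
  "",
  "# Data directories",
  "DataRaw/",
  "DataProcessed/"
)

# A temporary directory stands in for the project root in this sketch.
project_root <- tempdir()
gitignore_path <- file.path(project_root, ".gitignore")
writeLines(ignore_rules, gitignore_path)

# Print the resulting file contents
cat(readLines(gitignore_path), sep = "\n")
```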