Session details



At the end of this session you will be able:

  1. To become aware of and learn some "best practices" (or "good enough practices") for project organization.
  2. To use RStudio to create and manage projects with a consistent structure.

Resources for learning and help

For learning:

For help:

Best practices overview[^note]

[^note]: Many of the best practices are taken from the "best practices" articles listed in the "Resources".

The ability to read, understand, modify, and write simple pieces of code is an essential skill for modern data analysis. Here we introduce some of the best practices to follow when writing code.

Project management

Managing your projects in a reproducible fashion doesn't just make your science reproducible, it also makes your life easier! RStudio helps with this through RStudio projects, which make it straightforward to divide your work into multiple contexts, each with its own working directory, workspace, history, and source documents.

It is strongly recommended that you store all the files that are used or sourced in your code in the same directory. You can then access them with relative paths. This turns the directory and R Project into a "product", or "bundle/package": like a tiny machine, it needs to have all its component parts in the same place.
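As a rough illustration of the relative-path idea, the sketch below builds a throwaway project folder in a temporary directory and reads a file from it. The folder name `demo-project` and the file `measurements.csv` are made up for this example, and `setwd()` is used only to imitate what opening an RStudio project does for you.

```r
# Hypothetical project folder, created in a temporary location for the demo
project_dir <- file.path(tempdir(), "demo-project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Save a small, made-up dataset inside the project's data/ folder
write.csv(
  data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
  file.path(project_dir, "data", "measurements.csv"),
  row.names = FALSE
)

# Opening an RStudio project sets the working directory to the project folder.
# Here we imitate that step by hand.
old_wd <- setwd(project_dir)

# The path is relative to the project folder, so it keeps working
# when the whole folder is moved or shared.
measurements <- read.csv(file.path("data", "measurements.csv"))

# Restore the previous working directory
setwd(old_wd)

print(measurements)
```

Because the path does not start from a drive or home directory, the same script should run on another computer as long as the project folder is kept together.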

Let's create our first project!

Creating your first project

RStudio projects are associated with R working directories. You can create an RStudio project:

There are many ways one could organise a project folder. We can set up a project folder with the prodigenr package, using:


...which will have the following folders and files:

├── R
│   ├──
│   ├── fetch_data.R
│   └── setup.R
├── data
│   └──
├── doc
│   └──
├── .Rbuildignore
├── .gitignore
├── ProjectName.Rproj

This imposes a specific and consistent folder structure on all your work. Think of it like the "introduction", "methods", "results", and "discussion" sections of your paper. Each project is then like a single manuscript or report that contains everything relevant to that specific project. There is a lot of power in something as simple as a consistent structure.

The README in each folder explains a bit about what should be placed there. But briefly:

  1. Documents are in the doc/ directory.
  2. Data, raw data, and metadata should be in the data/ directory (or in data-raw/ for the very raw data).
  3. All R files and code should be in the R/ directory.
  4. Name all new files to reflect their content or function. Follow the tidyverse style guide for file naming.

And make sure to use version control (Git! See the AUOC Git material for more details).
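To make the folder layout described above concrete, here is a minimal base R sketch that creates the three main folders by hand. It is not what prodigenr does internally; it only assumes the folder names R/, data/, and doc/ from the listing above, and it works in a temporary directory so nothing on your computer is changed.

```r
# Hypothetical project location, inside a temporary directory for the demo
project_dir <- file.path(tempdir(), "ProjectName")

# Folder names taken from the project layout described above
folders <- c("R", "data", "doc")

# Create each folder (and the project folder itself, if needed)
for (folder in folders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# Check what was created
list.files(project_dir)
```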

Exercise: Better file naming

Time: 2 min

Think about these file names. Which file names should you use?

fit models.R
Manuscript version 10.docx
new version of analysis.R

Advantages of this project setup

Projects are used to make life easier. Once a project is opened within RStudio the following actions are taken:

Writing code

Use a syntax style guide

Even though R doesn't care about naming, spacing, and indenting, it really matters how your code looks. Coding is just like writing. You may go through a brainstorming, note-taking stage of writing, but eventually you need to write correctly so others can understand what you are trying to say. In coding, brainstorming is fine too, but eventually you need to code in a readable way.

Exercise: Make code more readable

Time: 6 min

Before we go further into this section, try to make this code more readable. Edit the code so it's easier to understand what is going on.

# Variable names
c <- 9
mean <- function(x) sum(x)

# Spacing
x[ ,1]
x[ , 1]
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )
function (x) {}
mean(x, na.rm=10)
sqrt(x ^ 2 + y ^ 2)
df $ z
x <- 1 : 10

# Indenting
if (y < 0 && debug)
message("Y is negative")

Automatic styling with styler

You have now tidied the code by hand, but it is also possible to do it automatically. The styler R package re-styles chunks of code automatically so that they follow the tidyverse style guide. The styler commands can be found in the Addins menu at the top of the RStudio window.


Now, let's try using styler on the exercise code above.
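Besides the Addins, styler can also be called from the console. The sketch below passes two lines from the exercise to `styler::style_text()`. It assumes the styler package is installed and simply prints a message if it is not.

```r
# Two badly spaced lines, taken from the exercise above
messy <- c(
  "x<-1 : 10",
  "mean (x, na.rm=TRUE)"
)

if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() re-styles code given as text, using the tidyverse style by default
  styled <- styler::style_text(messy)
  print(styled)
} else {
  message("The styler package is not installed; run install.packages(\"styler\") first.")
}
```

Compare the printed result with your own hand-edited version of the exercise.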

DRY and describing your code

DRY, or "don't repeat yourself", is another way of saying, "make your own functions"! That way you don't need to copy and paste code you've used multiple times. Using functions can also make your code more readable and descriptive, since a function is a bundle of code that does a specific task, and the function name should usually describe what that task is.
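As a small sketch of what DRY looks like in practice, the example below first repeats the same calculation for two columns and then replaces the copies with one function. The data and the function name `standardise` are made up for illustration.

```r
# Made-up example data
df <- data.frame(
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

# Repetitive version: the same calculation is copied and pasted for each column
height_scaled_copy <- (df$height - mean(df$height)) / sd(df$height)
weight_scaled_copy <- (df$weight - mean(df$weight)) / sd(df$weight)

# DRY version: write the calculation once, with a name that says what it does
standardise <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

df$height_scaled <- standardise(df$height)
df$weight_scaled <- standardise(df$weight)

print(df)
```

If the calculation ever needs to change, it now only has to be changed in one place.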

It is very important for your future self, and for anyone else who will read or use your code, to be able to understand what the code, function, or R Markdown document will produce. So it's crucial to describe what the code does, acknowledge the author (if necessary), and give an example of how to run it. If your function name is descriptive enough, you don't need to spend much time describing what the code does! In the AUOC session on creating functions for packages, we went into detail about function documentation and creation. Here we will briefly cover the core concepts.


# Code developed by Maria Izabel
# The following function outputs the sum of two numeric variables (a and b).
# usage: summing(a = 2, b = 3)
summing <- function(a, b) {
    return(a + b)
}

summing(a = 2, b = 3)

The example above sums two numeric variables. Note that the function is named summing rather than sum: R already has a built-in function called sum, and we don't want to overwrite it!

Loading packages

At the top of each script, you should put all your library calls for loading your packages. Better yet, put all the library calls in a new file and source() that file in each R script.
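A minimal sketch of this pattern is shown below. It writes a small setup file into a temporary folder and then sources it, the way an analysis script would. The file name `setup.R` follows the project layout shown earlier; the two packages loaded here (stats and utils) are only placeholders that ship with R, so replace them with the packages your project actually uses.

```r
# Hypothetical project folder, created in a temporary location for the demo
project_dir <- file.path(tempdir(), "demo-packages")
dir.create(file.path(project_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# Put all library() calls in one file
setup_file <- file.path(project_dir, "R", "setup.R")
writeLines(
  c(
    "# All packages used in this project are loaded here",
    "library(stats)",
    "library(utils)"
  ),
  setup_file
)

# At the top of each analysis script, load everything with one line
source(setup_file)
```

In a real project the working directory is the project folder, so the last line would typically be `source("R/setup.R")`.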

Workflow and script management with drake

We'll cover this more during the session, but mainly at the end.

au-oc/content documentation built on May 21, 2019, 4:05 a.m.