2020-01-14: Project-oriented workflow


--Jenny Bryan

Workflow versus product

Definitions

Don't hardwire your workflow into your product.

Which is workflow or product?

  1. The editor you use to write your R code.

  2. The raw data.

  3. The name of your home directory.

  4. The R code someone needs to run on your raw data to get your results, including the explicit library() calls to load necessary packages.

Example: Remove workflow

The name of the home directory is workflow, not product.

home <- "C:/Users/Mauro/Documents/"  # Workflow
proj_path <- "path/to/project"
paste0(home, proj_path)

Better

proj_path <- "path/to/project"
fs::path_home_r(proj_path)

Best

fs::path_home_r("path", "to", "project")
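If the fs package is not installed, roughly the same idea can be sketched in base R. This assumes that `path.expand("~")` is an acceptable notion of the home directory on your system; the result may not match `fs::path_home_r()` on every platform.

```r
# Build the project path from the home directory at run time,
# so no user name is hardwired into the script.
proj_path <- file.path(path.expand("~"), "path", "to", "project")
proj_path
```

Either way, the home directory is looked up when the code runs, so it stays part of the workflow and out of the product.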

Self-contained projects

Self-contained projects can be moved around on your computer or onto other computers and will still "just work".

It’s like agreeing that we will all drive on the left or the right. A hallmark of civilization is following conventions that constrain your behavior a little, in the name of public safety.

--Jenny Bryan

What do they look like?

  1. The project folder contains all relevant files.

  2. Any .R file can run from a fresh R process with the working directory set to the project root.

  3. Any .R file creates everything it needs, in its own workspace or folder.

  4. Any .R file touches nothing it didn't create (e.g. it doesn't install packages).
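A minimal sketch of a script that follows these rules. The folder and file names (`my-project`, `output`, `results.csv`) are made up for illustration, and a temporary directory stands in for the project root so the example can run anywhere.

```r
# Stand-in for the project root. In a real project this would be the
# folder that holds your .R files, and also the working directory.
project_root <- file.path(tempdir(), "my-project")  # hypothetical name
dir.create(project_root, showWarnings = FALSE)

# Create what the script needs (an output folder) inside the project.
out_dir <- file.path(project_root, "output")
dir.create(out_dir, showWarnings = FALSE)

# Write results only to a place the script created itself.
results <- data.frame(x = 1:3, y = c(2, 4, 6))
out_file <- file.path(out_dir, "results.csv")
write.csv(results, out_file, row.names = FALSE)
```

Nothing outside the project folder is read, written, or installed, so the folder can be moved and the script should still run.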

Violations ...

What should you do instead of this?

path_to_data <- "../datasets/my-data.csv"

What should you do instead?
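One possible answer, assuming you are allowed to copy the data into the project: keep the file in a folder inside the project (here a hypothetical `data/` folder) and refer to it relative to the project root, so the path no longer reaches outside the project with `..`.

```r
# Path relative to the project root. Assumes the working directory is the
# project root and the file was copied to data/ (a hypothetical layout).
path_to_data <- file.path("data", "my-data.csv")
path_to_data
```

With the here package, `here("data", "my-data.csv")` would build the same path from the project root.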

What should you do instead of this?

pacman::p_load(random)
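By default, `pacman::p_load()` installs any package that is missing, so the script touches something it did not create. A possible alternative is a plain `library(random)` call, which simply fails if the package is not installed. The sketch below goes one step further with a hypothetical helper, `missing_packages()`, that reports what is missing without installing anything.

```r
# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() may load a namespace, but it never installs anything.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("stats", "utils")  # replace with the packages your script uses
not_installed <- missing_packages(needed)
if (length(not_installed) > 0) {
  stop("Please install: ", paste(not_installed, collapse = ", "))
}
```

Whoever runs the script then decides how and where to install the missing packages.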

setwd()

What's wrong?

library(ggplot2)
setwd("/Users/jenny/cuddly_broccoli/verbose_funicular/foofy/data")
df <- read.delim("raw_foofy_data.csv")
p <- ggplot(df, aes(x, y)) + geom_point()
ggsave("../figs/foofy_scatterplot.png")


What should you do instead?

library(ggplot2)
library(here)

df <- read.delim(here("data", "raw_foofy_data.csv"))
p <- ggplot(df, aes(x, y)) + geom_point()
ggsave(here("figs", "foofy_scatterplot.png"))
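`here()` builds paths relative to the project root, which it locates by looking for marker files such as `.here` or an `.Rproj` file. The sketch below illustrates the idea in base R with a hypothetical helper, `find_root()`. It is a simplified stand-in and not the package's actual implementation.

```r
# Hypothetical helper: walk up from `start` until a directory that
# contains the file `marker` is found, and return that directory.
find_root <- function(start = getwd(), marker = ".here") {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      stop("No directory containing '", marker, "' found above ", start)
    }
    dir <- parent
  }
}

# Demonstration in a temporary folder that stands in for the project.
root <- file.path(tempdir(), "foofy")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, ".here"))

# Starting from a subfolder, the search still ends at the project root.
found <- find_root(file.path(root, "data"))
file.path(found, "figs", "foofy_scatterplot.png")
```

Because the root is found at run time, the same script works wherever the project folder lives, without any call to `setwd()`.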

rm(list = ls())

What's wrong?

What's better?
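A common answer: `rm(list = ls())` only removes objects from the current environment. It does not detach packages, reset options, or restore the working directory, so a script that starts with it is not really starting fresh. Restarting R gives a genuinely fresh process. The sketch below demonstrates the first half inside a function, so it does not wipe your own workspace; `demo.flag` is a made-up option name.

```r
rm_leaves_state <- function() {
  x <- 1                     # an ordinary object
  options(demo.flag = TRUE)  # hypothetical option, stands in for session state
  rm(list = ls())            # removes x, but does not touch the option
  survived <- c(
    object = exists("x", envir = environment(), inherits = FALSE),
    option = isTRUE(getOption("demo.flag"))
  )
  options(demo.flag = NULL)  # clean up after the demonstration
  survived
}

rm_leaves_state()
```

The result should report that the object is gone while the option is still set, which is the kind of leftover state a restart would clear.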

Discuss: Must have or nice to have?

The importance of these practices has a lot to do with whether your code will be run by other people, on other machines, and in the future. If your current practices serve your purposes, then go forth and be happy.

--Jenny Bryan

Learn more



2DegreesInvesting/ds-incubator documentation built on Oct. 13, 2021, 10:09 a.m.