
Description

It can be a challenge to develop a project in RStudio with multiple authors and make the results repeatable on any machine. Collaboration is especially difficult when output from one author's scripts is used as input by another author. It can also be frustrating when scripts written by one author are run on another author's computer that is missing the required packages. In addition, shared drives may have different names on different computers, which breaks file input and output if paths are hardcoded instead of being written relative to the project's root directory. Finally, setting the correct working directory can be a nightmare. The projectmap package addresses all of these challenges by attempting to standardize the folder structure and file input/output while also performing package version control. The package also lets users branch files and work on them separately from the master copy as a method of version-controlling files.

Installing the Package

To install the projectmap package:

devtools::install_github("opendoor-labs/projectmap")

Setting up a Project

Once the installation is complete, you can now set up a new project using the package. Open RStudio and in the console type:

library(projectmap)
link_to_proj()

You will be prompted to select a directory to build your new project in. It is suggested that you create a new folder. Once the function finishes executing, you will see the folders Codes, Documentation, Functions, Input, Library, Logs, and Output, along with two files, Example File.R and Project Master.R. Do not rename or move Project Master.R: its name and location are how projectmap finds the working directory.
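
The idea of locating the project root from the position of Project Master.R can be sketched as follows. This is a minimal illustration under stated assumptions, not projectmap's actual code: the function name find_proj_root() is hypothetical, and the sketch simply walks up from a starting directory until it finds a folder that contains Project Master.R.

```r
# Hypothetical helper (not part of projectmap): walk up the directory tree
# from `start` until a folder containing the marker file is found
find_proj_root <- function(start = getwd(), marker = "Project Master.R") {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) {
      return(dir)
    }
    parent <- dirname(dir)
    # dirname() of a filesystem root returns the root itself, so stop there
    if (identical(parent, dir)) {
      stop("No '", marker, "' found above ", start)
    }
    dir <- parent
  }
}

# Example: build a throwaway project and search from a nested folder
root <- file.path(tempdir(), "demo_proj")
dir.create(file.path(root, "Codes", "Model 1"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "Project Master.R"))
found <- find_proj_root(file.path(root, "Codes", "Model 1"))
```

Searching upward means a script can be run from any subfolder of the project and still resolve the same root, which is why the master file must keep its name and location.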

Once the package is loaded, projectmap creates a hidden environment called proj.env that maintains a list of variables essential for the package functions to operate correctly. You should not be able to delete or edit it without first unlocking it with the unlock_proj() function. Making changes to the proj.env variables is not recommended, but if you do, make sure to lock it again with the lock_proj() function when you are done.
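
The locking behavior described above can be illustrated with base R's lockEnvironment(), lockBinding(), and unlockBinding(). This is only a sketch of the general mechanism: projectmap's own lock_proj() and unlock_proj() may work differently, and the environment demo.env and its variable root.dir are hypothetical.

```r
# Hypothetical stand-in for a package-level settings environment
demo.env <- new.env()
assign("root.dir", "/path/to/project", envir = demo.env)

# Lock the environment (no new variables) and all of its bindings (no edits)
lockEnvironment(demo.env, bindings = TRUE)

# Editing a locked binding raises an error
edit_ok <- tryCatch({
  assign("root.dir", "/somewhere/else", envir = demo.env)
  TRUE
}, error = function(e) FALSE)

# Adding a new variable to a locked environment also raises an error
add_ok <- tryCatch({
  assign("new.var", 1, envir = demo.env)
  TRUE
}, error = function(e) FALSE)

# Unlock the single binding, edit it, then lock it again when done
unlockBinding("root.dir", demo.env)
assign("root.dir", "/somewhere/else", envir = demo.env)
lockBinding("root.dir", demo.env)
```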

The link_to_proj() function is the workhorse of the projectmap package. It builds the folder structure, adds a new library path to your RStudio session, installs all of the required packages into the project's directory, and builds a file cabinet of all files in the directory. It also creates a color palette called od.colors (a named vector of hexadecimal color strings) and a ggplot theme, od_theme(), based on Opendoor's color palette from its Google Slides template. Finally, it creates a ".gitignore" file and initializes the root directory as a git repository in case you would like to use git features with the project. If you want to push this folder to a GitHub repository, enter the following in the terminal, replacing "folder/package" with the correct path and package name:

git remote add origin https://github.com/folder/package.git
git config remote.origin.url git@github.com:folder/package.git

The cabinet maintains a list of file paths, relative to the root directory, for all files in the project (using the "./" syntax in place of the root directory's folder path). It exists so that authors do not need to hardcode file paths and can instead use the helper functions get_file_path(), get_file_folder(), save_file(), read_file(), and source_file(), which only require the file name with its extension and, optionally, a folder path. The cabinet can be updated by calling the build_cabinet() function, which stores it in the Functions folder as an RData file.
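
A cabinet of this kind can be sketched in base R with list.files(). The helper names build_cabinet_sketch() and find_file() are hypothetical and only approximate the behavior described above: paths are stored relative to the root with a "./" prefix, and an optional folder narrows the search when a file name is not unique.

```r
# Hypothetical sketch: record every file under `root` as a "./"-relative path
build_cabinet_sketch <- function(root) {
  rel <- list.files(root, recursive = TRUE)
  file.path(".", rel)
}

# Hypothetical lookup: match on the file name, optionally restricted to a folder
find_file <- function(cabinet, file, inFolder = NULL) {
  hits <- cabinet[basename(cabinet) == file]
  if (!is.null(inFolder)) {
    hits <- hits[dirname(hits) == file.path(".", inFolder)]
  }
  if (length(hits) != 1) {
    stop(length(hits), " matches for '", file, "'; specify inFolder")
  }
  hits
}

# Example: two files share a name, so inFolder is needed to tell them apart
root <- file.path(tempdir(), "cabinet_demo")
dir.create(file.path(root, "Output", "Model 1"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "Input"), showWarnings = FALSE)
file.create(file.path(root, "Input", "data.csv"),
            file.path(root, "Output", "Model 1", "data.csv"),
            file.path(root, "Output", "plot.png"))
cabinet <- build_cabinet_sketch(root)

find_file(cabinet, "plot.png")                            # "./Output/plot.png"
find_file(cabinet, "data.csv", inFolder = "Output/Model 1")  # "./Output/Model 1/data.csv"
```

Because the stored paths never include the machine-specific part of the root, the same cabinet entries stay valid when the shared drive is mounted under a different name.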

When the install argument is set to T (the default), link_to_proj() searches all of the R files in the project directory for library, require, install.packages, and :: keywords, parses the package names used with them, and installs those packages in the project's library folder. It also removes any unnecessary packages from the project library. The default library path is changed as well, which you can verify by calling .libPaths() and checking that the first item in the list is the path to the project's library folder. This forces RStudio to look in the project's library folder first when loading packages. This is projectmap's package version control method.
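
The scanning step can be approximated with regular expressions over the source text. This sketch, with the hypothetical name find_packages(), only handles simple single-package calls (quoted or unquoted) and pkg:: prefixes. It is not projectmap's parser, and a more robust version would walk the parsed code instead of matching text.

```r
# Hypothetical sketch: pull package names out of lines of R source code
find_packages <- function(lines) {
  # library(pkg), require(pkg), install.packages("pkg"), quoted or not
  call_pat <- "(library|require|install\\.packages)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)"
  calls <- regmatches(lines, gregexpr(call_pat, lines))
  from_calls <- sub(call_pat, "\\2", unlist(calls))

  # pkg::fun or pkg:::fun
  ns_pat <- "([A-Za-z][A-Za-z0-9.]*):::?"
  ns <- regmatches(lines, gregexpr(ns_pat, lines))
  from_ns <- sub(ns_pat, "\\1", unlist(ns))

  sort(unique(c(from_calls, from_ns)))
}

src <- c('library(ggplot2)',
         'require("data.table")',
         'x <- dplyr::filter(df, a > 1)',
         'install.packages("xtable")')
pkgs <- find_packages(src)
```

Once the package names are known, base R's .libPaths(c(lib, .libPaths())) puts a project library (lib here is a placeholder path) at the front of the library search path, which is the effect the paragraph above describes.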

If any of the bigrquery, bigQueryR, googledrive, or googlesheets packages are used in any of the project's R scripts, link_to_proj() will open a browser window or tab and prompt you to authenticate your access to Google BigQuery or Google Drive. The OAuth token is stored in the project's root directory so that this step can be skipped when running the scripts in the future. Note, however, that the packages may automatically refresh stale OAuth tokens.

The Project Master File

The project master file is the main file for your project. The first thing you will notice is that the init argument in link_to_proj() is set to T. This tells the function to initialize the project.

Next you will see the code chunk

set_proj_models(
  Example = T,
  Model1 = F
)

The names should be chosen by the author to represent submodels of the overall model developed in the project. Each submodel should represent a logical part of the overall model that can be run on its own. Once you have partitioned your model into submodels (you can also treat the whole model as a single partition), set each one to T or F depending on whether you want the R scripts in that submodel to execute the next time you source Project Master.R.
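
The on/off switches can be pictured as a named logical vector stored in an environment. set_models_sketch() and run_model_sketch() below are hypothetical stand-ins for set_proj_models() and run_proj_model(); projectmap's real implementation may store and check the flags differently.

```r
# Hypothetical storage for the submodel switches
model.env <- new.env()

# Record each submodel flag, e.g. set_models_sketch(Example = TRUE, Model1 = FALSE)
set_models_sketch <- function(...) {
  flags <- c(...)
  stopifnot(is.logical(flags), !is.null(names(flags)), all(names(flags) != ""))
  assign("models", flags, envir = model.env)
  invisible(flags)
}

# Return TRUE only if the named submodel was switched on
run_model_sketch <- function(model) {
  flags <- get("models", envir = model.env)
  if (!model %in% names(flags)) stop("Unknown submodel: ", model)
  isTRUE(flags[[model]])
}

set_models_sketch(Example = TRUE, Model1 = FALSE)
```

Raising an error for an unknown name, rather than silently returning FALSE, is one way to catch a typo in a submodel name before a run quietly skips scripts.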

The last item in the master file is the code chunk

if(run_proj_model("Example")){
  source_file("Example File.R", inFolder = NULL)
}
if(run_proj_model("Model1")){
  source_file("Model1.R", inFolder = "Codes")
}

The run_proj_model() function returns T or F for its model string argument, depending on what was set in the set_proj_models() call above. It is the authors' responsibility to place all R scripts required to develop the model in this part of the master file, in the correct order. The scripts should be called using the source_file() function as above. This is a wrapper around the base source() function that keeps track of the overall progress of the master file's execution. The inFolder argument is optional: if all your R scripts have unique names, you can leave it as NULL, which is the default. Otherwise, specify the folder path to the R script (e.g., Codes, or Codes/Model 1 if Model1.R lives in that folder).
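
A source() wrapper that reports progress might look like the sketch below. source_file_sketch() is hypothetical: it takes a path directly instead of looking the file up in the cabinet, and it records each script's name and run time in a data frame, which is an assumption rather than what projectmap tracks internally.

```r
# Hypothetical log of the scripts sourced so far
progress <- data.frame(script = character(0), seconds = numeric(0),
                       stringsAsFactors = FALSE)

# Source one script in a fresh environment, then log its name and elapsed time
source_file_sketch <- function(path) {
  start <- Sys.time()
  source(path, local = new.env())
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  progress <<- rbind(progress, data.frame(script = basename(path), seconds = elapsed,
                                          stringsAsFactors = FALSE))
  message("Finished ", basename(path), " (", nrow(progress), " scripts run)")
  invisible(elapsed)
}

# Example: write two tiny scripts and run them in order
s1 <- tempfile(fileext = ".R")
writeLines("a <- 1", s1)
s2 <- tempfile(fileext = ".R")
writeLines("b <- 2", s2)
source_file_sketch(s1)
source_file_sketch(s2)
```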

To see how the master file executes, set the working directory to your project's root folder and type source("Project Master.R") in the console. Be sure to explore the results in the Output folder.

The Example file

This file is a template for authors to follow when writing R scripts for the project. The only required function is link_to_proj(), which should be placed at the top, right after loading the projectmap package. Then load all other libraries the R script requires using the library() or load.packages() functions. Do not use pacman's p_load(), as it may load packages from your default R library rather than the project-specific library. Do not use load.packages() in any of the global.R, ui.R, or server.R files if you plan to host the Shiny app on shinyapps.io.

rm(list = ls())
#Load projectmap
library(projectmap)
#Link this file as part of the project
link_to_proj()
#Load other required packages
library(ggplot2)
library(data.table)

#OR
#load.packages(ggplot2, data.table)

This prevents the script from loading packages from your main library and makes it look only in the project library for them. You should also place rm(list = ls()) at the very top to make sure your R script executes in a clean environment.

You will notice a couple of examples of how to use the save_file() function to save data and ggplot objects styled with od_theme(), and how to use the read_file() function. You should use save_file() to save all objects, as it wraps data.table's fwrite() for csv files, ggplot2's ggsave() for plots from ggplot objects, xlsx's saveWorkbook() for xls and xlsx files, and the base save() and saveRDS() functions for RData and rds files. More importantly, it runs some code on the back end to add every saved file to the cabinet so that it can be referenced by the other projectmap helper functions.
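
Extension-based dispatch of the kind save_file() performs can be sketched with tools::file_ext() and switch(). save_by_ext() is a hypothetical name. To keep the example self-contained, it uses base R writers (write.csv(), saveRDS(), save()) in place of data.table's fwrite(), ggsave(), and xlsx's saveWorkbook(), and it does not update any cabinet.

```r
# Hypothetical sketch: choose a writer from the file extension
save_by_ext <- function(object, file) {
  ext <- tolower(tools::file_ext(file))
  switch(ext,
    csv   = utils::write.csv(object, file, row.names = FALSE),
    rds   = saveRDS(object, file),
    rdata = {
      # save() stores objects by name, so bind the value to a fixed name first
      saved_object <- object
      save(saved_object, file = file)
    },
    stop("No writer registered for extension '", ext, "'")
  )
  invisible(file)
}

demo_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")
save_by_ext(demo_df, csv_path)
save_by_ext(demo_df, rds_path)
save_by_ext(demo_df, rdata_path)
```

Lower-casing the extension lets ".RData" and ".rdata" share one branch, and the final unnamed argument to switch() acts as the fallback for unsupported types.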

The save_file() function automatically selects the proper function based on the extension given in the file argument. It also selects the default output directory using the get_output_dir() function. This helper function is designed to make the Output folder mirror the structure of the Codes folder. For example, if Model1.R lives in Codes/Model 1, then get_output_dir() replaces Codes with Output while keeping the Model 1 subfolder, giving Output/Model 1. You can also specify an extra subfolder, for example to separate data output from image output, by adding it to the file name, as in Images/plot.png.
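
The Codes-to-Output mirroring can be expressed as a path substitution. output_dir_sketch() is hypothetical: it works on "./"-relative paths, assumes scripts live under ./Codes, and uses an optional subfolder argument to mimic the Images/plot.png idea.

```r
# Hypothetical sketch: map a script's folder under ./Codes to the matching
# folder under ./Output, optionally appending an extra subfolder
output_dir_sketch <- function(script_path, subfolder = NULL) {
  script_dir <- dirname(script_path)
  # The lookahead only matches the whole "Codes" folder, not e.g. "CodesOld"
  out_dir <- sub("^\\./Codes(?=/|$)", "./Output", script_dir, perl = TRUE)
  if (!is.null(subfolder)) out_dir <- file.path(out_dir, subfolder)
  out_dir
}

output_dir_sketch("./Codes/Model 1/Model1.R")                        # "./Output/Model 1"
output_dir_sketch("./Codes/Model 1/Model1.R", subfolder = "Images")  # "./Output/Model 1/Images"
```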

The Opendoor color names are "blue", "navy", "iris", "turquoise", "citrine", "ruby", "lightgrey", "bluegrey", "coolgrey", "warmgrey", and "lightgreytint". od_theme() is built with multiple palettes that can be selected by setting the palette argument to "main", "cool", or "warm". The main palette consists of all non-grey colors; the cool palette consists of "navy", "blue", "bluegrey", and "turquoise"; and the warm palette consists of "ruby", "iris", and "citrine".

You can also select a subset of the Opendoor colors, or name your own colors, by setting the color argument to a vector of color names or hexadecimal strings. Alternatively, you can select a set number of colors by setting the n argument to any positive integer. If the color palette needs to be continuous, set the discrete argument to F (it is T by default). You can reverse the order of the color palette by setting the reverse argument to T, and you can add black to the palette (it is absent by default) by setting the addblack argument to T.
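
How the color, n, reverse, and addblack arguments might interact can be sketched with grDevices::colorRampPalette(). Everything here is hypothetical: palette_sketch() is not od_theme(), the discrete argument is not modeled, and the hex values below are placeholders, not Opendoor's actual colors.

```r
# Placeholder hex values for illustration only; these are NOT the real od.colors
demo.colors <- c(navy = "#1F2A44", blue = "#1C6DD0", turquoise = "#3CC6C0",
                 ruby = "#C0392B", citrine = "#E4C21B")

# Hypothetical sketch of the color, n, reverse, and addblack arguments
palette_sketch <- function(color = names(demo.colors), n = NULL,
                           reverse = FALSE, addblack = FALSE) {
  # Look up known names; pass anything else (e.g. "#FF8800") through unchanged
  cols <- ifelse(color %in% names(demo.colors), demo.colors[color], color)
  if (addblack) cols <- c(cols, "#000000")
  if (reverse) cols <- rev(cols)
  if (is.null(n) || n == length(cols)) return(unname(cols))
  # Fewer colors than supplied: keep the first n; more: interpolate a ramp
  if (n < length(cols)) return(unname(cols[seq_len(n)]))
  grDevices::colorRampPalette(cols)(n)
}

palette_sketch(c("navy", "turquoise"))                # "#1F2A44" "#3CC6C0"
palette_sketch(c("navy", "#FF8800"), reverse = TRUE)  # "#FF8800" "#1F2A44"
palette_sketch(c("navy", "ruby"), n = 5)              # five colors from navy to ruby
```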

Finally, read_file() is a wrapper around data.table's fread() for csv files, xlsx's read.xlsx() for xls and xlsx files, and the base load() and readRDS() functions for RData and rds files. If you know the file name is unique in the project directory, you only need to specify the file name, with its extension, in the file argument. Otherwise, you can specify the folder path using the inFolder argument (e.g., Output/Model 1/Images). If you leave inFolder as NULL and multiple files with the same name exist in different folders, read_file() defaults to looking in the folder returned by get_output_dir().
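
The lookup rule for non-unique names can be sketched as a small resolver. resolve_read_path() is hypothetical: it operates on a plain character vector of "./"-relative paths standing in for the cabinet, and its default_dir argument plays the role of get_output_dir().

```r
# Hypothetical resolver: a unique name resolves directly; otherwise use inFolder,
# or fall back to the default output folder when inFolder is NULL
resolve_read_path <- function(cabinet, file, inFolder = NULL, default_dir) {
  hits <- cabinet[basename(cabinet) == file]
  if (length(hits) == 0) stop("File not found: ", file)
  if (length(hits) == 1) return(hits)
  folder <- if (is.null(inFolder)) default_dir else file.path(".", inFolder)
  hits <- hits[dirname(hits) == folder]
  if (length(hits) != 1) stop("Could not resolve '", file, "' in ", folder)
  hits
}

cabinet <- c("./Input/sales.csv",
             "./Output/Model 1/sales.csv",
             "./Output/Model 1/Images/plot.png")

resolve_read_path(cabinet, "sales.csv", default_dir = "./Output/Model 1")
resolve_read_path(cabinet, "sales.csv", inFolder = "Input", default_dir = "./Output/Model 1")
```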

Branching and Merging

To let multiple users edit the same R file while keeping version control, the projectmap package sets up the project root folder to work with git. The first step is to start an R project in this folder: open RStudio, choose File -> New Project -> Existing Directory, and select the appropriate directory. You should then see a "Git" tab next to the "Environment", "History", "Connections", and "Build" tabs.

To clone the directory, so that you have a separate copy to work on and avoid unwanted changes to the master copy, use:

library(projectmap)
git_clone(repo = "/Users/username/repo", directory = "/Users/username/Documents/repo")

git_clone() is a shorthand for the equivalent git terminal commands. The projectmap package initializes your repo with a .git folder and a .gitignore file, and it also updates the git config file so that the master copy is updated when it receives pushes from cloned directories. You should be able to use RStudio's built-in git interface when you open the R project you created. The projectmap package also includes other git shorthand functions, git_branch(), git_diff(), git_pull(), git_merge(), and git_push(), that can help with git functionality. Be sure to pull the master copy into your cloned copy before pushing back to the master copy.



opendoor-labs/projectmap documentation built on Oct. 8, 2019, 1:58 p.m.