library("knitr") # The code in this vignette requires a functional Git setup. If a workflowr user # does not already have this setup (which is reasonable since it's not # required), the code is not evaluated and a warning is sent to consult the # online documentation. # If Git user.name and user.email not set, set eval=FALSE. git_config <- workflowr::wflow_git_config() if (is.null(git_config$user.name) || is.null(git_config$user.email)) { opts_chunk$set(eval = FALSE) warning(workflowr:::wrap( "Because you do not have Git configured on this machine, none of the code below was executed. Please refer to the online documentation to see the output: https://jdblischak.github.io/workflowrBeta/")) } rm(git_config) # If ~/.git exists, set eval=FALSE home <- workflowr:::get_home() home_git <- file.path(home, ".git") if (dir.exists(home_git)) { opts_chunk$set(eval = FALSE) warning(workflowr:::wrap( "Because you have a Git repository in your home directory, none of the code below was executed. Please refer to the online documentation to see the output: https://jdblischak.github.io/workflowrBeta/ \n\nYou should consider removing the directory since it was likely created in error: ", home_git)) } rm(home, home_git)
.tmp <- tempfile("wflow-01-getting-started-") .tmp <- workflowr:::absolute(.tmp) .project <- file.path(.tmp, "myproject") dir.create(.project, recursive = TRUE) opts_knit$set(root.dir = .project)
The goal of the workflowr package is to make it easier for researchers to organize their projects and share their results with colleagues. If you are already writing R code to analyze data, and know the basics of Git and GitHub, you can start taking advantage of workflowr immediately. In a matter of minutes, you can create a research website like this.
This tutorial assumes you have already followed the "Quick start" instructions in the README. Specifically, you need to have R, pandoc (or RStudio), and workflowr installed on your computer. Furthermore, you need a GitHub account.
A workflowr project has two key components:
An R Markdown-based website. This consists of a configuration file
(_site.yml
), a collection of R Markdown files, and their
corresponding HTML files.
A Git repository. Git is a version control system that helps track code development^[There are many ways to use Git: in the Terminal, in the RStudio Git pane, or another Git graphical user interface (GUI) (see here for GUI options).]. Workflowr is able to run the basic Git commands, so there is no need to install Git prior to using workflowr.
One of the main goals of workflowr is to help make your research more transparent and reproducible. This is achieved by including the unique identifier that Git assigns a snapshot of your code (or "commit" as Git calls it) at the top of each of your HTML files, so you always know which version of the code produced the results.
To start a new project, open R (or RStudio) and load the workflowr package (note that all the code in this vignette should be run directly in the R console, i.e. do not try to run workflowr functions inside of R Markdown documents).
library("workflowr")
If you have never created a Git repository on your computer before, you need to run the following command to tell Git your name and email. Git uses this information to assign the changes you make to the code to you (analogous to how Track Changes in a Microsoft Office Word document assigns your changes to you). You do not need to use the exact same name and email as you used for your GitHub account. Also, you only need to run this command once per computer, and all subsequent workflowr projects will use this information (you can also update it at any time by re-running the command with different input).
# Replace the example text with your information wflow_git_config(user.name = "Your Name", user.email = "email@domain")
Now you are ready to start your first workflowr project!
wflow_start("myproject")
creates a directory called myproject/
that contains
all the files to get started. It also changes the working directory to
myproject/
^[If you're using RStudio, you can alternatively create a new
workflowr project using the RStudio project template. Go to File
-> New
Project...
and select workflowr project
from the list of project types. In
the future you can return to your project by choosing Open Project...
and
selecting the file myproject.Rproj
. This will set the correct working
directory in the R console, switch the file navigator to the project, and
configure the Git pane.] and initializes a Git repository with the initial
commit already made.
setwd(.tmp) unlink(.project, recursive = TRUE) wflow_start("myproject")
wflow_start()
created the following directory structure in myproject/
:
myproject/ ├── .Rprofile ├── analysis/ │ ├── about.Rmd │ ├── chunks.R │ ├── include/ │ │ └── footer.html │ ├── index.Rmd │ ├── license.Rmd │ ├── README.md │ └── _site.yml ├── CITATION ├── code/ │ ├── README.md │ ├── script.py* │ ├── script.R* │ └── script.sh* ├── data/ │ └── README.md ├── docs/ ├── LICENSE ├── myproject.Rproj ├── output/ │ └── README.md └── README.md
At this point, you have a minimal but complete workflowr project; that is, you have all the files needed to use the main workflowr commands and publish a research website. Later on, as you get more comfortable with the basic setup, you can modify and add to the initial file structure. The overall rationale for this setup is to help organize the files that will be commonly included in a data analysis project. However, not all of these files are required to use workflowr.
The two required subdirectories are analysis/
and docs/
. These
directories should never be removed from the workflowr project.
analysis/
: This directory contains all the source R Markdown files for
implementing the data analyses for your project. It also contains a special R
Markdown file, index.Rmd
, that does not contain any R code, but will be used
to generate index.html
, the homepage for your website. In addition, this
directory contains important configurations files like _site.yml
and
chunks.R
. Do not delete index.Rmd
, _site.yml
, or chunks.R
.
docs/
: This directory contains all the HTML files for your
website. The HTML files are built from the R Markdown files in
analysis/
. Furthermore, any figures created by the R Markdown files
are saved here. Each of these figures is saved according to the
following pattern: docs/figure/<insert Rmd filename>/<insert chunk
name>-#.png
, where #
corresponds to which of the plots the chunk
generated (since one chunk can produce an arbitrary number of plots).
Also required is the RStudio project file, in this example myproject.Rproj
.
Even if you are not using RStudio, do not delete this file because the workflowr
functions rely on it to determine the root directory of the project.
The optional directories are data/
, code/
, and output/
.
These directories are suggestions for organizing your data analysis
project, but can be removed if you do not find them useful.
data/
: This directory is for raw data files.
code/
: This directory is for code and scripts that might not be
appropriate as R Markdown notebooks (e.g. for pre-processing the data, or
for long-running code).
output/
: This directory is for processed data files and other
outputs generated from the code and data. For example, scripts in
code
that pre-process raw data files from data/
should save the
processed data files in output/
.
Other optional files included are CITATION
, LICENSE
, and README.md
, which
are to encourage you to include information on how to cite your work, the
license that determines how others can reuse your work, and usage details,
respectively. The .Rprofile
file is a regular R script that is run once when
the project is opened. It contains the call library("workflowr")
, ensuring
that workflowr is loaded automatically each time a workflowr-project is opened.
You will notice that the docs/
directory is currently empty. That is
because we have not yet generated the website from the analysis/
files. This is what we will do next.
To build the website, run the function wflow_build()
in the R
console:
wflow_build()
# Don't want to actually open the website when building the vignette wflow_build(view = FALSE)
This command builds all the R Markdown files in analysis/
and saves
the corresponding HTML files in docs/
. It sets the same seed before
running every file so that any function that generates random data
(e.g. permutations) is reproducible. Furthermore, each file is built
in its own external R session to avoid any potential conflicts between
analyses (e.g. accidentally sharing a variable with the same name across files).
Lastly, it displays the website in the RStudio Viewer or default web browser.
The default action of wflow_build()
is to behave similar to a
Makefile (make = TRUE
is the default), i.e. it only builds R Markdown files that have been
modified more recently than their corresponding HTML files. Thus if
you run it again, no files are built (and no files are displayed).
wflow_build()
To view the site without first building any files, run wflow_view()
, which by
default displays the file docs/index.html
:
wflow_view()
This is how you can view your site right on your local machine. Go ahead and
edit the files index.Rmd
, about.Rmd
, and license.Rmd
to describe your
project. Then run wflow_build()
to re-build the HTML files and display them in
the RStudio Viewer or your browser.
for (f in file.path("analysis", c("index.Rmd", "about.Rmd", "license.Rmd"))) { cat("\nedit\n", file = f, append = TRUE) }
workflowr makes an important distinction between R Markdown files that are
published versus unpublished. A published file is included in the website
online; whereas, the HTML file of an unpublished R Markdown file is only able to
be viewed on the local computer. Since the project was just started, there are
no published files. To view the status of the workflowr project, run
wflow_status()
.
wflow_status()
This alerts us that our project has 3 R Markdown files, and they are all
unpublished ("Unp"). Furthermore, it instructs how to publish them: use
wflow_publish()
. The first argument to wflow_publish()
is a character vector
of the R Markdown files to publish ^[Instead of listing each file individually,
you can also pass file globs
as input to any workflowr function, e.g. wflow_publish("analysis/*Rmd",
"Publish the initial files for myproject")
]. The second is a message that will
recorded by the version control system Git when it commits (i.e. saves a
snapshot of) these files. The more informative the commit message the better (so
that future you knows what you were trying to accomplish).
wflow_publish(c("analysis/index.Rmd", "analysis/about.Rmd", "analysis/license.Rmd"), "Publish the initial files for myproject")
# Don't want to actually open the website when building the vignette wflow_publish(c("analysis/index.Rmd", "analysis/about.Rmd", "analysis/license.Rmd"), "Publish the initial files for myproject", view = FALSE)
wflow_publish()
reports the 3 steps it took:
Step 1: Commits the 3 R Markdown files using the custom commit message
Step 2: Builds the HTML files using wflow_build()
Step 3: Commits the 3 HTML files plus the files that specify the style of the website (e.g. CSS and Javascript files)
Performing these 3 steps ensures that the HTML files are always in sync with the
latest versions of the R Markdown files. Performing these steps manually would
be tedious and error-prone (e.g. an HTML file may have been built with an
outdated version of an R Markdown file). However, wflow_publish()
makes it
easy to keep the pages of your site in sync.
Now when you run wflow_status()
, it reports that all the files are published
and up-to-date.
wflow_status()
At this point you have built a version-controlled website that exists on your local computer. The next step is to put your code on GitHub so that it can serve your website online. To do this, login to your account on GitHub and create a new repository following these instructions. Make sure you do not add an automatically-generated README, .gitignore, or license (these are important, but workflowr already creates them for you). For the purposes of this tutorial, the code below assumes that the GitHub repository also has the name "myproject." This isn't strictly neccesary (you can name your GitHub repository whatever you like), but it's generally good organizational practice to use the same name for both your GitHub repository and the local directory on your computer.
Next you need to tell your local Git repository about this new GitHub
repository. Run the wflow_remotes()
command below in the R console, replacing
"myname" with your GitHub username:
wflow_remotes("origin", "myname", "myproject")
This creates the alias "origin" that points to your remote repository on
GitHub^["origin" is the conventional name, but could be anything you wanted].
The associated URL is https://github.com/myname/myproject.git^[The name of the
repository on GitHub does not need to be identical to the directory name of your
local Git repo; however, it is convenient to have them match since this is the
default behavior of git clone
when copying your repo to a another computer].
You only need to run this command once to add the remote repository.
Now you can push your files to GitHub with the function wflow_git_push().
Run
the following in the R console:
wflow_git_push(dry_run = TRUE)
Using dry_run = TRUE
previews what the function will do. Remove this argument
to actually push to GitHub. You will be prompted to enter your GitHub username
and password for authentication^[If you'd prefer to use SSH keys for
authentication, please see the section Setup SSH
keys]. Each time you make changes
to your project, e.g. run wflow_publish()
, you will need to run
wflow_git_push()
to send the changes to GitHub.
Now that your code is on GitHub, you need to tell GitHub that you want the files
in docs/
to be published as a website. Go to Settings -> GitHub Pages and
choose "master branch docs/ folder" as the Source
(instructions). Using the hypothetical names above, the
repository would be hosted at the URL https://myname.github.io/myproject/
^[It
may take a few minutes for the site to be rendered.]. If you scroll back down to
the GitHub Pages section of the Settings page, you can click on the URL there.
Now that you have a functioning website, the next step is to start analyzing
data! To start a new analysis called first-analysis.Rmd
, use wflow_open()
:
# Because devtools_shims overrides system.file, wflow_open can't work when # building documentation with devtools::document. Thus create a blank file so # that it doesn't try to copy the template via rmarkdown::draft. file.create("analysis/first-analysis.Rmd")
wflow_open("first-analysis.Rmd")
wflow_open("first-analysis.Rmd", open_file = FALSE) setwd("..")
This performs multiple actions:
analysis/first-analysis.Rmd
based on the workflowr R Markdown template (it doesn't overwrite the file if it already exists)analysis/
directoryNow you are ready to start writing! At the top of the file, edit the author,
title, and date. Where it says "Add your analysis here", add some code chunks to
experiment. If you are using RStudio, press the Knit button to build the file
and see a preview in the Viewer pane. Alternatively from the R console, you can
run wflow_build()
again (this function can be run from the base directory of
your project or any subdirectory).
Check out your new file first-analysis.html
. Near the top you will see a line
that says "Code version:" followed by an alphanumeric character string. This
informs you which version of the code was used to create the file. It also
automatically inserts the date when the HTML was built.
In order to make it easier to navigate to your new file, you can include a link
to it on the main index page. First open analysis/index.Rmd
(you can use your
filesystem navigator or wflow_open("index.Rmd")
). Second paste
the following line into index.Rmd
:
Click on this [link](first-analysis.html) to see my results.
cat("\nClick on this [link](first-analysis.html) to see my results.\n", file = "analysis/index.Rmd", append = TRUE)
This uses the Markdown syntax for creating a hyperlink (for a quick reference
guide in RStudio click "Help" -> "Markdown Quick Reference"). You specify the
HTML version of the file since this is what comprises the website. Click Knit
(or run wflow_build()
again) to check that the link works.
Now run wflow_status()
again. As expected, two files need attention.
index.Rmd
has status "Mod" for modified. This means it is a published file
that has subsequently been modified. first-analysis.Rmd
has status "Scr" for
Scratch. This means not only it the HTML not published, but the R Markdown file
is not yet being tracked by Git.
setwd("analysis") wflow_status() setwd("..")
To publish the new analysis and the updated index page, again use
wflow_publish()
:
# Assuming working directory is `analysis/`. Run getwd() to confirm wflow_publish(c("index.Rmd", "first-analysis.Rmd"), "Add my first analysis")
# Don't want to actually open the website when building the vignette setwd("analysis") # Assuming working directory is `analysis/`. Run getwd() to confirm wflow_publish(c("index.Rmd", "first-analysis.Rmd"), "Add my first analysis", view = FALSE) setwd("..")
Lastly, push the changes to GitHub with wflow_git_push()
to deploy these
latest changes to the website.
This is the general workflow:
Open a new or existing file with wflow_open()
Perform your analysis in the R Markdown file (For RStudio users: to quickly develop the code I recommend executing the code in the R console via Ctrl-Enter to send one line or Ctrl-Alt-C to execute the entire code chunk)
Run wflow_build()
to view the results as they will
appear on the website (alternatively press the Knit button in RStudio)
Go back to step 2 until you are satisfied with the result
Run wflow_publish()
to commit the source files (R Markdown files or other
files in code/
, data/
, and output/
), build the HTML files, and commit the
HTML files
Push the changes to GitHub with wflow_git_push()
This ensures that the "Code version:" inserted into each HTML file corresponds to the state of the Git repository at the time the HTML was built.
The only exception to this workflow is if you are updating the aesthetics of
your website (e.g. anytime you make edits to analysis/_site.yml
). In this case
you'll want to update all the published HTML files, regardless of whether or not
their corresponding R Markdown files have been updated. To republish every HTML
page, run wflow_publish()
with republish = TRUE
. This behavior is only
previewed below by specifying dry_run = TRUE
.
setwd("analysis") # Assuming working directory is `analysis/`. Run getwd() to confirm wflow_publish("_site.yml", republish = TRUE, dry_run = TRUE) setwd("..")
To learn more about workflowr, you can read the following vignettes:
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.