knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
knitr::opts_template$set(
  file_content = list(comment = "", collapse = FALSE)
)
notestar is a notebook system built on the targets package: notes with targets.
You can install notestar from GitHub with:
# install.packages("devtools")
devtools::install_github("tjmahr/notestar")
Here is an example project/notebook showing how notestar combines various .Rmd files into a single HTML file: https://github.com/tjmahr/notestar-demo.
First, we follow a targets-based workflow. We develop datasets and models and so on, writing functions in R/functions.R and describing a build pipeline in _targets.R. Some familiarity with the big ideas of the targets package is required.
We then work with our data-analysis products in RMarkdown .Rmd files in a notebook directory. We read in these targets using targets::tar_read(), and we might develop several notebook entries as we tackle different parts of our analysis problem.
In the _targets.R file, there are special notebook-related targets.
When we run targets::tar_make(), notestar does the following:
1. For each .Rmd file, it knits (knitr::knit()) the corresponding .md output file: running the code, printing the results, saving and inserting figures.
2. These .md files are collated and assembled into a single-page bookdown document. It looks kind of like a data-analysis blog (a page with a sequence of entries in reverse-chronological order).
Importantly, notestar only does these jobs when needed. For example, a notebook entry's .md file will only be created if it is outdated. That is, the entry is rebuilt when the .Rmd file itself or a target it loads (for example, data <- targets::tar_read(data) in the .Rmd source file) has changed.
notestar's role in all of this is to link the data analysis targets to the .Rmd files and then orchestrate the assembly of a notebook from these entries.
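To make the "only when needed" idea concrete, here is a minimal base R sketch of a staleness check. It is an illustration under simplifying assumptions, not how targets or notestar implement it: the helper is_outdated() and the file names are hypothetical, and the real bookkeeping in targets also covers upstream targets and functions.

```r
# Hypothetical helper (not part of notestar or targets): an output is
# outdated if it does not exist yet or if any input's hash has changed
# since the hashes were last recorded.
is_outdated <- function(output, inputs, recorded_hashes) {
  if (!file.exists(output)) {
    return(TRUE)
  }
  current_hashes <- unname(tools::md5sum(inputs))
  !identical(current_hashes, unname(recorded_hashes))
}

demo_dir <- tempfile("demo-")
dir.create(demo_dir)
rmd <- file.path(demo_dir, "entry.Rmd")
md <- file.path(demo_dir, "entry.md")

writeLines("Hello", rmd)
recorded <- tools::md5sum(rmd)

# No .md file yet, so the entry must be built.
first_check <- is_outdated(md, rmd, recorded)

# Pretend we knitted the entry; nothing has changed since.
writeLines("Hello", md)
second_check <- is_outdated(md, rmd, recorded)

# Editing the .Rmd source makes the entry outdated again.
writeLines("Hello again", rmd)
third_check <- is_outdated(md, rmd, recorded)
```

The three checks correspond to the cases in the text: an entry that has never been built, an entry that is up to date, and an entry whose source has changed.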
Let's highlight some packages that are indispensable for this scheme to work.
Below I show a worked example and describe things in great detail. But before that I want to note that as a user, I only really use 3--4 functions from this package.
- use_notestar() to set up a notestar project
- use_notestar_makefile() to set up a Makefile that runs targets::tar_make(). I then use RStudio's Build commands to build projects.
- use_notestar_references() to set up a .bib and .csl file for the notebook.
- notebook_create_page() to create a new notebook entry
- notebook_browse() to open the final notebook file in a browser.

notestar works best inside of a data analysis project and specifically, as a part of an RStudio project. That is, we have some directory for our project. Everything we do or create will live in that directory, and that directory is the default working directory for all of our R code.
For demonstration, let's create a new directory inside of a temporary directory and make that the home base for our project.
project_dir <- file.path(tempdir(), "my-project")
dir.create(project_dir)
setwd(project_dir)
knitr::opts_knit$set(root.dir = project_dir)
Nothing here!
fs::dir_tree(all = TRUE)
use_notestar() will populate the project directory with the basic skeleton for the project. We set the theme to "water-dark" so that the screenshots below stick out better from the white background on GitHub.
library(notestar)
use_notestar(cleanrmd_theme = "water-dark")
fs::dir_tree(all = TRUE)
The file config.yml is a config-package configuration file. These configuration options were set when we called use_notestar(), so these are all the default configuration options (except for cleanrmd_theme). Each of these is described by a comment field.
writeLines(readLines("config.yml"))
Two .Rmd files are automatically included: index.Rmd and 0000-00-00-references.Rmd. These are the first and last entries (top and bottom parts) of the notebook. index.Rmd houses metadata for the notebook:
writeLines(readLines("notebook/index.Rmd"))
The YAML metadata in index.Rmd is created automatically inside the _targets.R file. More on that later.
0000-00-00-references.Rmd is not meant to be edited. As it tells us, it provides a "References" heading. When the bibliography is appended to the end of the notebook, it will be printed under this heading.
writeLines(readLines("notebook/0000-00-00-references.Rmd"))
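One plausible reading of the 0000-00-00 prefix is that it makes the references entry sort last when entries are ordered newest-first by filename. The sketch below illustrates that ordering in base R. The helper order_entries() and the second dated entry are hypothetical, and notestar's actual collation code may work differently.

```r
# Hypothetical helper: put index.md first, then sort the remaining
# entries by filename in descending order (newest date first).
order_entries <- function(files) {
  index <- files[files == "index.md"]
  entries <- sort(setdiff(files, index), decreasing = TRUE, method = "radix")
  c(index, entries)
}

md_files <- c(
  "0000-00-00-references.md",
  "2022-02-22-hello-world.md",
  "index.md",
  "2022-03-01-second-entry.md" # made-up entry for illustration
)

ordered <- order_entries(md_files)
print(ordered)
```

Because the filenames start with ISO-style dates, a plain descending sort is enough to get reverse-chronological order, with the all-zero date landing at the bottom.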
The file _targets.R orchestrates the compilation of the notebook using the targets package. targets::tar_make() compiles the notebook by:
1. Knitting each .Rmd file in notebook (if necessary) to produce a corresponding .md file in notebook/book/.
2. Collating the .md files in notebook/book/ into a single-document bookdown book with bookdown/RMarkdown/pandoc (if necessary).

I say "if necessary" because targets only builds a target in the workflow if the target has not been built yet or if the target is out of date. Thus, notestar doesn't waste time regenerating earlier entries if they or their dependencies have not changed.
Finally, .here is a sentinel file for the here package. It indicates where the project root is located. R/functions.R is an (as-yet empty) R script that is source()-ed at the start of _targets.R.
Here we build the notebook and see targets build each target.
targets::tar_make()
If we ask it to build the book again, it skips everything, because none of the dependencies have changed. The one exception is a special spell-checking target that is set to always run.
targets::tar_make()
Right now, our compiled notebook ("notebook/book/docs/notebook.html") is just the title page:
webshot::webshot( "notebook/book/docs/notebook.html", file = "shot1.png", vwidth = 400, vheight = 400, zoom = 2 )
If we look at the project tree, we see some additions.
fs::dir_tree(all = TRUE)
_targets/ is a new directory. It is the object and metadata storage for targets. We don't worry about it.
There are some .md files in notebook/book/ as well as some bookdown-related files (_bookdown.yml, _output.yml and notebook.rds). There is also the output of bookdown in notebook/book/docs. (notebook/book/docs/notebook.html is the file we screenshotted earlier.)
knitr-helpers.R was also copied to the notebook/book/ directory. This copying reflects a design decision by the package. Namely, the contents of the notebook/book directory should not be edited by hand. Its contents should be reproducible, whether by regenerating files (like the .md files) or by copying files (like knitr-helpers.R). The user should only have to worry about editing files in the notebook/ directory or in _targets.R (or perhaps config.yml).
We can create a new entry from a template using notebook_create_page() and regenerate the notebook. (A slug is some words we include in the filename to help remember what the entry is about.)
notebook_create_page(date = "2022-02-22", slug = "hello-world")
Now targets has to rebuild the notebook because there is a new entry that needs to be folded in. The network diagram shows that entry_2022_02_22_hello_world_rmd is outdated (blue), so everything downstream from it is also outdated.
targets::tar_visnetwork(targets_only = TRUE)
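The entry's filename and its target name appear to follow a simple pattern: the date and slug form the filename, and the target name is the filename with non-alphanumeric characters turned into underscores. The helpers below are hypothetical and only mimic that pattern as it shows up in this example. They are not notestar's own functions, and the package may handle edge cases differently.

```r
# Hypothetical helper: build an entry filename from a date and a slug.
entry_filename <- function(date, slug) {
  paste0(date, "-", slug, ".Rmd")
}

# Hypothetical helper: derive a target-style name from a filename by
# replacing runs of non-alphanumeric characters with underscores.
entry_target_name <- function(filename) {
  cleaned <- gsub("[^A-Za-z0-9]+", "_", filename)
  paste0("entry_", tolower(cleaned))
}

filename <- entry_filename("2022-02-22", "hello-world")
target_name <- entry_target_name(filename)

print(filename)
print(target_name)
```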
When we rebuild the notebook, that entry now appears in the HTML file.
targets::tar_make()
#> [output omitted]
webshot::webshot(
  "notebook/book/docs/notebook.html",
  file = "shot2.png",
  vwidth = 400,
  vheight = 500,
  zoom = 2
)
From here, we go with the flow. We use targets as we normally would, modifying R/functions.R and _targets.R to set up our data-processing pipeline. We can now use our notebook to do reporting and exploration as part of our data-processing pipeline. Things we make with targets can be tar_read() into our notebook entries and tracked as dependencies.
In this section, we will describe some behind-the-scenes details about notestar using the worked example.
Here is what a minimal Rmd file entry looks like:
writeLines(readLines("notebook/2022-02-22-hello-world.Rmd"))
That first <!--- comment ---> line on top is an HTML comment. It will not be displayed when we view the final HTML file, but when the .Rmd file is knitted to produce the corresponding .md, the timestamp will be updated. Here is the first line of that .md file:
writeLines(readLines("notebook/book/2022-02-22-hello-world.md")[1])
This timestamp allows us to mark a notebook entry as outdated even if none of the text in the .md file has changed. Here is a motivating example. Let's append a code chunk to the bottom of the notebook entry. It will plot a histogram.
entry_v0 <- readLines("notebook/2022-02-22-hello-world.Rmd")[1:13]
writeLines(
  c(
    entry_v0,
    "Would you look at all these 4's?",
    "```{r}",
    "hist(faithful$eruptions)",
    "```"
  ),
  "notebook/2022-02-22-hello-world.Rmd"
)
And then we regenerate the notebook.
targets::tar_make()
#> [output omitted]
webshot::webshot(
  "notebook/book/docs/notebook.html",
  file = "shot3.png",
  vwidth = 400,
  vheight = 400,
  zoom = 2
)
Let's store the current .md file lines so we can compare it to a later version.
entry_v1 <- readLines("notebook/book/2022-02-22-hello-world.md")
Now, suppose we wanted to change the size or resolution of the plot. In this case, we will change the fig.width and fig.height values to 6 here and regenerate the notebook.
writeLines(
  c(
    entry_v0,
    "Would you look at all these 4's?",
    "```{r, fig.width = 6, fig.height = 6}",
    "hist(faithful$eruptions)",
    "```"
  ),
  "notebook/2022-02-22-hello-world.Rmd"
)
targets::tar_make()
#> [output omitted]
webshot::webshot(
  "notebook/book/docs/notebook.html",
  file = "shot4.png",
  vwidth = 400,
  vheight = 400,
  zoom = 2
)
The figure image files have definitely changed: they are different sizes, and the text in the plots is a different size in the two screenshots. But the text of the .md files is the same, except for the timestamp.
entry_v2 <- readLines("notebook/book/2022-02-22-hello-world.md")
entry_v1 == entry_v2
entry_v1[1]
entry_v2[1]
This phenomenon, where a change to an .Rmd file would not cause a change in the text of a .md file, is the reason for the timestamp at the top of the .Rmd file.
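A small base R sketch can show why a timestamp line makes two otherwise identical outputs distinguishable. Everything here is made up for illustration: the helper stamp_first_line(), the wording of the comment, and the entry body. notestar's real timestamp line may look different.

```r
# Hypothetical helper: overwrite the first line with a timestamp comment,
# imitating what happens each time an entry is knitted.
stamp_first_line <- function(lines, time) {
  stamp <- format(time, "%Y-%m-%d %H:%M:%S")
  lines[1] <- paste0("<!--- Timestamp: ", stamp, " --->")
  lines
}

# A made-up .md body whose text does not change between knits.
body <- c(
  "<!--- placeholder --->",
  "",
  "## Hello world",
  "",
  "![](assets/figure/plot-1.png)"
)

knit_1 <- stamp_first_line(body, as.POSIXct("2022-02-22 10:00:00", tz = "UTC"))
knit_2 <- stamp_first_line(body, as.POSIXct("2022-02-22 10:05:00", tz = "UTC"))

# Only the timestamp line differs, but that is enough for a
# content-based comparison to see the file as changed.
changed_lines <- which(knit_1 != knit_2)
files_differ <- !identical(knit_1, knit_2)
```

Without the first line, the two versions would compare as identical, and anything downstream that checks file contents would have no reason to rebuild.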
Our targets graph has a node called entry_2022_02_22_hello_world_md.
targets::tar_visnetwork( targets_only = TRUE, names = "entry_2022_02_22_hello_world_md" )
That node does not represent just the file notebook/book/2022-02-22-hello-world.md. Its plot is also tracked as a byproduct of the entry:
targets::tar_read("entry_2022_02_22_hello_world_md")
Thus, if I remove the image, that notebook entry becomes outdated and needs to be reprocessed.
file.remove(targets::tar_read("entry_2022_02_22_hello_world_md")[2])
targets::tar_visnetwork(
  targets_only = TRUE,
  names = "entry_2022_02_22_hello_world_md"
)
(I forget the problem that motivated me to add this layer of tracking on top of the timestamping, but it's there.)
# undo the deletion before moving on
targets::tar_make()
#> [output omitted]
Think about any other time you've used knitr or RMarkdown. When you remove the code to produce a figure in an .Rmd file, what happens to the plot's image file? Normally, it sticks around, and you eventually find yourself with all kinds of old, no-longer-used figures. When knitting an .Rmd file, notestar removes all existing figures associated with that entry beforehand. As a result, only figures that were created during the most recent knitting are retained. This move is what allows image dependencies (see the previous point) to be inferred: if an image file is created as a result of knitting an .Rmd file, we can associate it with the corresponding .md file.
Let's demonstrate this feature. Here are the current notebook assets:
fs::dir_tree("./notebook/book/assets")
We will restore the original version of the entry so that the plot is no longer created.
writeLines(entry_v0, "notebook/2022-02-22-hello-world.Rmd")
targets::tar_make()
#> [output omitted]
What we have now is an empty directory.
fs::dir_tree("./notebook/book/assets")
This behavior is controlled in the knitr-helpers.R file, specifically the last line:
writeLines(readLines("notebook/knitr-helpers.R"))
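The cleanup step described above can be sketched in base R. This is only an illustration of the idea, not the code in knitr-helpers.R: the helper clean_entry_figures() is hypothetical, and it assumes that figure files are named with the entry's name as a prefix, which may not match notestar's actual layout.

```r
# Hypothetical helper: delete every file under assets_dir whose name
# starts with the entry's name, so that only figures from the next
# knit survive. Returns the removed paths invisibly.
clean_entry_figures <- function(assets_dir, entry_name) {
  files <- list.files(assets_dir, recursive = TRUE, full.names = TRUE)
  stale <- files[startsWith(basename(files), entry_name)]
  file.remove(stale)
  invisible(stale)
}

# Set up a throwaway assets directory with figures from two entries.
assets_dir <- file.path(tempfile("assets-"), "figure")
dir.create(assets_dir, recursive = TRUE)
file.create(file.path(assets_dir, "2022-02-22-hello-world-plot-1.png"))
file.create(file.path(assets_dir, "2022-03-01-other-entry-plot-1.png"))

# Clean up before "knitting" the hello-world entry.
removed <- clean_entry_figures(assets_dir, "2022-02-22-hello-world")
remaining <- list.files(assets_dir)
```

Because the old figures are gone before knitting starts, any image present afterwards must have been produced by the current knit, which is what lets the images be treated as byproducts of the entry.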
knitr::opts_knit$set(root.dir = NULL)