knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
knitr::opts_template$set(
  file_content = list(
    comment = "",
    collapse = FALSE
  )
)

notestar 📓⭐

R-CMD-check

notestar is a notebook system built on the targets package: notes with targets.

Installation

You can install notestar from GitHub with:

# install.packages("devtools")
devtools::install_github("tjmahr/notestar")

A demo notebook

Here is an example project/notebook showing how notestar combines various .Rmd files into a single HTML file: https://github.com/tjmahr/notestar-demo.

Big picture idea of how notestar works

First, we follow a targets-based workflow. We develop datasets and models and so on, writing functions in R/functions.R and describing a build pipeline in _targets.R. Some familiarity with the big ideas of the targets packages is required.

We then work with our data-analysis products in RMarkdown .Rmd files in a notebook directory. We read in these targets using targets::tar_read(), and we might develop several entries notebook as we tackle different parts of our analysis problem.

In the _targets.R file, there are special notebook-related targets. When we run targets::tar_make(), notestar does the following:

Importantly, notestar only does these jobs when needed. For example, a notebook entry's .md file will only be created if it is outdated. That is,

notestar's role in all of this is to link the data analysis targets to the .Rmd files and then orchestrate the assembly of a notebook from these entries.

Packages that help make this happen

Let's highlight some packages that are indispensable for this scheme to work.

Most important functions for users

Below I show a worked example and describe things in great detail. But before that I want to note that as a user, I only really use 3--4 functions from this package.

A worked example

notestar works best inside of a data analysis project and specifically, as a part of an RStudio project. That is, we have some directory for our project. Everything we do or create will live in that directory, and that directory is the default working directory for all of our R code.

For demonstration, let's create a new directory inside of a temporary directory and make that the home base for our project.

project_dir <- file.path(tempdir(), pattern = "my-project")
dir.create(project_dir)
setwd(project_dir)
knitr::opts_knit$set(root.dir = project_dir)

Nothing here!

fs::dir_tree(all = TRUE)

Initial file skeleton

use_notestar() will populate the project directory with the basic skeleton for the project. We set the theme to "water-dark" so that the screenshots below stick out better from the white background on GitHub.

library(notestar)
use_notestar(cleanrmd_theme = "water-dark")

fs::dir_tree(all = TRUE)

The file config.yml is a config-package configuration file. These configuration options were set when we called use_notestar(), so these are all the default configuration options (except for cleanrmd_theme). Each of these is described by a comment field.

writeLines(readLines("config.yml"))

Two .Rmd files are automatically included: index.Rmd and 0000-00-00.Rmd. These are the first and last entries (top and bottom parts) of the notebook. index.Rmd houses metadata for the notebook:

writeLines(readLines("notebook/index.Rmd"))

The yaml metadata in index.Rmd is created automatically inside the _targets.R file. More on that later.

0000-00-00.Rmd is not meant to be edited. As it tells us, it provides a "References" heading. When the bibliography is appended to the end of the notebook, it will be printed under this heading.

writeLines(readLines("notebook/0000-00-00-references.Rmd"))

The file _targets.R orchestrates the compilation of the notebook using the targets package. targets::tar_make() compiles the notebook by:

I say "if necessary" because targets only builds the targets in workflow if the target has not been built yet or if the target is out of date. Thus, notestar doesn't waste time regenerating earlier entries if they or their dependencies have not changed.

Finally, .here is a sentinel file for the here package. It indicates where the project root is located. R/functions.R is an (as-yet empty) R script that is source()-ed at the start of _targets.R.

Building, rebuilding

Here we build the notebook and see targets build each target.

targets::tar_make()

If we ask it to build the book again, it skips everything---none of the dependencies have changed---but a special spell-checking target set to always run.

targets::tar_make()

Right now, our compiled notebook ("notebook/book/docs/notebook.html") is just the title page:

webshot::webshot(
  "notebook/book/docs/notebook.html",  
  file = "shot1.png",
  vwidth = 400, 
  vheight = 400,
  zoom = 2
)

If we look at the project tree, we see some additions.

fs::dir_tree(all = TRUE)

_targets/ is a new directory. It is the object and metadata storage for targets. We don't worry about it.

There are some md files in notebook/book/ as well as some bookdown-related files (_bookdown.yml, _output.yml and notebook.rds file). There is also the output of bookdown in notebook/book/docs. (notebook/book/docs/notebook.html is the file we screenshotted earlier.)

knitr-helpers.R was also copied to the notebook/book/ directory. This copying reflects design decision by the package. Namely, the contents of the notebook/book directory should not be edited by hand. Its contents should be reproducible whether by regenerating files (like the .md files) or by copying files (like knitr-helpers.R. The user should only have to worry about editing files in the notebook/ directory or in _targets.R (or perhaps config.yml).

We can create a new entry from a template using notebook_create_page() and regenerate the notebook. (A slug is some words we include in the filename to help remember what the entry is about.)

notebook_create_page(date = "2022-02-22", slug = "hello-world")

Now targets has to rebuild the notebook because there is a new entry that needs to be folded in. The network diagram shows that entry_2022_02_hello_world_rmd is outdated (blue) so everything downstream from it is also outdated.

targets::tar_visnetwork(targets_only = TRUE)

When we rebuild the notebook, that entry now appears in the HTML file.

targets::tar_make()
#> [output omitted]

webshot::webshot(
  "notebook/book/docs/notebook.html",
  file = "shot2.png",
  vwidth = 400, 
  vheight = 500, 
  zoom = 2
)

Next steps

From here, we go with the flow. We use targets as we normally would, modifying R/functions.R and targets.R to set up our data-processing pipeline. We can now use our notebook to do reporting and exploration as part of our data-processing pipeline. Things we make with targets can be tar_read() into our notebook entries and tracked as dependencies.

Behind-the-scenes details

In this section, we will describe some behind-the-scenes details about notestar using the worked example.

1. .Rmd/.md files have hidden timestamps

Here is what a minimal Rmd file entry looks like:

writeLines(readLines("notebook/2022-02-22-hello-world.Rmd"))

That first <!--- comment ---> line on top is an HTML comment. It will not be displayed when we view the final html file, but when the .Rmd file is knitted to produce the corresponding .md, the timestamp will be updated. Here is the first line of that .md file:

writeLines(readLines("notebook/book/2022-02-22-hello-world.md")[1])

This timestamp allows us to mark a notebook entry as outdated even if none of the text in the .md file has changed. Here is a motivating example. Let's append a code chunk to the bottom of the notebook entry. It will plot a histogram.

entry_v0 <- readLines("notebook/2022-02-22-hello-world.Rmd")[1:13]
writeLines(
  c(
    entry_v0,
    "Would you look at all these 4's?",
    "```r",
    "hist(faithful$eruptions)",
    "```"
  ),
  "notebook/2022-02-22-hello-world.Rmd"
)

And then we regenerate the notebook.

targets::tar_make()
#> [output omitted]

webshot::webshot(
  "notebook/book/docs/notebook.html",  
  file = "shot3.png",
  vwidth = 400, 
  vheight = 400,
  zoom = 2
)

Let's store the current .md file lines so we can compare it to a later version.

entry_v1 <- readLines("notebook/book/2022-02-22-hello-world.md")

Now, suppose we wanted to change size or resolution of the plot. In this case, we will change the fig.width and fig.height values to 6 here and regenerate the notebook

writeLines(
  c(
    entry_v0,
    "Would you look at all these 4's?",
    "```r",
    "hist(faithful$eruptions)",
    "```"
  ),
  "notebook/2022-02-22-hello-world.Rmd"
)

targets::tar_make()
#> [output omitted]

webshot::webshot(
  "notebook/book/docs/notebook.html",  
  file = "shot4.png",
  vwidth = 400, 
  vheight = 400,
  zoom = 2
)

The figures image files have definitely changed: they are different sizes! The text in the plots in the two screenshots are different sizes. But the text of the .md files is the same---except for the timestamp.

entry_v2 <- readLines("notebook/book/2022-02-22-hello-world.md")
entry_v1 == entry_v2
entry_v1[1] 
entry_v2[1]

This phenomenon, where a change to an .Rmd file would not cause a change in the text of a .md file, is the reason for the timestamp at the top of the .Rmd file.

2. Despite the timestamp, image dependencies are tracked

Our targets graph has a node called entry_2022_02_22_hello_world_md.

targets::tar_visnetwork(
  targets_only = TRUE, 
  names = "entry_2022_02_22_hello_world_md"
)

That node does not represent just the file notebook/book/2022-02-22-hello-world.md. Its plot is also tracked as byproduct of the entry:

targets::tar_read("entry_2022_02_22_hello_world_md")

Thus, if I removed the image, that notebook entry becomes outdated and needs to be reprocessed.

file.remove(targets::tar_read("entry_2022_02_22_hello_world_md")[2])

targets::tar_visnetwork(
  targets_only = TRUE, 
  names = "entry_2022_02_22_hello_world_md"
)

(I forget the problem that motivated me to add this layer of tracking on top of the timestamping, but it's there.)

# undo the deletion before moving on
targets::tar_make()
#> [output omitted]

3. Unused figures are automatically deleted

Think about any other time you've used knitr or RMarkdown. When you remove the code to produce a figure in an .Rmd file, what happens to the plot's image file? Normally, it sticks around, and you eventually find yourself with all kinds of old, no-longer used figures. When knitting a .Rmd file, notestar removes all existing figures associated with an entry beforehand. As a result, only figures that were created during the most recent knitting are retained. This move is what allows image dependencies (see last point) to be inferred: If an image file is created as a result of knitting an .Rmd file, we can associate it with .md file.

Let's demonstrate this feature. Here are the current notebook assets:

fs::dir_tree("./notebook/book/assets")

We will restore the original version of the entry so that the plot is no longer created.

writeLines(entry_v0, "notebook/2022-02-22-hello-world.Rmd")

targets::tar_make()
#> [output omitted]

What we have now is an empty directory.

fs::dir_tree("./notebook/book/assets")

This behavior is controlled in the knitr-helpers.R file, specifically the last line:

writeLines(readLines("notebook/knitr-helpers.R"))
knitr::opts_knit$set(root.dir = NULL)


tjmahr/tarnotes documentation built on Jan. 27, 2025, 2:17 a.m.