targets can reproducibly watch input files, output files, and literate programming documents. This chapter explores how to configure a target to automatically run when a file changes.
Run tar_destroy() to remove the targets from the previous chapter.
library(targets)
tar_destroy()
Load this chapter's quiz questions. Try not to peek in advance.
source("R/quiz.R")
source("5-files/answers.R")
Run the following command to write the required _targets.R script to your working directory.
file.copy("5-files/initial_targets.R", "_targets.R", overwrite = TRUE)
Open _targets.R for editing.
tar_edit()
As we saw in 3-changes.Rmd, targets can watch files like data/churn.csv for changes. Let's review how this works. First, run the full pipeline from start to finish.
tar_make()
Verify that all targets are now up to date.
tar_make()
Now, remove the last line of data in the file.
library(tidyverse)
"data/churn.csv" %>%
  read_csv(col_types = cols()) %>%
  head(n = nrow(.) - 1) %>%
  write_csv("data/churn.csv")
When we call tar_make() again, all targets should rerun because they all depend on the upstream file target churn_file.
tar_make()
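Why does deleting one row of data/churn.csv invalidate churn_file? targets keeps a hash of each watched file in its metadata and compares it with the file on disk. The base-R sketch below illustrates that idea with tools::md5sum(); it is a conceptual analogy, not how targets computes or stores its hashes, and the temporary file is a hypothetical stand-in for data/churn.csv.

```r
# Conceptual sketch only: targets records its own file hashes in its
# metadata, but the idea is the same. A temporary file stands in for
# data/churn.csv.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, churn = c(0, 1, 0)), path, row.names = FALSE)
old_hash <- unname(tools::md5sum(path))

# Drop the last row of data, as the tidyverse code above did to data/churn.csv.
churn <- read.csv(path)
write.csv(head(churn, n = nrow(churn) - 1), path, row.names = FALSE)
new_hash <- unname(tools::md5sum(path))

# TRUE: the content changed, so anything watching this file is invalidated.
file_changed <- !identical(old_hash, new_hash)
file_changed
```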
How do we configure churn_file and downstream targets to rerun when data/churn.csv changes?
A. The target's return value needs to be a character vector of file and directory paths. These paths get resolved at runtime, so we do not need to know them before we call tar_make().
B. In tar_target(), set the format argument equal to "file". That way, tar_make() knows the return value of churn_file is a set of file paths that need to be watched.
C. Targets directly downstream need to mention the symbol churn_file (as opposed to the literal path "data/churn.csv") so tar_make() can discover the correct dependency relationships among targets. Always check with tar_visnetwork() to verify that your targets are connected properly in the dependency graph.
D. All the above.
answer4_review("E")
Hint:
tar_read(churn_file)
In targets, we configure output files the exact same way. The only difference between input and output files is that output files are created when the target runs.
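To make the input/output symmetry concrete, here is a minimal, self-contained pipeline sketch. It assumes the targets package is installed, runs in a throwaway directory via tar_dir(), and uses hypothetical file and target names (input.csv, output.csv, input_file, doubled, output_file) rather than the chapter's churn data. Both file targets set format = "file" and return paths; the only difference is that output_file writes its file before returning the path.

```r
library(targets)

# Everything runs in a temporary directory; all names are hypothetical.
result <- tar_dir({
  write.csv(data.frame(x = 1:3), "input.csv", row.names = FALSE)
  tar_script({
    list(
      # Input file target: the command returns the path to watch.
      tar_target(input_file, "input.csv", format = "file"),
      # Mentions the symbol input_file, so it depends on the file target.
      tar_target(doubled, read.csv(input_file)$x * 2),
      # Output file target: write the file first, then return its path.
      tar_target(
        output_file,
        {
          write.csv(data.frame(x = doubled), "output.csv", row.names = FALSE)
          "output.csv"
        },
        format = "file"
      )
    )
  })
  tar_make(reporter = "silent")
  list(output = tar_read(output_file), outdated = tar_outdated())
})

result$output    # the tracked path, "output.csv"
result$outdated  # no outdated targets right after tar_make()
```

If you edited input.csv or deleted output.csv inside that directory, tar_outdated() would report the affected targets again.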
As an example, open _targets.R and create a new file target churn_cor, which saves a CSV file and a plot of correlations (the correlation of each covariate with customer churn in the preprocessed testing data). Functions compute_cor() and plot_cor() in 5-files/functions.R do most of the work.
Open _targets.R for editing.
tar_edit()
Enter the new target below into the target list in _targets.R.
tar_target(
  churn_cor,
  {
    cor <- compute_cor(churn_recipe)
    plot <- plot_cor(cor)
    write_csv(cor, "cor.csv")
    ggsave(plot = plot, filename = "cor.png", width = 8, height = 8)
    # The return value must be a vector of paths to the files we write:
    paste0("cor.", c("csv", "png"))
  },
  format = "file" # Tells targets to track the return value (paths) as files.
)
Run the pipeline. Since all previous targets are up to date, only churn_cor should run.
tar_make()
What is the return value of the new churn_cor target?
tar_read(churn_cor)
A. c("cor.csv", "cor.png"), a vector of paths to the files produced by the target.
B. A ggplot object with the correlation plot.
C. A data frame of correlations.
D. A function called churn_cor().
answer4_return("E")
Take a look at the new output file cor.png. You should see a plot of each variable's correlation with customer churn. Also glance at cor.csv, the output dataset with the correlations.
read_csv("cor.csv", col_types = cols())
Now, delete one of the output files and rerun the pipeline.
tmp <- file.remove("cor.png")
tar_make()
What happened? Why?
A. All targets reran because we changed a file.
B. No target reran because targets does not track the deleted file.
C. Because churn_cor is a correctly configured file target, tar_make() noticed when cor.png changed and automatically reran the target in order to repair the file.
D. churn_cor reran because targets always treats character strings as file names.
answer4_delete("E")
Whenever a single target like churn_cor tracks multiple files or directories, tar_make() treats all those files as a single unit. The whole target invalidates when one of the files changes, and downstream targets must accept all the files together. When we come to the chapter on branching, we will use the special tar_files() function from the tarchetypes package to branch over the available files.
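The "single unit" behavior can be illustrated in base R by folding the hashes of several files into one fingerprint, so that editing or deleting any one file changes the fingerprint of the whole group. The helper unit_fingerprint() below is hypothetical and only an analogy for what targets records about a multi-file target; the two placeholder files merely borrow the names cor.csv and cor.png.

```r
# Hypothetical helper: fold the content hashes of several files into one
# string. tools::md5sum() returns NA for a missing file, so deleting any
# file also changes the result. Illustration only, not targets' internals.
unit_fingerprint <- function(paths) {
  hashes <- unname(tools::md5sum(paths))
  paste(basename(paths), hashes, sep = ":", collapse = "|")
}

dir <- tempfile()
dir.create(dir)
paths <- file.path(dir, c("cor.csv", "cor.png"))
writeLines("placeholder csv content", paths[1])
writeLines("placeholder png content", paths[2])

before <- unit_fingerprint(paths)
invisible(file.remove(paths[2]))  # delete only one of the two files
after <- unit_fingerprint(paths)

identical(before, after)  # FALSE: the whole unit counts as changed
```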
Literate programming is the practice of writing code and explanatory prose in the same source file. This R Markdown document is an example. All this time, we have been using literate programming on top of targets. But now, we will explore literate programming within a target.
Let's pull an example R Markdown file.
tmp <- file.copy("5-files/results.Rmd", "results.Rmd", overwrite = TRUE)
Open results.Rmd for editing.
library(usethis)
edit_file("results.Rmd", open = TRUE)
Notice the calls to tar_load() and tar_read() in the active code chunks of results.Rmd. On its own, this report leverages the results of previous targets.
# Copy and paste the following directly into the R console.
# This code chunk will not render results.Rmd properly
# if called inside the R notebook.
library(rmarkdown)
render("results.Rmd")
browseURL("results.html")
We can put results.Rmd in a target so it automatically re-renders when its dependencies change. Open _targets.R for editing.
tar_edit()
Write library(tarchetypes) at the very top, and write tar_render(report_step, "results.Rmd") as a target in the pipeline.
# Do not run here.
library(tarchetypes)
# Existing setup code goes here.
list(
  # Existing calls to tar_target() stay here.
  tar_render(report_step, "results.Rmd")
)
tar_render() analyzes results.Rmd and constructs a target that depends on the report's tar_read()/tar_load() dependencies. To see this for yourself, take a look at the dependency graph.
tar_visnetwork()
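How can tar_render() know the report's dependencies without running it? It statically scans the code in active chunks for tar_load() and tar_read() calls (tarchetypes exposes this scan as tar_knitr_deps()). The base-R sketch below imitates the idea on plain R code. The helper find_target_deps() and the target names model_a and model_b are hypothetical, and unlike tarchetypes the sketch does not extract chunks from an .Rmd file or handle namespaced calls such as targets::tar_read().

```r
# Hypothetical helper: collect target names passed as bare symbols to
# tar_read() or tar_load() anywhere in a piece of R code.
# A simplified sketch, not tarchetypes' implementation.
find_target_deps <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)
  deps <- character(0)
  walk <- function(expr) {
    if (!is.call(expr)) return(invisible(NULL))
    fun <- expr[[1]]
    if (is.symbol(fun) &&
        as.character(fun) %in% c("tar_read", "tar_load") &&
        length(expr) >= 2 && is.symbol(expr[[2]])) {
      deps <<- c(deps, as.character(expr[[2]]))
    }
    # Recurse into arguments that are themselves calls.
    for (i in seq_along(expr)[-1]) {
      if (is.call(expr[[i]])) walk(expr[[i]])
    }
  }
  for (e in exprs) walk(e)
  sort(unique(deps))
}

# Code as it might appear in the active chunks of a report.
chunk_code <- c(
  "fit <- tar_read(model_a)",
  "tar_load(model_b)",
  "summary(fit)"
)
find_target_deps(chunk_code)  # "model_a" "model_b"
```

Because the scan is static, adding a new tar_read() call to an active chunk changes the detected dependencies before the report ever runs.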
Which targets does report_step depend on? Why?
A. None. No upstream targets are mentioned in the target's command.
B. run_relu because the report calls tar_read(run_relu) in an active code chunk.
C. run_sigmoid because the report calls tar_load(run_sigmoid) in an active code chunk.
D. run_relu and run_sigmoid because the report calls tar_read(run_relu) and tar_load(run_sigmoid) in active code chunks.
answer4_deps1("E")
At the bottom of results.Rmd, add an active code chunk to print out the best model (tar_read(best_model)). Then, look at the graph again.
tar_visnetwork()
What changed? Why?
A. report_step is disconnected from the other nodes in the graph because we edited the report.
B. The graph did not change because we did not run the report yet.
C. report_step now depends on best_model in addition to run_relu and run_sigmoid. All three are mentioned in active code chunks with tar_load() and tar_read().
D. report_step is only connected to best_model now because you just added it as a dependency.
answer4_deps2("E")
Run the whole pipeline. The newly added report_step target should run.
tar_make()
Verify that all targets are up to date now.
tar_make() # See also tar_outdated()
View results.html. You should see a print-out of the best model at the bottom.
browseURL("results.html")
Remove the output HTML file.
unlink("results.html")
Then rerun the pipeline.
tar_make()
What happened? Why?
A. All targets reran because the file system changed.
B. The report_step target reran because tar_render() reproducibly tracks the output files of rmarkdown::render() and helps tar_make() respond to changes in results.html.
C. The report_step target reran because results.Rmd changed.
D. No targets reran because output files from R Markdown reports are not reproducibly tracked.
answer4_html("E")
Add some prose anywhere in the body of results.Rmd.
library(usethis)
edit_file("results.Rmd", open = TRUE)
# Add some prose to the report to explain the results.
Now, rerun the pipeline.
tar_make()
What happened? Why?
A. Only report_step reran because the R Markdown source file changed and all the other targets stayed up to date.
B. Only report_step reran because R Markdown reports always rerun.
C. All targets reran because results.Rmd is a dependency of the whole pipeline.
D. No targets reran because the pipeline does not track changes to the R Markdown source.
answer4_rmd("E")
Remove another line of data/churn.csv.
"data/churn.csv" %>%
  read_csv(col_types = cols()) %>%
  head(n = nrow(.) - 1) %>%
  write_csv("data/churn.csv")
Now, rerun the pipeline.
tar_make()
Did report_step rerun? Why?
A. No, because data/churn.csv is not a dependency of report_step.
B. No, because results.Rmd does not read data/churn.csv.
C. Yes. The R Markdown report is downstream of a target that depends on data/churn.csv, and the change in the data file caused a chain reaction that changed the model outputs and thus invalidated report_step.
D. Yes. If a single target imports a data file, all targets in the pipeline rerun.
answer4_rmd_data("E")
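The chain reaction in option C is transitive invalidation through the dependency graph. The sketch below propagates a change downstream through a hypothetical, simplified graph: churn_data is a made-up name, and the edges only approximate this chapter's pipeline. It is also cruder than targets, which checks the hashes of each target's dependencies and skips a downstream target when its upstream targets rerun but return exactly the same values; the sketch simply marks everything downstream as outdated.

```r
# Hypothetical, simplified dependency graph: each target lists the
# targets it depends on directly.
deps <- list(
  churn_file = character(0),
  churn_data = "churn_file",
  churn_recipe = "churn_data",
  run_relu = "churn_recipe",
  run_sigmoid = "churn_recipe",
  best_model = c("run_relu", "run_sigmoid"),
  report_step = c("run_relu", "run_sigmoid", "best_model")
)

# Return the changed targets plus everything downstream of them.
downstream_outdated <- function(deps, changed) {
  outdated <- changed
  repeat {
    hit <- vapply(deps, function(up) any(up %in% outdated), logical(1))
    new <- setdiff(names(deps)[hit], outdated)
    if (length(new) == 0) return(outdated)
    outdated <- c(outdated, new)
  }
}

downstream_outdated(deps, "churn_file")  # every target, report_step included
downstream_outdated(deps, "best_model")  # "best_model" "report_step"
```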
tarchetypes::tar_render() supports parameterized R Markdown, and parameters can be values of upstream targets in the pipeline. In the following pipeline:
list(
  tar_target(data, data.frame(x = seq_len(26), y = letters)),
  tar_render(report, "report.Rmd", params = list(your_param = data))
)
the report target will effectively run:
rmarkdown::render("report.Rmd", params = list(your_param = data))
where report.Rmd has the following YAML front matter:
---
title: report
output: html_document
params:
  your_param: "default value"
---
and the following code chunk:
print(params$your_param)
See these examples for a demonstration.