In this chapter, we will put targets
to the test. We will modify code and data and and watch tar_make()
run the minimum amount of computation required to bring the results back up to date. This saves time when an ambitious project is still under development and rapidly changing.
Throughout this chapter, you will test your understanding with quiz questions. Supply your choices to the answer3_*()
functions. Run the code chunk below to set up the quiz questions. Please try not to peak in advance. It is okay to answer incorrectly at first.
source("R/quiz.R") source("3-changes/answers.R")
Let's clear out the old _targets/
data store.
library(targets) tar_destroy() # Start fresh.
Run the line below to begin with a new _targets.R
file.
tmp <- file.copy("3-changes/initial_targets.R", "_targets.R", overwrite = TRUE)
Then, open _targets.R
in the RStudio IDE.
tar_edit()
What do you see?
A. A call to source("3-changes/functions.R")
to load our custom functions.
B. A call to tar_option_set()
to declare the packages the targets need when they run.
C. The full pipeline from 2-pipelines.Rmd
at the very bottom of the file.
D. All the above.
answer3_setup("E") # Supply your own answer here.
Run whole the pipeline with tar_make()
.
tar_make()
Which target ran first? Why?
A. churn_data
, because the rest of the targets depend on it.
B. churn_file
, because the rest of the targets depend on it.
C. churn_data
, because it appears first in _targets.R
.
D. churn_file
, because it appears first in _targets.R
.
answer3_first("E")
Which target ran last? Why?
A. best_model
, because it depends on the results of the other targets.
B. best_model
, because it is listed last in the pipeline.
C. best_run
, because it depends on the results of the other targets.
D. best_run
, because it is listed last in the pipeline.
answer3_last("E")
Load the best_model
target from the _targets/
data store using the tar_load()
function. Print the object.
tar_load(best_model) print(best_model)
What do you see?
A. A function. B. A trained Keras model. C. A one-row data frame. D. A recipe object.
answer3_inspect("E")
tar_make()
Run tar_make()
again.
tar_make()
What happened? Why?
A. No target reran because the previous results are already in storage (_targets/
).
B. tar_make()
reran the best model.
C. All the targets reran.
D. No target reran because all the targets are already up to date with their dependencies (the underlying code and data, including upstream targets).
answer3_rerun("E")
Restart your R session.
rstudioapi::restartSession()
Restore the session.
source("R/quiz.R") source("3-changes/answers.R")
Then, run tar_make()
again.
library(targets) tar_make()
Which targets reran? Why?
A. None. The targets are in storage (_targets/objects/
) and are still up to date with their dependencies.
B. I get a confusing error message.
C. All targets reran because I restarted the R session.
D. All the models rerun.
answer3_restart("E")
Remove the last row of the dataset file data/churn.csv
.
library(tidyverse) "data/churn.csv" %>% read_csv(col_types = cols()) %>% head(n = nrow(.) - 1) %>% write_csv("data/churn.csv")
Run the pipeline again.
tar_make()
Which targets reran? Why?
A. All of them. targets
notices it has been a while since you ran the pipeline.
B. All of them. The data file changed, and targets
automatically knows data/churn.csv
is a data file.
C. All of them. The data file changed, and the pipeline is configured to automatically track data/churn.csv
. The pipeline tracks the data file because the churn_file
target returns the value "data/churn.csv"
and the call to tar_target()
has format = "file"
.
D. None of them, because data/churn.csv
is neither a target nor a function in the global environment.
answer3_data("E")
Open _targets.R
for editing in the RStudio IDE.
library(targets) tar_edit()
In the command of target best_run
, change the function to rbind()
to the preferred bind_rows()
from dplyr
. The target definition should look like this:
# Do not run here. tar_target( best_run, bind_rows(run_relu, run_sigmoid, run_softmax) %>% top_n(1, accuracy) %>% head(1) )
Then, rerun the pipeline.
tar_make()
Which targets ran? Why?
A. None. bind_rows()
and rbind()
return the same value.
B. None. targets
can predict that bind_rows()
and rbind()
will return the same value.
C. best_run
and best_model
. The command of best_run
changed, and best_model
is directly downstream.
D. best_run
only. The command of best_run
changed, but best_run
returned the same value, so best_model
is still up to date.
answer3_command("E")
Open 3-changes/functions.R
for editing in the RStudio IDE. In the body of define_model()
, change the rate
argument to 0.2 in the first call to layer_dropout()
.
# Do not run here. layer_dropout(rate = 0.2) # previously 0.1
Then, rerun the pipeline.
tar_make()
Which targets ran? Why?
A. None. No target directly calls define_model()
.
B. None. The model definition did not change much.
C. run_relu
, run_sigmoid
, run_softmax
, best_run
, and possibly best_model
. best_model
might not run because it only depends on best_run
.
D. run_relu
, run_sigmoid
, run_softmax
, best_run
, and best_model
.
answer3_function("E")
Open 3-changes/functions.R
for editing in the RStudio IDE. Add a comment in the body of define_model()
and change nothing else.
# Do not run here. define_model <- function(churn_recipe, units1, units2, act1, act2, act3) { # This is an arbitrary comment. input_shape <- ncol( juice(churn_recipe, all_predictors(), composition = "matrix") ) # Leave the rest of the function alone. }
Then, run the pipeline again.
tar_make()
Which targets ran? Why?
A. run_relu
, run_sigmoid
, run_softmax
, best_run
, and best_model
. The define_model()
function changed, which affects all these targets.
B. None. tar_make()
ignores trivial changes like comments and white space: anything that does not show up in deparse(define_model)
.
C. None. tar_make()
ignores all changes to functions.
D. None. No target directly calls define_model()
.
answer3_trivial("E")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.