About

In this chapter, we will put targets to the test. We will modify code and data and and watch tar_make() run the minimum amount of computation required to bring the results back up to date. This saves time when an ambitious project is still under development and rapidly changing.

Instructions

Throughout this chapter, you will test your understanding with quiz questions. Supply your choices to the answer3_*() functions. Run the code chunk below to set up the quiz questions. Please try not to peak in advance. It is okay to answer incorrectly at first.

source("R/quiz.R")
source("3-changes/answers.R")

Let's clear out the old _targets/ data store.

library(targets)
tar_destroy() # Start fresh.

Setup

Run the line below to begin with a new _targets.R file.

tmp <- file.copy("3-changes/initial_targets.R", "_targets.R", overwrite = TRUE)

Then, open _targets.R in the RStudio IDE.

tar_edit()

What do you see?

A. A call to source("3-changes/functions.R") to load our custom functions. B. A call to tar_option_set() to declare the packages the targets need when they run. C. The full pipeline from 2-pipelines.Rmd at the very bottom of the file. D. All the above.

answer3_setup("E") # Supply your own answer here.

First target

Run whole the pipeline with tar_make().

tar_make()

Which target ran first? Why?

A. churn_data, because the rest of the targets depend on it. B. churn_file, because the rest of the targets depend on it. C. churn_data, because it appears first in _targets.R. D. churn_file, because it appears first in _targets.R.

answer3_first("E")

Last target

Which target ran last? Why?

A. best_model, because it depends on the results of the other targets. B. best_model, because it is listed last in the pipeline. C. best_run, because it depends on the results of the other targets. D. best_run, because it is listed last in the pipeline.

answer3_last("E")

Inspection

Load the best_model target from the _targets/ data store using the tar_load() function. Print the object.

tar_load(best_model)
print(best_model)

What do you see?

A. A function. B. A trained Keras model. C. A one-row data frame. D. A recipe object.

answer3_inspect("E")

Rerun tar_make()

Run tar_make() again.

tar_make()

What happened? Why?

A. No target reran because the previous results are already in storage (_targets/). B. tar_make() reran the best model. C. All the targets reran. D. No target reran because all the targets are already up to date with their dependencies (the underlying code and data, including upstream targets).

answer3_rerun("E")

Restart the session

Restart your R session.

rstudioapi::restartSession()

Restore the session.

source("R/quiz.R")
source("3-changes/answers.R")

Then, run tar_make() again.

library(targets)
tar_make()

Which targets reran? Why?

A. None. The targets are in storage (_targets/objects/) and are still up to date with their dependencies. B. I get a confusing error message. C. All targets reran because I restarted the R session. D. All the models rerun.

answer3_restart("E")

Change the data.

Remove the last row of the dataset file data/churn.csv.

library(tidyverse)
"data/churn.csv" %>%
  read_csv(col_types = cols()) %>%
  head(n = nrow(.) - 1) %>%
  write_csv("data/churn.csv")

Run the pipeline again.

tar_make()

Which targets reran? Why?

A. All of them. targets notices it has been a while since you ran the pipeline. B. All of them. The data file changed, and targets automatically knows data/churn.csv is a data file. C. All of them. The data file changed, and the pipeline is configured to automatically track data/churn.csv. The pipeline tracks the data file because the churn_file target returns the value "data/churn.csv" and the call to tar_target() has format = "file". D. None of them, because data/churn.csv is neither a target nor a function in the global environment.

answer3_data("E")

Change a command.

Open _targets.R for editing in the RStudio IDE.

library(targets)
tar_edit()

In the command of target best_run, change the function to rbind() to the preferred bind_rows() from dplyr. The target definition should look like this:

# Do not run here.
tar_target(
  best_run,
  bind_rows(run_relu, run_sigmoid, run_softmax) %>%
    top_n(1, accuracy) %>%
    head(1)
)

Then, rerun the pipeline.

tar_make()

Which targets ran? Why?

A. None. bind_rows() and rbind() return the same value. B. None. targets can predict that bind_rows() and rbind() will return the same value. C. best_run and best_model. The command of best_run changed, and best_model is directly downstream. D. best_run only. The command of best_run changed, but best_run returned the same value, so best_model is still up to date.

answer3_command("E")

Change a function.

Open 3-changes/functions.R for editing in the RStudio IDE. In the body of define_model(), change the rate argument to 0.2 in the first call to layer_dropout().

# Do not run here.
layer_dropout(rate = 0.2) # previously 0.1

Then, rerun the pipeline.

tar_make()

Which targets ran? Why?

A. None. No target directly calls define_model(). B. None. The model definition did not change much. C. run_relu, run_sigmoid, run_softmax, best_run, and possibly best_model. best_model might not run because it only depends on best_run. D. run_relu, run_sigmoid, run_softmax, best_run, and best_model.

answer3_function("E")

Make a trivial change.

Open 3-changes/functions.R for editing in the RStudio IDE. Add a comment in the body of define_model() and change nothing else.

# Do not run here.
define_model <- function(churn_recipe, units1, units2, act1, act2, act3) {
  # This is an arbitrary comment.
  input_shape <- ncol(
    juice(churn_recipe, all_predictors(), composition = "matrix")
  )
  # Leave the rest of the function alone.
}

Then, run the pipeline again.

tar_make()

Which targets ran? Why?

A. run_relu, run_sigmoid, run_softmax, best_run, and best_model. The define_model() function changed, which affects all these targets. B. None. tar_make() ignores trivial changes like comments and white space: anything that does not show up in deparse(define_model). C. None. tar_make() ignores all changes to functions. D. None. No target directly calls define_model().

answer3_trivial("E")


kenshuri/targets-tuto documentation built on April 19, 2021, 9:58 a.m.