To implement the customer churn prediction analysis, we will create a targets pipeline. A pipeline is a high-level collection of steps, or targets, that make up the workflow. Each target has a command: a piece of R code that returns a value. In practice, targets are usually data cleaning, model fitting, and results summary steps, and the commands are the R function calls that perform those tasks. Please read the instructions in the comments.
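To make the vocabulary concrete, here is a minimal sketch of a single target. The name `raw_data` and its file path are hypothetical, not part of this chapter's pipeline; `tar_target()` simply pairs a target's name with the command that produces its value.

```r
library(targets)
# A hypothetical target: the command is ordinary R code, and its return
# value becomes the stored value of the target.
tar_target(raw_data, read.csv("data/raw.csv"))
```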
```r
library(targets)
tar_destroy() # Start fresh.
```
The commands of your targets will depend on the functions from `1-functions.Rmd`. For the exercises below, those functions live in `2-pipelines/functions.R`. Take a quick glance at `2-pipelines/functions.R` and re-familiarize yourself with the code base.
The first version of the pipeline consists of three targets, or steps of computation. Our first targets do the following:

1. Track the input data file `data/churn.csv`.
2. Split the data into training and test sets with `split_data()`.
3. Preprocess the data into a recipe object with `prepare_recipe()`.
The steps above translate to the following R code.
```r
# No need to run this code chunk.
library(keras)
library(recipes)
library(rsample)
library(tidyverse)
library(yardstick)
source("2-pipelines/functions.R")

churn_file <- "data/churn.csv"             # Sketch of target 1.
churn_data <- split_data(churn_file)       # Sketch of target 2.
churn_recipe <- prepare_recipe(churn_data) # Sketch of target 3.
```
To formalize our three targets, we express the computation in a special configuration file called `_targets.R` at the project's root directory. The role of `_targets.R` is to load the required packages and custom functions, set high-level options with `tar_option_set()`, and declare each target with `tar_target()`. The `_targets.R` script must always end with a list of target objects.

Run the following code chunk to put the `_targets.R` file for this chapter in your working directory.
```r
file.copy("2-pipelines/initial_targets.R", "_targets.R", overwrite = TRUE)
```
Now, open `_targets.R` in the RStudio IDE for editing. You will manually make changes to `_targets.R` throughout the entire chapter.
```r
library(targets)
tar_edit()
```
The functions `tar_make()`, `tar_validate()`, `tar_manifest()`, `tar_glimpse()`, and `tar_visnetwork()` all need `_targets.R`. `_targets.R` lets these functions invoke the pipeline from a new external R process in order to ensure reproducibility.
```r
tar_validate() # Looks for errors.
```

```r
tar_manifest(fields = "command") # Data frame of target info.
```

```r
tar_glimpse() # Interactive dependency graph.
```
`tar_glimpse()` is particularly helpful because it shows how the targets depend on one another. As the arrows indicate, target `churn_recipe` depends on `churn_data`, and target `churn_data` depends on `churn_file`. The `targets` package detects these relationships automatically using static code analysis: `churn_data` depends on `churn_file` because the command for `churn_data` mentions the symbol `churn_file`. You can see these dependency relationships for yourself using `codetools::findGlobals()`.
```r
library(codetools)
findGlobals(function() split_data(churn_file))
```
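The same trick (a quick sketch) shows why `churn_recipe` depends on `churn_data`:

```r
# The command for churn_recipe mentions the symbols prepare_recipe and
# churn_data, so both appear as globals.
findGlobals(function() prepare_recipe(churn_data))
```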
That means the order in which you write your targets does not matter: even if you rearrange the calls to `tar_target()` inside the list, you will still get the same workflow.
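For example, here is a sketch (no need to run it) in which the target list is written in reverse order. The dependency graph, and thus the workflow, is identical because the graph comes from the code inside the commands, not from the list order:

```r
# Same pipeline as before, declared upstream-last.
list(
  tar_target(churn_recipe, prepare_recipe(churn_data)),
  tar_target(churn_data, split_data(churn_file)),
  tar_target(churn_file, "data/churn.csv", format = "file")
)
```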
`tar_visnetwork()` also includes functions in the dependency graph, as well as color-coded status information.
```r
tar_visnetwork()
```
Everything we did so far was just setup. To actually run the pipeline, use the `tar_make()` function. `tar_make()` creates a fresh, reproducible R process that runs `_targets.R` and executes the correct targets in the correct order (determined by the dependency graph, not the order in the target list).
```r
tar_make()
```
Targets `churn_data` and `churn_recipe` now live in a special `_targets/` data store.
```r
list.files("_targets/objects")
```
`churn_file` is an external input file, declared with `format = "file"` in `tar_target()`, so its value is not in the data store. However, the actual file path, hash, and other metadata are stored in the `_targets/meta/meta` spreadsheet, which you can read with `tar_meta()`.
```r
library(dplyr)
tar_meta(names = starts_with("churn"), fields = path) %>%
  mutate(path = unlist(path))
```
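As a further sketch (the available metadata columns can vary across versions of `targets`), you can also pull the stored hash that `targets` uses to detect changes to the file's contents:

```r
# A sketch: the "data" column holds the hash recorded for churn_file.
tar_meta(names = "churn_file", fields = any_of(c("path", "data")))
```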
Other user-side functions read the actual data objects.
```r
tar_read(churn_data)
```

```r
tar_read(churn_recipe)
```

```r
tar_load(churn_file)
churn_file
```
After inspecting our current targets with `tar_load()` and `tar_read()`, we are ready to add new targets to fit our models. Informally, the new computations look like this:
```r
run_relu <- test_model(act1 = "relu", churn_data, churn_recipe)       # Model 1
run_sigmoid <- test_model(act1 = "sigmoid", churn_data, churn_recipe) # Model 2
```
Your turn: open `_targets.R` for editing.
```r
tar_edit()
```
Then, express the model commands above as formal targets in the pipeline.
```r
# Should go in _targets.R:
library(targets)
source("2-pipelines/functions.R")
tar_option_set(packages = c("keras", "recipes", "rsample", "tidyverse", "yardstick"))
list(
  tar_target(churn_file, "data/churn.csv", format = "file"),
  tar_target(churn_data, split_data(churn_file)),
  tar_target(churn_recipe, prepare_recipe(churn_data)),
  tar_target("???", "???"), # Your turn: add model 1.
  tar_target("???", "???")  # Your turn: add model 2.
)
```
Visualize the graph to check the pipeline. `run_relu` and `run_sigmoid` should be new targets that depend on `churn_data`, `churn_recipe`, `test_model()`, and the custom functions called from `test_model()`.
```r
tar_visnetwork()
```
`churn_file`, `churn_data`, and `churn_recipe` are up to date from last time, and `run_relu` and `run_sigmoid` are new and thus outdated.
```r
tar_outdated()
```
`tar_make()` automatically runs the new or outdated targets (in this case, the models) and skips the targets that are already up to date.
```r
tar_make() # Ignore messages like "TensorFlow binary was not compiled to use..."
```
Those models took a long time to run, right? That is why `tar_make()` skips them if they are up to date.
```r
tar_make()
```
If you are not sure what will run, just call `tar_visnetwork()` or `tar_outdated()` first.
```r
tar_outdated()
```
`run_relu` and `run_sigmoid` should each be a data frame with the accuracy and hyperparameters.
```r
tar_read(run_relu)
```

```r
tar_read(run_sigmoid)
```
Your turn: open `_targets.R` for editing and type in a third model with a different activation function. Use `act1 = "softmax"` this time.
```r
tar_edit()
```
```r
# Should go in _targets.R:
library(targets)
source("2-pipelines/functions.R")
tar_option_set(packages = c("keras", "recipes", "rsample", "tidyverse", "yardstick"))
list(
  tar_target(churn_file, "data/churn.csv", format = "file"),
  tar_target(churn_data, split_data(churn_file)),
  tar_target(churn_recipe, prepare_recipe(churn_data)),
  tar_target(run_relu, test_model(act1 = "relu", churn_data, churn_recipe)),
  tar_target(run_sigmoid, test_model(act1 = "sigmoid", churn_data, churn_recipe)),
  tar_target(run_softmax, "???") # Your turn: add a model target with `act1 = "softmax"`.
)
```
Now, only `run_softmax` should be outdated.
```r
tar_outdated()
```

```r
tar_visnetwork()
```
Run the softmax model. `tar_make()` should skip everything else.
```r
tar_make()
```
Inspect the result.
```r
tar_read(run_softmax)
```
The following R code chooses the model run with the highest accuracy.
```r
# Do not run here.
bind_rows(run_relu, run_sigmoid, run_softmax) %>%
  top_n(1, accuracy) %>%
  head(1)
```
Your turn: open `_targets.R` and type the above into a new target. Note: although commands should usually be concise function calls, this is not a strict requirement, as you will see below.
```r
# Should go in _targets.R:
library(targets)
source("2-pipelines/functions.R")
tar_option_set(packages = c("keras", "recipes", "rsample", "tidyverse", "yardstick"))
list(
  tar_target(churn_file, "data/churn.csv", format = "file"),
  tar_target(churn_data, split_data(churn_file)),
  tar_target(churn_recipe, prepare_recipe(churn_data)),
  tar_target(run_relu, test_model(act1 = "relu", churn_data, churn_recipe)),
  tar_target(run_sigmoid, test_model(act1 = "sigmoid", churn_data, churn_recipe)),
  tar_target(run_softmax, test_model(act1 = "softmax", churn_data, churn_recipe)),
  tar_target(
    best_run,
    "???" # Your turn: insert code to choose the model run with the highest accuracy.
  )
)
```
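As noted above, commands need not be single function calls. If you would rather keep every command concise, one option is to move the selection logic into a helper function. Below is a hypothetical sketch; `choose_best_run()` is not defined in `2-pipelines/functions.R`:

```r
# Hypothetical helper: combine the model runs and keep the most accurate one.
choose_best_run <- function(...) {
  runs <- dplyr::bind_rows(...)
  # top_n() keeps the highest-accuracy run(s); head() breaks ties.
  head(dplyr::top_n(runs, 1, accuracy), 1)
}

# The target command would then stay a single call:
# tar_target(best_run, choose_best_run(run_relu, run_sigmoid, run_softmax))
```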
`best_run` should depend on `run_relu`, `run_sigmoid`, and `run_softmax`.
```r
tar_visnetwork()
```
The next `tar_make()` should run quickly and build just `best_run`.
```r
tar_make()
```
`best_run` should contain one row with the accuracy and hyperparameters of the best model.
```r
tar_read(best_run)
```
Finally, open `_targets.R` for editing and add a new target to retrain the best model with `retrain_run(best_run, churn_recipe)`. This time, we are returning a Keras model object, so we need to write `format = "keras"` in `tar_target()`.
```r
# Should go in _targets.R:
library(targets)
source("2-pipelines/functions.R")
tar_option_set(packages = c("keras", "recipes", "rsample", "tidyverse", "yardstick"))
list(
  tar_target(churn_file, "data/churn.csv", format = "file"),
  tar_target(churn_data, split_data(churn_file)),
  tar_target(churn_recipe, prepare_recipe(churn_data)),
  tar_target(run_relu, test_model(act1 = "relu", churn_data, churn_recipe)),
  tar_target(run_sigmoid, test_model(act1 = "sigmoid", churn_data, churn_recipe)),
  tar_target(run_softmax, test_model(act1 = "softmax", churn_data, churn_recipe)),
  tar_target(
    best_run,
    bind_rows(run_relu, run_sigmoid, run_softmax) %>%
      top_n(1, accuracy) %>%
      head(1)
  ),
  tar_target(
    best_model,
    "???", # Your turn: call retrain_run() to retrain the best model.
    format = "keras" # Needed to return the actual Keras model object.
  )
)
```
Check the dependency relationships.
```r
tar_visnetwork()
```
Train the model. `tar_make()` skips the targets that are already up to date.
```r
tar_make()
```
Inspect the model object.
```r
tar_read(best_model)
```
The full pipeline in `_targets.R` should look like this.
```r
# Should be in _targets.R:
library(targets)
source("2-pipelines/functions.R")
tar_option_set(packages = c("keras", "recipes", "rsample", "tidyverse", "yardstick"))
list(
  tar_target(churn_file, "data/churn.csv", format = "file"),
  tar_target(churn_data, split_data(churn_file)),
  tar_target(churn_recipe, prepare_recipe(churn_data)),
  tar_target(run_relu, test_model(act1 = "relu", churn_data, churn_recipe)),
  tar_target(run_sigmoid, test_model(act1 = "sigmoid", churn_data, churn_recipe)),
  tar_target(run_softmax, test_model(act1 = "softmax", churn_data, churn_recipe)),
  tar_target(
    best_run,
    bind_rows(run_relu, run_sigmoid, run_softmax) %>%
      top_n(1, accuracy) %>%
      head(1)
  ),
  tar_target(
    best_model,
    retrain_run(best_run, churn_recipe),
    format = "keras"
  )
)
```
Building up a pipeline is a gradual, iterative process:

1. Add or revise a few targets in `_targets.R`.
2. Check the pipeline with `tar_outdated()` and `tar_visnetwork()`.
3. Call `tar_make()` to run the new targets.
4. Inspect the output with `tar_load()` and/or `tar_read()`.

In the next notebook, we will explore what happens when we make changes to the code and data.