`targets` R package Keras model example

The goal of this workflow is to find the Keras model that best predicts customer attrition ("churn") on a subset of the IBM Watson Telco Customer Churn dataset. (See this RStudio Blog post by Matt Dancho for a thorough walkthrough of the use case.) Here, we fit multiple Keras models to the dataset with different tuning parameters, pick the one with the highest classification test accuracy, and produce a trained model for the best set of tuning parameters we find.
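
The selection step described above (score every candidate, keep the one with the highest test accuracy) can be sketched in base R. This is only an illustration, not the project's code: the tuning grid, the parameter names `units` and `activation`, and the `score_run()` function are all hypothetical, and `score_run()` returns a made-up deterministic number so the sketch runs without Keras.

```r
# Hypothetical tuning grid: parameter names and values are illustrative only.
grid <- expand.grid(
  units = c(16, 32),
  activation = c("relu", "sigmoid"),
  stringsAsFactors = FALSE
)

# Placeholder for "fit a Keras model and return its test accuracy".
# The formula is arbitrary; a real version would train and evaluate a model.
score_run <- function(units, activation) {
  0.7 + 0.001 * units + ifelse(activation == "relu", 0.02, 0)
}

# Score every row of the grid.
grid$accuracy <- mapply(score_run, grid$units, grid$activation)

# Keep the tuning parameters with the highest (placeholder) accuracy.
best <- grid[which.max(grid$accuracy), ]
print(best)
```

In the actual pipeline, each scoring call would be its own target, so a finished run would not need to be repeated when only the selection logic changes.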

`targets` pipeline

The `targets` R package manages the workflow. It automatically skips steps of the pipeline when the results are already up to date, which is critical for machine learning tasks that take a long time to run. It also helps users understand and communicate this work with tools like the interactive dependency graph below.

    library(targets)
    tar_visnetwork()

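
To see the skipping behavior in isolation, the following minimal sketch builds a toy two-target pipeline in a temporary directory and runs it twice. The targets `x` and `y` are hypothetical stand-ins and have nothing to do with the Keras models in this project; the sketch assumes the `targets` package is installed.

```r
library(targets)

# Work in a temporary directory so no existing project is touched.
tmp <- tempfile("targets-sketch-")
dir.create(tmp)
old <- setwd(tmp)

# Write a toy _targets.R file; `x` and `y` are hypothetical targets.
tar_script({
  library(targets)
  list(
    tar_target(x, 1 + 1),
    tar_target(y, x * 2)
  )
}, ask = FALSE)

tar_make()                  # first run: builds x, then y
outdated <- tar_outdated()  # names of targets that still need to run
tar_make()                  # second run: both targets are up to date and skipped
y_value <- tar_read(y)      # read the stored value of y

setwd(old)
```

After the first `tar_make()`, `tar_outdated()` should report no outdated targets, which is why the second call has nothing to rebuild. Editing the command of `x` in `_targets.R` would invalidate both `x` and its downstream target `y`.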
You can try out this example project as long as you have a browser and an internet connection. Click here to navigate your browser to an RStudio Cloud instance. Alternatively, you can clone or download this code repository and install the R packages listed here.
In the R console, call the `tar_make()` function to run the pipeline. Then, call `tar_read(hist)` to retrieve the histogram. Experiment with other functions such as `tar_visnetwork()` to learn how they work.
The files in this example are organized as follows.

    ├── run.sh
    ├── run.R
    ├── _targets.R
    ├── sge.tmpl
    ├── R/
    ├──── functions.R
    ├── data/
    ├──── customer_churn.csv
    └── report.Rmd

File | Purpose
---|---
`run.sh` | Shell script to run `run.R` in a persistent background process. Works on Unix-like systems. Helpful for long computations on servers.
`run.R` | R script to run `tar_make()` or `tar_make_clustermq()` (uncomment the function of your choice).
`_targets.R` | The special R script that declares the `targets` pipeline. See `tar_script()` for details.
`sge.tmpl` | A `clustermq` template file to deploy targets in parallel to a Sun Grid Engine cluster.
`R/functions.R` | An R script with user-defined functions. Unlike `_targets.R`, there is nothing special about the name or location of this script. In fact, for larger projects, it is good practice to partition functions into multiple files.
`data/customer_churn.csv` | A subset of the IBM Watson Telco Customer Churn dataset.
`report.Rmd` | An R Markdown report summarizing the results of the analysis. For more information on how to include R Markdown reports as reproducible components of the pipeline, see the `tar_render()` function from the `tarchetypes` package and the literate programming chapter of the manual.

You can run this project locally on your laptop or remotely on a cluster. You have several choices, and they each require modifications to `run.R` and `_targets.R`.

Mode | When to use | Instructions for `run.R` | Instructions for `_targets.R`
---|---|---|---
Sequential | Low-spec local machine or Windows. | Uncomment `tar_make()` | No action required.
Local multicore | Local machine with a Unix-like OS. | Uncomment `tar_make_clustermq()` | Uncomment `options(clustermq.scheduler = "multicore")`
Sun Grid Engine | Sun Grid Engine cluster. | Uncomment `tar_make_clustermq()` | Uncomment `options(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl")`
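
As a rough guide to what the table asks for, here is a sketch of the relevant lines for the local multicore mode. The `options()` calls are taken from the table; the surrounding comments and the `workers = 2L` value are assumptions for illustration, and the actual files in the repository may be laid out differently.

```r
# _targets.R (fragment): scheduler setting for the local multicore mode.
options(clustermq.scheduler = "multicore")

# For a Sun Grid Engine cluster, the table suggests this line instead:
# options(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl")

# run.R (fragment): with either scheduler setting above, run the pipeline with
# targets::tar_make_clustermq(workers = 2L)  # worker count is illustrative
#
# For the sequential mode, leave both options() lines commented out and call
# targets::tar_make()
```

Only one of the two run functions should be active in `run.R` at a time, and the scheduler option in `_targets.R` should match the mode you chose.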