In wlandau/targets-keras: An Example Targets Pipeline for Machine Learning

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

`targets` R package Keras model example

The goal of this workflow is to find the Keras model that best predicts customer attrition ("churn") on a subset of the IBM Watson Telco Customer Churn dataset. (See this RStudio Blog post by Matt Dancho for a thorough walkthrough of the use case.) Here, we fit multiple Keras models to the dataset with different tuning parameters, pick the one with the highest classification test accuracy, and produce a trained model for the best set of tuning parameters we find.

The `targets` pipeline

The targets R package manages the workflow. It automatically skips steps of the pipeline when the results are already up to date, which is critical for machine learning tasks that take a long time to run. It also helps users understand and communicate this work with tools like the interactive dependency graph below.

library(targets)
tar_visnetwork()
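To make the skipping behavior concrete, here is a minimal sketch on a toy pipeline. The targets `raw` and `total` are hypothetical and have nothing to do with the churn models; the example only assumes that the targets package is installed.

```r
library(targets)

# tar_dir() runs the code in a temporary directory, so no files are left behind.
tar_dir({
  # Write a toy _targets.R with two hypothetical targets.
  tar_script({
    library(targets)
    list(
      tar_target(raw, c(1, 2, 3)),
      tar_target(total, sum(raw))
    )
  }, ask = FALSE)

  tar_make()                     # first run: builds both targets
  first_check <- tar_outdated()  # names of targets that still need to run
  tar_make()                     # second run: both targets are skipped
  total_value <- tar_read(total) # read the stored result from the data store
})
```

On the second call, tar_make() should report the targets as skipped because neither the code nor the upstream data changed.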

How to access

You can try out this example project as long as you have a browser and an internet connection. Click here to navigate your browser to an RStudio Cloud instance. Alternatively, you can clone or download this code repository and install the R packages listed here.

How to run

In the R console, call the tar_make() function to run the pipeline. Then, call tar_read(hist) to retrieve the histogram. Experiment with other functions such as tar_visnetwork() to learn how they work.
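As a rough sketch, a console session for these steps might look like the following. It assumes the working directory is the root of the cloned project (the folder containing _targets.R) and that a target named `hist` exists, as stated above. The guard and the variable `has_pipeline` are illustrative additions, not part of the project.

```r
library(targets)

# Only run the pipeline if we are inside a targets project.
has_pipeline <- file.exists("_targets.R")

if (has_pipeline) {
  tar_make()               # run the pipeline, skipping up-to-date targets
  print(tar_read(hist))    # retrieve the stored value of the target `hist`
  print(tar_outdated())    # list targets that would rerun on the next tar_make()
  print(tar_visnetwork())  # show the interactive dependency graph
}
```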

File structure

The files in this example are organized as follows.

├── run.sh
├── run.R
├── _targets.R
├── sge.tmpl
├── R/
├──── functions.R
├── data/
├──── customer_churn.csv
└── report.Rmd

File | Purpose
---|---
run.sh | Shell script to run run.R in a persistent background process. Works on Unix-like systems. Helpful for long computations on servers.
run.R | R script to run tar_make() or tar_make_clustermq() (uncomment the function of your choice.)
_targets.R | The special R script that declares the targets pipeline. See tar_script() for details.
sge.tmpl | A clustermq template file to deploy targets in parallel to a Sun Grid Engine cluster.
R/functions.R | An R script with user-defined functions. Unlike _targets.R, there is nothing special about the name or location of this script. In fact, for larger projects, it is good practice to partition functions into multiple files.
data/customer_churn.csv | A subset of the IBM Watson Telco Customer Churn dataset
report.Rmd | An R Markdown report summarizing the results of the analysis. For more information on how to include R Markdown reports as reproducible components of the pipeline, see the tar_render() function from the tarchetypes package and the literate programming chapter of the manual.
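The sketch below illustrates how _targets.R, R/functions.R, and a data file typically fit together. It is not the project's actual pipeline: the function `count_rows()`, the targets `churn_file` and `n_rows`, and the three-row CSV are made-up stand-ins, and everything runs in a temporary directory.

```r
library(targets)

tar_dir({
  # Stand-in for data/customer_churn.csv: a tiny made-up table, not the real data.
  dir.create("data")
  write.csv(
    data.frame(tenure = c(1, 24, 60), churn = c(1, 0, 0)),
    "data/customer_churn.csv",
    row.names = FALSE
  )

  # Stand-in for R/functions.R with one hypothetical user-defined function.
  dir.create("R")
  writeLines("count_rows <- function(path) nrow(read.csv(path))", "R/functions.R")

  # _targets.R: load the functions, then return a list of targets.
  tar_script({
    library(targets)
    source("R/functions.R")
    list(
      # format = "file" makes targets track the contents of the CSV file.
      tar_target(churn_file, "data/customer_churn.csv", format = "file"),
      tar_target(n_rows, count_rows(churn_file))
    )
  }, ask = FALSE)

  tar_make()
  n_rows_value <- tar_read(n_rows)
})
```

Declaring the CSV as a file target means that editing the data invalidates every downstream target, while the functions stay in an ordinary script that _targets.R sources.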

High-performance computing

You can run this project locally on your laptop or remotely on a cluster. There are several modes to choose from, and each one requires small changes to run.R and, in most cases, to _targets.R.

Mode | When to use | Instructions for run.R | Instructions for _targets.R
---|---|---|---
Sequential | Low-spec local machine or Windows. | Uncomment tar_make() | No action required.
Local multicore | Local machine with a Unix-like OS. | Uncomment tar_make_clustermq() | Uncomment options(clustermq.scheduler = "multicore")
Sun Grid Engine | Sun Grid Engine cluster. | Uncomment tar_make_clustermq() | Uncomment options(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl")
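For instance, the "Local multicore" mode could end up looking roughly like this once the relevant lines are uncommented. This is a sketch rather than the exact contents of the project files: the `workers = 2` value and the file.exists() guard are assumptions added for illustration, and the clustermq package must be installed.

```r
# In _targets.R: choose the clustermq scheduler for local multicore runs.
options(clustermq.scheduler = "multicore")

# In run.R: run the pipeline with persistent clustermq workers.
# The guard keeps this snippet from failing outside the project folder.
if (file.exists("_targets.R")) {
  targets::tar_make_clustermq(workers = 2)  # worker count is an assumed example value
}
```

For the Sun Grid Engine mode, the options() call would instead set clustermq.scheduler = "sge" and clustermq.template = "sge.tmpl", as listed above.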

wlandau/targets-keras documentation built on Sept. 26, 2021, 9:20 p.m.
