
---
title: 'The targets R package: a dynamic Make-like function-oriented pipeline toolkit for reproducibility and high-performance computing'
tags:
- R
- reproducibility
- high-performance computing
- pipeline
- workflow
- Make
date: "12 January 2021"
output: pdf_document
authors:
- name: William Michael Landau
  orcid: 0000-0003-1878-3253
  email: will.landau@gmail.com
  affiliation: 1
bibliography: paper.bib
affiliations:
- name: Eli Lilly and Company
  index: 1
---

# Summary

The targets R package [@targets] is a pipeline toolkit for computationally intensive reproducible research. It reduces the time and effort required to develop a data analysis project and maintain a trustworthy set of results. targets uses static code analysis to detect dependency relationships among interconnected computational tasks and construct a directed acyclic graph (DAG), which researchers can visualize to understand and communicate the structure of a complicated workflow. To run the pipeline at scale, targets leverages implicit parallel computing and optional cloud storage. In subsequent runs, targets skips tasks that are already synchronized with their upstream dependencies, which not only reduces the runtime of rapidly developing workflows but also provides tangible evidence of reproducibility.
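As a rough illustration of the workflow described above, a pipeline is typically declared in a `_targets.R` script at the project root. The sketch below is a minimal example, not taken from the paper: the helper functions `simulate_data()` and `summarize_data()` and the target names `raw_data` and `average` are hypothetical, and the targets package is assumed to be installed.

```r
# _targets.R: a hypothetical pipeline definition file (illustrative sketch).
library(targets)

# Hypothetical helper functions standing in for real analysis code.
simulate_data <- function(n) {
  data.frame(x = seq_len(n), y = 2 * seq_len(n) + 1)
}

summarize_data <- function(data) {
  mean(data$y)
}

# Each tar_target() pairs a target name with the R command that builds it.
# Because the command for `average` mentions `raw_data`, static code
# analysis can infer that `average` depends on `raw_data`.
list(
  tar_target(raw_data, simulate_data(100)),
  tar_target(average, summarize_data(raw_data))
)
```

With this file in place, calling `targets::tar_make()` from the project root should build both targets, and a second call should skip them as long as neither the code nor the upstream data has changed. `targets::tar_visnetwork()` renders the dependency graph, provided the optional visNetwork package is available.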

In high-performance computing scenarios, targets uses its DAG to determine which targets can run concurrently and which must wait for upstream targets to finish processing. As soon as a target's dependency requirements are met, the target is deployed to the next available parallel worker. Internally, targets leverages the clustermq package [@clustermq] for persistent workers and the future package [@future] for transient workers. Both clustermq and future are powerful and versatile frameworks capable of submitting R workloads not only to multiple cores on a single machine, but also to popular resource managers on shared computing clusters.
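The idea of reading concurrency off a DAG can be sketched in a few lines of base R. This is a toy illustration only, not the package's internal scheduler: the dependency list `deps`, the target names, and the helper `ready_targets()` are all hypothetical, and the sketch groups targets into coarse "waves", whereas the text above describes a finer-grained scheme that deploys each target as soon as its own dependencies are met.

```r
# Toy dependency graph: each name maps to the upstream targets it needs.
# All target names here are hypothetical.
deps <- list(
  raw_data = character(0),
  settings = character(0),
  model    = c("raw_data", "settings"),
  report   = "model"
)

# A target is ready when it has not run yet and all of its
# upstream dependencies are already done.
ready_targets <- function(deps, done) {
  pending <- setdiff(names(deps), done)
  is_ready <- vapply(
    deps[pending],
    function(upstream) all(upstream %in% done),
    logical(1)
  )
  pending[is_ready]
}

# Walk the graph in waves; targets within one wave do not depend on
# each other, so they could run concurrently.
done <- character(0)
waves <- list()
while (length(done) < length(deps)) {
  wave <- ready_targets(deps, done)
  if (length(wave) == 0) stop("cycle detected: graph is not a DAG")
  waves[[length(waves) + 1]] <- wave
  done <- c(done, wave)
}

print(waves)
```

In this toy graph, `raw_data` and `settings` have no upstream dependencies and land in the first wave together, while `model` and `report` must each wait for the wave before them.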

targets is the successor to drake [@drake], which in turn originated from remake [@remake], an R package modeled after GNU Make [@Make]. Unlike Make, targets, drake, and remake focus on the R language, encourage an idiomatic function-oriented style of programming, and abstract each target as an R object. Relative to remake and drake, targets is friendlier and more efficient, surpassing the permanent architectural limitations of both predecessors. The data storage system of targets is lighter and more transparent, which helps users diagnose issues, move projects to different file systems, work with multiple contributors, and leverage seamless Metaflow-like cloud storage integration [@metaflow]. In addition, targets supports stronger user-side guardrails, more introspective dependency graph visualizations, parallel-efficient dynamic branching, and an interface more amenable to metaprogramming and third-party extensions.

# References



