---
title: 'The targets R package: a dynamic Make-like function-oriented pipeline toolkit for reproducibility and high-performance computing'
tags:
  - R
  - reproducibility
  - high-performance computing
  - pipeline
  - workflow
  - Make
date: "12 January 2021"
output: pdf_document
authors:
  - name: William Michael Landau
    orcid: 0000-0003-1878-3253
    email: will.landau@gmail.com
    affiliation: 1
bibliography: paper.bib
affiliations:
  - name: Eli Lilly and Company
    index: 1
---

The `targets` R package [@targets] is a pipeline toolkit for computationally intense reproducible research. It reduces the time and effort required to develop a data analysis project and maintain a trustworthy set of results. `targets` uses static code analysis to detect dependency relationships among interconnected computational tasks and construct a directed acyclic graph (DAG), which researchers can visualize in order to understand and communicate the structure of a complicated workflow. To run the pipeline at scale, `targets` leverages implicit parallel computing and optional cloud storage. In subsequent runs, `targets` skips tasks that are already synchronized with their upstream dependencies, which not only reduces the runtime of rapidly developing workflows, but also provides tangible evidence of reproducibility.

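
As a minimal sketch of the workflow described above, the following R code writes a small pipeline, runs it, checks that nothing is left out of date, and extracts the dependency graph. The functions `simulate_data()` and `fit_model()`, the target names, and the scratch directory are hypothetical and chosen only for illustration. The `tar_*()` calls come from the `targets` package, which is assumed to be installed.

```r
library(targets)

# Work in a scratch directory so the sketch does not touch a real project.
project_dir <- file.path(tempdir(), "targets_sketch")
dir.create(project_dir, showWarnings = FALSE)
old_dir <- setwd(project_dir)

# Write the pipeline definition to _targets.R in the working directory.
tar_script({
  # Hypothetical user-defined functions (illustration only).
  simulate_data <- function(n) {
    data.frame(x = seq_len(n), y = 2 * seq_len(n) + 1)
  }
  fit_model <- function(data) {
    lm(y ~ x, data = data)
  }
  # Each target is an R object. Dependencies are inferred from the
  # symbols mentioned in each command, not declared by hand.
  list(
    tar_target(raw_data, simulate_data(10)),
    tar_target(model, fit_model(raw_data)),
    tar_target(slope, coef(model)[["x"]])
  )
}, ask = FALSE)

tar_make()                            # first run: builds all three targets
outdated_after_run <- tar_outdated()  # names of targets that still need to run
tar_make()                            # second run: up-to-date targets are skipped

slope_value <- tar_read(slope)        # read a stored result back into the session
graph_edges <- tar_network()$edges    # data frame of DAG edges (columns from, to)

setwd(old_dir)
```

Editing `fit_model()` and calling `tar_make()` again would be expected to rerun `model` and `slope` while leaving `raw_data` alone. `tar_visnetwork()` draws the same graph interactively if the `visNetwork` package is available.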
In high-performance computing scenarios, `targets` uses its DAG to discern which targets can run concurrently and which targets are still waiting for other upstream targets to finish processing. As soon as a target's dependency requirements are met, the target is deployed to the next available parallel worker. Internally, `targets` leverages the `clustermq` package [@clustermq] for persistent workers and the `future` package [@future] for transient workers. Both `clustermq` and `future` are powerful and versatile frameworks capable of submitting R workloads not only to multiple cores on a single machine, but also to popular resource managers on shared computing clusters.

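
The scheduling behavior described above can be sketched with transient workers on a single machine. This example assumes the `targets` and `future` packages are installed; the target names and the scratch directory are hypothetical. It uses `tar_make_future()`, the `future`-based entry point available around the time of writing, and later releases of `targets` may recommend a different backend, so treat this as an illustration rather than the recommended setup for every version.

```r
library(targets)

# Scratch directory for the sketch (hypothetical location).
project_dir <- file.path(tempdir(), "targets_parallel_sketch")
dir.create(project_dir, showWarnings = FALSE)
old_dir <- setwd(project_dir)

tar_script({
  # Transient workers: each target runs in its own local background R session.
  future::plan(future::multisession)
  list(
    # sample_a and sample_b do not depend on each other,
    # so they are eligible to run at the same time.
    tar_target(sample_a, rnorm(100)),
    tar_target(sample_b, rnorm(100)),
    # combined waits until both upstream targets have finished.
    tar_target(combined, c(mean(sample_a), mean(sample_b)))
  )
}, ask = FALSE)

tar_make_future(workers = 2)
combined_value <- tar_read(combined)

# Persistent-worker alternative (assumes the clustermq package is installed):
#   put options(clustermq.scheduler = "multiprocess") in _targets.R
#   and call tar_make_clustermq(workers = 2) instead.

setwd(old_dir)
```

Pointing the `future` plan or the `clustermq` scheduler at a cluster resource manager instead of local processes would leave the pipeline definition itself unchanged.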
`targets` is the successor to `drake` [@drake], which in turn originated from `remake` [@remake], an R package modeled after GNU Make [@Make]. Unlike Make, `targets`, `drake`, and `remake` focus on the R language, encourage an idiomatic function-oriented style of programming, and abstract each target as an R object. Relative to `remake` and `drake`, `targets` is friendlier and more efficient, surpassing the permanent architectural limitations of both predecessors. The data storage system of `targets` is lighter and more transparent, which helps users diagnose issues, move projects to different file systems, work with multiple contributors, and leverage seamless Metaflow-like cloud storage integration [@metaflow]. In addition, `targets` supports stronger user-side guardrails, more introspective dependency graph visualizations, parallel-efficient dynamic branching, and an interface more amenable to metaprogramming and third-party extensions.

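
Two of the features named above, dynamic branching and the transparent data store, can be illustrated with a short sketch. The target names and the scratch directory are hypothetical, and the `targets` package is assumed to be installed.

```r
library(targets)

# Scratch directory for the sketch (hypothetical location).
project_dir <- file.path(tempdir(), "targets_branching_sketch")
dir.create(project_dir, showWarnings = FALSE)
old_dir <- setwd(project_dir)

tar_script({
  list(
    tar_target(sample_size, c(10, 100, 1000)),
    # Dynamic branching: one branch of sample_mean is created at runtime
    # for each element of sample_size.
    tar_target(
      sample_mean,
      mean(rnorm(sample_size)),
      pattern = map(sample_size)
    )
  )
}, ask = FALSE)

tar_make()

# Reading a branched target combines its branches into one vector.
branch_means <- tar_read(sample_mean)

# The data store is an ordinary folder of files next to _targets.R.
store_contents <- list.files("_targets")

setwd(old_dir)
```

Because the store is a plain directory, it can be inspected with ordinary file tools or copied along with the project.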