View source: R/tar_files_input.R
tar_files_input (R Documentation)
Dynamic branching over input files or URLs.
tar_files_input() expects an unevaluated symbol for the name argument, whereas tar_files_input_raw() expects a character string for name. See the examples for a demo.
tar_files_input(
  name,
  files,
  batches = length(files),
  format = c("file", "file_fast", "url", "aws_file"),
  repository = targets::tar_option_get("repository"),
  iteration = targets::tar_option_get("iteration"),
  error = targets::tar_option_get("error"),
  memory = targets::tar_option_get("memory"),
  garbage_collection = targets::tar_option_get("garbage_collection"),
  priority = targets::tar_option_get("priority"),
  resources = targets::tar_option_get("resources"),
  cue = targets::tar_option_get("cue"),
  description = targets::tar_option_get("description")
)

tar_files_input_raw(
  name,
  files,
  batches = length(files),
  format = c("file", "file_fast", "url", "aws_file"),
  repository = targets::tar_option_get("repository"),
  iteration = targets::tar_option_get("iteration"),
  error = targets::tar_option_get("error"),
  memory = targets::tar_option_get("memory"),
  garbage_collection = targets::tar_option_get("garbage_collection"),
  priority = targets::tar_option_get("priority"),
  resources = targets::tar_option_get("resources"),
  cue = targets::tar_option_get("cue"),
  description = targets::tar_option_get("description")
)
name: Name of the target.
files: Nonempty character vector of known existing input files to track for changes.
batches: Positive integer of length 1, the number of batches into which to partition the files. The default is one file per batch (the maximum number of batches), which is simplest to handle but could cause a lot of overhead and consume a lot of computing resources. Consider reducing the number of batches below the number of files for heavy workloads.
format: Character, either "file", "file_fast", "url", or "aws_file".
repository: Character of length 1, remote repository for target storage. Choices:
Note: if
iteration: Character, iteration method. Must be a method supported by the
error: Character of length 1, what to do if the target stops and throws an error. Options:
memory: Character of length 1, memory strategy. Possible values:
For cloud-based dynamic files (e.g.
garbage_collection: Logical:
priority: Numeric of length 1 between 0 and 1. Controls which targets get deployed first when multiple competing targets are ready simultaneously. Targets with priorities closer to 1 get dispatched earlier (and polled earlier in
resources: Object returned by
cue: An optional object from
description: Character of length 1, a custom free-form human-readable text description of the target. Descriptions appear as target labels in functions like
tar_files_input() is like tar_files() but more convenient when the files in question already exist and are known in advance. Whereas tar_files() always appears outdated (e.g. with tar_outdated()) because it always needs to check which files it needs to branch over, tar_files_input() will appear up to date if the files have not changed since the last tar_make(). In addition, tar_files_input() automatically groups input files into batches to reduce overhead and increase the efficiency of parallel processing.
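To make the effect of the batches argument concrete, here is a minimal base R sketch of one way to split n files into a given number of contiguous, nonempty batches. The helper partition_files() is hypothetical; tarchetypes may assign files to batches differently.

```r
# Hypothetical helper (not the tarchetypes implementation): split a
# vector of file paths into `batches` contiguous, nonempty groups.
partition_files <- function(files, batches = length(files)) {
  stopifnot(is.character(files), length(files) > 0L)
  stopifnot(length(batches) == 1L, batches >= 1L, batches <= length(files))
  # Map file i to batch ceiling(i * batches / n). Consecutive values
  # differ by at most 1, so every batch from 1 to `batches` is used.
  index <- ceiling(seq_along(files) * batches / length(files))
  unname(split(files, index))
}

paths <- c("a.csv", "b.csv", "c.csv", "d.csv", "e.csv")
partition_files(paths, batches = 2)  # list(c("a.csv", "b.csv"), c("c.csv", "d.csv", "e.csv"))
partition_files(paths)               # default: five batches of one file each
```

Fewer batches means fewer branches (less per-branch overhead), at the cost of rerunning a larger group of files whenever any one of them changes.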
tar_files_input() creates a pair of targets, one upstream and one downstream. The upstream target does some work and returns some file paths, and the downstream target is a pattern that applies format = "file", format = "file_fast", or format = "url". This is the correct way to dynamically iterate over file/url targets: it ensures that downstream patterns rerun only the affected branches when the files or URLs change. For more information, visit https://github.com/ropensci/targets/issues/136 and https://github.com/ropensci/drake/issues/1302.
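The idea behind "rerun only the affected branches" can be sketched in base R: fingerprint each batch of files, then compare fingerprints between runs. This is a hypothetical illustration using tools::md5sum(); targets uses its own hashing and metadata store, so the helper names and logic below are assumptions, not its implementation.

```r
# Hypothetical sketch, not how targets stores metadata: fingerprint each
# batch of files and report which batches (branches) changed.
batch_hashes <- function(batched_files) {
  vapply(batched_files, function(paths) {
    # tools::md5sum() returns one hash per file; collapse them so that
    # a change to any file in the batch changes the batch fingerprint.
    paste(unname(tools::md5sum(paths)), collapse = "-")
  }, character(1))
}

# Indices of batches whose fingerprint differs from the previous run.
outdated_batches <- function(old_hashes, new_hashes) {
  stopifnot(length(old_hashes) == length(new_hashes))
  which(old_hashes != new_hashes)
}

# Usage: four temporary files in two batches; then edit one file.
paths <- replicate(4, tempfile())
for (i in seq_along(paths)) writeLines(as.character(i), paths[i])
batched <- list(paths[1:2], paths[3:4])
before <- batch_hashes(batched)
writeLines("edited", paths[3])
after <- batch_hashes(batched)
outdated_batches(before, after)  # 2: only the second batch changed
```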
A list of two targets, one upstream and one downstream. The upstream one does some work and returns some file paths, and the downstream target is a pattern that applies format = "file", format = "file_fast", or format = "url".
See the "Target objects" section for background.
Most tarchetypes functions are target factories, which means they return target objects or lists of target objects. Target objects represent skippable steps of the analysis pipeline as described at https://books.ropensci.org/targets/. Please read the walkthrough at https://books.ropensci.org/targets/walkthrough.html to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one that generate targets), and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other Dynamic branching over files: tar_files()
if (identical(Sys.getenv("TAR_LONG_EXAMPLES"), "true")) {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(tarchetypes)
      # Do not use temp files in real projects
      # or else your targets will always rerun.
      paths <- unlist(replicate(4, tempfile()))
      file.create(paths)
      list(
        tar_files_input(
          name = x,
          files = paths,
          batches = 2
        ),
        tar_files_input_raw(
          name = "y",
          files = paths,
          batches = 2
        )
      )
    })
    targets::tar_make()
    targets::tar_read(x)
    targets::tar_read(x, branches = 1)
  })
}