file_in: Declare input files and directories. [Stable]
In wlandau-lilly/drake: A Pipeline Toolkit for Reproducible Computation at Scale

file_in

R Documentation

Description

file_in() marks individual files (and whole directories) that your targets depend on.

Usage

file_in(...)

Arguments

...

Character vector, paths to files and directories. Use .id_chr to refer to the current target by name. .id_chr is not limited to use in file_in() and file_out().

Value

A character vector of declared input file or directory paths.

URLs

As of drake 7.4.0, file_in() and file_out() support URLs. If the file name begins with "http://", "https://", or "ftp://", make() attempts to check the ETag to see whether the data changed since the last make(). If no ETag can be found, drake reuses the ETag from the last make() and registers the file as unchanged (which prevents your workflow from breaking if you lose internet access). If your file_in() URLs require authentication, see the curl_handles argument of make() and drake_config() to learn how to supply credentials.
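The fallback rule described above can be sketched in base R. This is only an illustration of the stated behavior, not drake's internal code: resolve_etag() and url_changed() are hypothetical helper names, and the ETag strings are made up.

```r
# Hypothetical helpers (not part of drake) that mimic the documented rule:
# a missing ETag falls back to the one recorded by the previous make().
resolve_etag <- function(new_etag, old_etag) {
  # `new_etag` is NA when the server response carried no ETag.
  if (is.na(new_etag)) old_etag else new_etag
}

url_changed <- function(new_etag, old_etag) {
  # The file counts as changed only if the resolved ETag differs
  # from the one stored last time.
  !identical(resolve_etag(new_etag, old_etag), old_etag)
}

url_changed("abc", "abc")          # same ETag: unchanged
url_changed("def", "abc")          # different ETag: changed
url_changed(NA_character_, "abc")  # no ETag found: treated as unchanged
```

Under this rule, losing network access can never invalidate a target by itself, which matches the behavior the paragraph describes.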

Keywords

drake_plan() understands special keyword functions for your commands. With the exception of target(), each one is a proper function with its own help file.

target(): give the target more than just a command. Using target(), you can apply a transformation (examples: https://books.ropensci.org/drake/plans.html#large-plans), supply a trigger (https://books.ropensci.org/drake/triggers.html), or set any number of custom columns.
file_in(): declare an input file dependency.
file_out(): declare an output file to be produced when the target is built.
knitr_in(): declare a knitr file dependency such as an R Markdown (*.Rmd) or R LaTeX (*.Rnw) file.
ignore(): force drake to entirely ignore a piece of code: do not track it for changes and do not analyze it for dependencies.
no_deps(): tell drake to not track the dependencies of a piece of code. drake still tracks the code itself for changes.
id_chr(): get the name of the current target.
drake_envir(): get the environment where drake builds targets. Intended for advanced custom memory management.
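As a rough sketch of how several of these keywords can appear together in one plan, consider the example below. The file names raw.csv, report.Rmd, and report.html are placeholders chosen for illustration. The plan is only constructed and printed, not built, so none of the files need to exist.

```r
if (requireNamespace("drake", quietly = TRUE)) {
  library(drake)
  plan <- drake_plan(
    # file_in(): `raw` depends on the file raw.csv (placeholder name).
    raw = read.csv(file_in("raw.csv")),
    # knitr_in() declares the report source; file_out() declares
    # the rendered file as an output of this target.
    report = rmarkdown::render(
      knitr_in("report.Rmd"),
      output_file = file_out("report.html"),
      quiet = TRUE
    ),
    # ignore(): the code inside is neither tracked for changes
    # nor analyzed for dependencies.
    stamped = list(data = raw, built = ignore(Sys.time()))
  )
  print(plan)
}
```

Wrapping the timestamp in ignore() is one way to keep a volatile expression in a command while leaving it out of drake's change tracking and dependency analysis.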

Examples

## Not run: 
isolate_example("contain side effects", {
# The `file_out()` and `file_in()` functions
# just take in strings and return them.
file_out("summaries.txt")
# Their main purpose is to orchestrate your custom files
# in your workflow plan data frame.
plan <- drake_plan(
  out = write.csv(mtcars, file_out("mtcars.csv")),
  contents = read.csv(file_in("mtcars.csv"))
)
plan
# drake knows the file "mtcars.csv" is an output of `out`
# and a dependency of `contents`. See for yourself:

make(plan)
file.exists("mtcars.csv")

# You may use `.id_chr` inside `file_out()` and `file_in()`
# to refer to the current target. This works inside
# static `map()`, `combine()`, `split()`, and `cross()`.

plan <- drake::drake_plan(
  data = target(
    write.csv(data, file_out(paste0(.id_chr, ".csv"))),
    transform = map(data = c(airquality, mtcars))
  )
)
plan

# You can also work with entire directories this way.
# However, in `file_out("your_directory")`, the directory
# becomes an entire unit. Thus, `file_in("your_directory")`
# is more appropriate for subsequent steps than
# `file_in("your_directory/file_inside.txt")`.
plan <- drake_plan(
  out = {
    dir.create(file_out("dir"))
    write.csv(mtcars, "dir/mtcars.csv")
  },
  contents = read.csv(file.path(file_in("dir"), "mtcars.csv"))
)
plan

make(plan)
file.exists("dir/mtcars.csv")

# See the connections that the file relationships create:
if (requireNamespace("visNetwork", quietly = TRUE)) {
  vis_drake_graph(plan)
}
})

## End(Not run)

wlandau-lilly/drake documentation built on Dec. 3, 2024, 11:09 p.m.