
scipiper targets can be indicator files with a shared cache, local data files with or without local indicator files, or simply R objects. An indicator file is a small text file containing a hash of an object; it stands in for the object in scipiper, so the object itself does not have to be present locally. A shared cache is a central location on Google Drive or S3 to which files can be automatically uploaded. Combined, these two features let collaborators run processing steps on other computers while you download the products only when you need them, which prevents redundant rebuilds and saves processing time. This vignette provides some guidelines for when to use these different patterns.
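To make the indicator idea concrete, here is a minimal base R sketch of a small text file that records a hash of a data file. It is a conceptual illustration only: `write_indicator()` and `indicator_is_current()` are hypothetical names, the `.ind` suffix and `hash:` line are assumptions, and scipiper's own helpers and file layout may differ.

```r
# Conceptual sketch only; these helpers are hypothetical, not scipiper functions.

# Write a one-line indicator file holding the MD5 hash of `data_file`.
write_indicator <- function(data_file, ind_file = paste0(data_file, ".ind")) {
  # tools::md5sum() returns a character vector of hashes named by file path
  hash <- unname(tools::md5sum(data_file))
  writeLines(paste0("hash: ", hash), ind_file)
  invisible(ind_file)
}

# Check whether the hash recorded in the indicator still matches the data file.
indicator_is_current <- function(data_file, ind_file) {
  recorded <- sub("^hash: ", "", readLines(ind_file, n = 1))
  identical(recorded, unname(tools::md5sum(data_file)))
}

# Usage: create a small data file, write its indicator, then verify it.
data_file <- tempfile(fileext = ".csv")
write.csv(data.frame(site = c("A", "B"), flow = c(1.2, 3.4)),
          data_file, row.names = FALSE)
ind_file <- write_indicator(data_file)
indicator_is_current(data_file, ind_file)
```

Because the indicator is tiny, it can be shared or version-controlled cheaply while the data file it describes lives elsewhere.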

Objectives

There are two main goals to keep in mind when deciding what style of target to use:

These two can sometimes be in tension, but usually it will be clear which is more important for a given case.

Key factors to consider

Shared cache or not?

If local only: object or file?

Target choices

There are four types of targets you can use:

Cached file

Shared cache data/indicator file: The target will only be rebuilt if its dependencies have changed and the indicator file has not been updated to trigger a download from the remote cache. This option suits longer-running operations that are worthwhile to cache remotely. See the shared caching vignette for different implementation options.

Local-only options

Local data file with local indicator: The target will be rebuilt if the indicator file is missing or dependencies have changed. This can serve two purposes: 1) Speeding up scipiper: For each target, scipiper (really remake behind the scenes) hashes dependencies to see if they have changed. If a dependency is a very large object, hashing can take a non-trivial amount of time. Pointing to an indicator file reduces the number of times the actual data file is hashed. 2) A one-to-many target: scipiper requires a target to be a single object. If you have a command that creates many individual files, you can create a single indicator file that signals that these files have been created and contains the locations and hashes of the corresponding files. You may also want to look into task tables if you have many targets that are created by repeatedly running commands with small differences.
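The one-to-many pattern can be sketched in base R as a single indicator file that lists the path and hash of every file a command produced. This is an illustration under stated assumptions: `write_multi_indicator()` is a hypothetical helper, and the `path: hash` line format is made up for the example rather than taken from scipiper.

```r
# Conceptual sketch only; `write_multi_indicator` is hypothetical, not a scipiper function.

# Record the location and MD5 hash of each data file in one indicator file.
write_multi_indicator <- function(data_files, ind_file) {
  hashes <- tools::md5sum(data_files)  # named by file path
  writeLines(paste0(names(hashes), ": ", unname(hashes)), ind_file)
  invisible(ind_file)
}

# Usage: one command writes three files; one indicator represents all of them.
out_dir <- tempfile("sites_")
dir.create(out_dir)
site_files <- file.path(out_dir, paste0("site_", 1:3, ".csv"))
for (i in seq_along(site_files)) {
  write.csv(data.frame(site = i, value = i * 10), site_files[i], row.names = FALSE)
}
ind_file <- write_multi_indicator(site_files, file.path(out_dir, "sites.ind"))
readLines(ind_file)
```

Downstream steps can then depend on the single indicator file instead of on each output individually.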

R object: The target will be rebuilt if it is missing or dependencies have changed. It will be written to disk by remake as a .rds file, but it won't be as easily accessible from disk as an explicitly written local data file. Use cases vary, but this option suits small data objects that don't need to be shared, such as configuration snippets, and cases where a target takes a relatively long time to build but creates a massive file that is more cumbersome to upload and download than to rebuild.

Local data file only: The target will be rebuilt if the local data file does not exist or dependencies have changed. This differs from an R object target only in that you control where the object is written and its file type.
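The difference between the two local-only styles shows up mainly in how the building function is written. The sketch below uses hypothetical function names and plain base R; it assumes an object-style function simply returns its value, while a file-style function receives the output path and chooses the file format itself.

```r
# Hypothetical functions for illustration; names and data are made up.

# Object-style: return the value and let the pipeline handle storage.
summarize_flow <- function(flow) {
  data.frame(mean_flow = mean(flow), max_flow = max(flow))
}

# File-style: the function is given the output path and picks the file type.
write_flow_summary <- function(out_file, flow) {
  write.csv(summarize_flow(flow), out_file, row.names = FALSE)
  invisible(out_file)
}

# Usage
flow <- c(1.2, 3.4, 2.2)
summary_obj <- summarize_flow(flow)
summary_file <- write_flow_summary(tempfile(fileext = ".csv"), flow)
read.csv(summary_file)
```

The file-style version costs a few extra lines but leaves a CSV that can be opened outside of R.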



USGS-R/scipiper documentation built on May 25, 2023, 8:47 a.m.