knitr::opts_chunk$set(echo = TRUE)
scipiper targets can be indicator files with a shared cache, local data files with or without local indicator files, or simply R objects. Indicator files are small text files that contain a hash of an object and represent that object to scipiper, so that the object itself does not have to be present locally. A shared cache is a central location on Google Drive or S3 where files can be automatically uploaded. Combined, these two features allow processing steps to be run by collaborators on other computers, and the products to be downloaded by you only when they are needed, which prevents redundant rebuilds and saves you processing time. This vignette provides some guidelines for when to use these different patterns.
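To make the indicator idea concrete, here is a minimal base R sketch of writing and checking one. It is not scipiper's implementation: the helper names `write_indicator()` and `indicator_matches()`, the `.ind` suffix default, and the one-line `hash:` layout are all assumptions made for illustration.

```r
# Hypothetical helpers (not part of scipiper): write a small text file that
# records the MD5 hash of a data file, then check a data file against it.
write_indicator <- function(data_file, ind_file = paste0(data_file, ".ind")) {
  hash <- unname(tools::md5sum(data_file))
  writeLines(paste("hash:", hash), ind_file)
  invisible(ind_file)
}

indicator_matches <- function(data_file, ind_file = paste0(data_file, ".ind")) {
  recorded <- sub("^hash: ", "", readLines(ind_file, n = 1))
  identical(recorded, unname(tools::md5sum(data_file)))
}

# Usage: the indicator stays tiny however large the data file is
data_file <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), data_file, row.names = FALSE)
ind_file <- write_indicator(data_file)
indicator_matches(data_file, ind_file)
```

Because only the indicator needs to be read to detect a change, the data file itself can live elsewhere until it is actually needed.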
There are two main goals to keep in mind when deciding what style of target to use:
These two can sometimes be in tension, but usually it will be clear which is more important for a given case.
Time to create an object vs transfer time: This is the biggest thing to consider. Is it faster to upload a file to the shared cache and have others download it, or to have others rebuild it themselves? The answer depends partly on your internet connection. Don't forget about upload time as well. Uploading is generally slower than downloading, especially for remote workers with residential internet connections that throttle uploads.
Collaborators: Are all collaborators equipped to rebuild every step in the pipeline, if needed? If a particular step needs to be done by certain people, say a processing-intensive munging step early in the pipeline, then this file will almost certainly need to be in the remote cache.
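The build-versus-transfer trade-off can be roughed out with simple arithmetic. The sketch below is only a back-of-the-envelope aid, not part of scipiper: the function name, its arguments, and the example numbers are hypothetical, and it assumes one person uploads the file once while every collaborator either downloads it or rebuilds it.

```r
# Hypothetical helper: compare total time spent when a file is shared through
# the cache against every collaborator rebuilding it locally.
# Sizes are in megabytes, speeds in megabytes per second, times in seconds.
compare_cache_vs_rebuild <- function(file_mb, build_sec, upload_mb_per_sec,
                                     download_mb_per_sec, n_collaborators = 1) {
  # One upload by the builder plus one download per collaborator
  cache_sec <- file_mb / upload_mb_per_sec +
    n_collaborators * file_mb / download_mb_per_sec
  # Each collaborator repeats the build
  rebuild_sec <- n_collaborators * build_sec
  list(
    cache_sec = cache_sec,
    rebuild_sec = rebuild_sec,
    cheaper = if (cache_sec < rebuild_sec) "shared cache" else "rebuild locally"
  )
}

# A 500 MB file that takes 2 minutes to build, with a slow upload link
quick_step <- compare_cache_vs_rebuild(
  file_mb = 500, build_sec = 120,
  upload_mb_per_sec = 1, download_mb_per_sec = 10, n_collaborators = 3
)

# The same file when the build takes an hour
slow_step <- compare_cache_vs_rebuild(
  file_mb = 500, build_sec = 3600,
  upload_mb_per_sec = 1, download_mb_per_sec = 10, n_collaborators = 3
)
```

With these made-up numbers, the quick step is cheaper to rebuild locally and the slow step is cheaper to share, which mirrors the guidance above. Real decisions should also weigh whether every collaborator is able to run the step at all.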
There are four types of targets you can use:
Shared cache data/indicator file: The target will only be rebuilt if its dependencies have changed and the indicator file has not been updated to trigger a download from the remote cache. Use this type for longer-running operations whose products are worthwhile to cache remotely. See the shared caching vignette for different implementation options.
Local data file with local indicator: The target will be rebuilt if the indicator file is missing or dependencies have changed. This can serve two purposes: 1) Speeding up scipiper: For each target, scipiper (really remake behind the scenes) hashes dependencies to see if they have changed. If a dependency is a very large object, hashing can take a non-trivial amount of time. Pointing to an indicator file reduces the number of times the actual data file is hashed. 2) A one-to-many target: scipiper requires a target to be a single object. If you have a command that creates many individual files, you can create a single indicator file that signals that these files have been created and contains the locations and hashes of the corresponding files. You may also want to look into task tables if you have many targets that are created by repeatedly running commands with small differences.
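The one-to-many pattern can be illustrated in base R as follows. This is a sketch of the idea rather than scipiper's own mechanism: `write_multi_indicator()`, the `path: hash` line layout, and the example file names are hypothetical.

```r
# Hypothetical helper: record the location and MD5 hash of each of many files
# in one indicator file, so a pipeline can track them as a single target.
write_multi_indicator <- function(data_files, ind_file) {
  hashes <- unname(tools::md5sum(data_files))
  writeLines(paste0(data_files, ": ", hashes), ind_file)
  invisible(ind_file)
}

# Usage: a command that produces three files, summarized by one indicator
out_dir <- tempfile("site_files_")
dir.create(out_dir)
site_files <- file.path(out_dir, paste0("site_", 1:3, ".csv"))
for (i in seq_along(site_files)) {
  write.csv(data.frame(site = i, value = i * 10), site_files[i], row.names = FALSE)
}
ind_file <- write_multi_indicator(site_files, file.path(out_dir, "sites.ind"))
readLines(ind_file)
```

If any one of the listed files changes, its hash line changes, so the indicator file changes and downstream targets that depend on it are flagged as out of date.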
R object: The target will be rebuilt if it is missing or dependencies have changed. It will be written to disk by remake as a .rds file, but it won't be as easily accessible from disk as an explicitly written local data file. Use cases vary, but this type suits small data objects that don't need to be shared, such as configuration snippets, and cases where a target takes a relatively long time to build but creates a massive file that is more cumbersome to upload and download than to rebuild.
Local data file only: The target will be rebuilt if the local data file does not exist or dependencies have changed. This is only different from an R object target in that you have control over where the object is written and its file type.
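The difference between an R object target and a local data file target shows up mainly in how the building function is written. The pair below is a hedged illustration with hypothetical function names and data: the first function returns an object and leaves storage to the pipeline, while the second receives an output path and chooses the file type itself.

```r
# Hypothetical function for an R object target: it returns the object and
# leaves the on-disk storage to the pipeline tool.
summarize_flow <- function(flow) {
  data.frame(mean_flow = mean(flow$value), n = nrow(flow))
}

# Hypothetical function for a local data file target: it is given the output
# path and decides the file type (here CSV) itself.
write_flow_summary <- function(out_file, flow) {
  write.csv(summarize_flow(flow), out_file, row.names = FALSE)
  invisible(out_file)
}

# Usage with made-up data
flow <- data.frame(value = c(1, 2, 3))
flow_summary <- summarize_flow(flow)
out_file <- write_flow_summary(tempfile(fileext = ".csv"), flow)
```

The file-writing version is the one to reach for when someone needs to open the result outside of R or find it at a predictable path.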