In bioinfocz/scdrake: A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language

Pipeline config is stored in the config/pipeline.yaml file. Directory with this file is read from SCDRAKE_PIPELINE_CONFIG_DIR environment variable upon {scdrake} load or attach, and saved as scdrake_pipeline_config_dir option. This option is used as the default argument value in several {scdrake} functions.

Pipeline parameters don't have impact on analysis (except SEED, but that should be reproducibly treated by {drake}). Parameters starting with DRAKE_ are generally passed to drake::make() or drake::drake_config().

General parameters

DRAKE_TARGETS: null

Type: character vector or null

Array of target names to make. Setting to null will make all targets. Example for single-sample pipeline / stage 02_norm_clustering reports:

DRAKE_TARGETS: ["report_norm_clustering", "report_norm_clustering_simple"]

DRAKE_CACHE_DIR: ".drake"

Type: character scalar or null

A name of directory to store drake's cache in. If null, the default directory ".drake" will be used.

DRAKE_KEEP_GOING: False

Type: logical scalar

If True, let the pipeline continue even if some target fails.

DRAKE_VERBOSITY: 1

Type: integer scalar (1 | 2 | 3)

Verbosity of {drake}:

0: print nothing.
1: print target-by-target messages as make() progresses.
2: show a progress bar to track how many targets are done so far.

DRAKE_LOCK_ENVIR: True

Type: logical scalar

{drake} locks R global environment to avoid its unwanted modifications by targets. However, in some cases is needed to keep it unlocked.

DRAKE_UNLOCK_CACHE: True

Type: logical scalar

Don't wait for {drake} to discover locked cache after pipeline is run and unlock it immediately.

DRAKE_FORMAT: "rds"

Type: character scalar

A file format used to store intermediate results in DRAKE_CACHE_DIR. See https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets for more details.

By default, R's Rds format is used (see ?saveRds), but we recommend to use DRAKE_FORMAT: "qs" (see https://github.com/traversc/qs) which offers better performance, but sometimes doesn't work correctly (drake throws untraceable errors).

DRAKE_REBUILD: null

Type: character scalar ("all" | "current") or null

Instruct {drake} to rebuild targets although they are considered finished.

For "all", the pipeline is run from scratch (drake::trigger(condition = TRUE) is passed as trigger argument to drake::make() or drake::drake_config()).
For "current", drake::clean() is run for targets specified in DRAKE_TARGETS.

DRAKE_CACHING: "worker"

Type: character scalar ("worker" | "main")

How to collect data from parallel workers. See the caching parameter in drake::drake_config().

DRAKE_MEMORY_STRATEGY: "speed"

How to manage target objects in memory during runtime. See the memory_strategy parameter in drake::drake_config().

You can consider "autoclean", "preclean" or "lookahead" to conserve memory, but at the expense of speed.

DRAKE_LOG_BUILD_TIMES: False

Type: logical scalar

Whether to record build times for targets. Mac users may notice a 20% speedup in drake::make() with DRAKE_LOG_BUILD_TIMES: False.

BLAS_N_THREADS: null

Type: positive integer scalar or null

A maximum number of threads for BLAS operations, passed to RhpcBLASctl::blas_set_num_threads(). Prevents "BLAS : Program is Terminated. Because you tried to allocate too many memory regions" when a massive target parallelism is used. Set to null if you want to keep BLAS defaults.

RSTUDIO_PANDOC: null

Type: character scalar or null

A path to directory with pandoc's binary which is required for rendering of HTML reports.

You can ignore this if:

Scdrake is run in its Docker container.
You are running scdrake from RStudio (it has pandoc bundled).
pandoc is available in the PATH environment variable. You can check this by calling system("pandoc -v").

In {rmarkdown}, the used pandocs binary is then resolved by rmarkdown::find_pandoc().

SEED: 100

Type: integer scalar

An initial seed for random number generator.

Parallelism

DRAKE_PARALLELISM: "loop"

Type: character scalar ("loop" | "future" | "clustermq")

Type of {drake} paralellism.

DRAKE_CLUSTERMQ_SCHEDULER: "multicore"

Type: character scalar

Which scheduler to use if DRAKE_PARALLELISM is "clustermq". See https://mschubert.github.io/clustermq/articles/userguide.html#configuration for possible values.

DRAKE_N_JOBS: 4

Type: positive integer scalar

A number of parallel jobs for drake.

DRAKE_N_JOBS_PREPROCESS: 4

Type: positive integer scalar

A number of parallel jobs for processing the imports and doing other preprocessing tasks.

WITHIN_TARGET_PARALLELISM: False

Type: logical scalar

Allow or disable within-target parallelism through the BiocParallel package. Only possible when DRAKE_PARALLELISM is "loop".

N_CORES: 4

Type: positive integer scalar

A number of cores for within-target parallelism.

Targets

An informative plan is binded with every other plan, and contains targets with useful runtime information. See the Targets section in vignette("config_main").

bioinfocz/scdrake documentation built on Sept. 19, 2024, 4:43 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bioinfocz/scdrake
A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language

In bioinfocz/scdrake: A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language

General parameters

Parallelism

Targets

R Package Documentation

Browse R Packages

We want your feedback!

bioinfocz/scdrake A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language

In bioinfocz/scdrake: A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language

General parameters

Parallelism

Targets

R Package Documentation

Browse R Packages

We want your feedback!

bioinfocz/scdrake
A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language