Are you planning to migrate the pipeline to {targets}?

We will cite the author of both packages:

{targets} is the successor of {drake}, an older pipeline tool. As of 2021-01-21, {drake} is superseded, which means there are no plans for new features or discretionary enhancements, but basic maintenance and support will continue indefinitely. Existing projects that use {drake} can safely continue to use {drake}, and there is no need to retrofit {targets}. New projects should use {targets} because it is friendlier and more robust.

I have some installation problems

If you are not using the Docker image, the most common cause of installation errors are missing shared libraries.

Feel free to open a new issue.

The pipeline is failing for my data

First make sure you have read the Before you analyse your own data in the Get Started vignette (vignette("scdrake")).

In case you encounter an error like this one:

Error in `get_result(output = out, options)`:
! callr subprocess failed: could not start R, exited with non-zero status, has crashed or was killed
ℹ See `$stdout` and `$stderr` for standard output and error.
Type .Last.error to see the more details.

It means the R process was killed due to insufficient memory or too high CPU usage. The former is more usual and here are few tips for that:

If the problem persists, feel free to open a new issue or start a discussion.

I want to run the pipeline in parallel mode


I want to change the output directory

This can be simply done by changing the appropriate parameters in config files:

I want to use a different cache directory

This is controlled by DRAKE_CACHE_DIR in config/pipeline.yaml.

I want to load intermediate results

All results of the pipeline (targets) are saved into a {drake} cache, and can be simply retrieved using the drake::loadd() or drake::readd() functions:

drake::loadd(name_of_target)
## -- name_of_target can be either quoted (character) or unquoted (symbol)
target <- drake::readd(name_of_target)

To know which targets you can load, please, refer to vignettes of individual pipeline stages.

I want to extend the pipeline

See vignette("scdrake_extend"), please.

I want to manually annotate cells

You can do it using the CELL_GROUPINGS or ADDITIONAL_CELL_DATA_FILE parameter in config/single_sample/02_norm_clustering.yaml and config/integration/02_int_clustering.yaml configs. Please, refer to vignette("stage_norm_clustering") and vignette("stage_int_clustering"), respectively, for description and usage of these parameters.


Alternatively, you can reuse a SingleCellExperiment object, for example:

drake::loadd(sce_final_norm_clustering)
umap <- reducedDim(sce_final_norm_clustering, "umap")
cell_types <- dplyr::case_when(
  umap[, 1] > 1 & umap[, 2] < 5 ~ "cell_type_1",
  umap[, 1] > 5 & umap[, 2] < 10 ~ "cell_type_2",
  TRUE ~ "cell_type_3"
)
sce_final_norm_clustering$my_cell_types <- factor(cell_types)
saveRDS(sce_final_norm_clustering, "sce_my_annotation.Rds")

We have added a new colData() column named my_cell_types to the SCE object that divides cells based on their UMAP coordinates.

Now you need to modify the INPUT_DATA parameter in config/single_sample/01_input_qc.yaml in order to start the pipeline from the saved SCE object instead of cellranger output:

INPUT_DATA:
  type: "sce"
  path: "sce_my_annotation.Rds"

The sce_final_norm_clustering object is already filtered and normalized, so you should skip those procedures:

EMPTY_DROPLETS_ENABLED: False
ENABLE_CELL_FILTERING: False
ENABLE_GENE_FILTERING: False

You can also set to skip the normalization step in config/single_sample/02_norm_clustering.yaml:

NORMALIZATION_TYPE: "none"

The my_cell_types column can be now used for different purposes, e.g. for cluster markers detection, differential expression (stage contrasts) or visualization.

The same can be done for input data in the integration pipeline. Please, refer to the INTEGRATION_SOURCES parameter in vignette("stage_integration").

A target is not getting built

It might happen that a target (especially RMarkdown one) is not getting built although some changes were introduced. However, you can either:

I want to perform subclustering

This can be simply achieved as follows:



bioinfocz/scdrake documentation built on Sept. 19, 2024, 4:43 p.m.