In bioinfocz/scdrake: A pipeline for droplet-based single-cell RNA-seq data secondary analysis implemented in the drake Make-like toolkit for R language

```{r, echo = FALSE}
knitr::opts_chunk$set(eval = FALSE)
```

***

In this guide you will see how to integrate two datasets. The prerequisites are:

- A project initialized as described in the quick start guide (`vignette("scdrake")`), located in the
  `~/scdrake_projects/pbmc1k` directory.
- A successful run of the pipeline in that project for the `report_norm_clustering` or `report_norm_clustering_simple` target.

> For Docker we assume that the container has a shared directory mounted as `/home/rstudio/scdrake_projects`,
as described in `vignette("scdrake_docker")`.

***

The integration pipeline starts by importing `SingleCellExperiment` (SCE) objects from the `drake` caches of the underlying
single-sample analyses. These objects are the final ones from the `02_norm_clustering` stage, that is, they are
normalized, with known highly variable genes and clusters, and with computed reduced dimensions.

### Prepare the second sample - PBMC 3k {.tabset}

As a second sample for the integration pipeline we will use another dataset from 10x Genomics - PBMC 3k.
To stick to the project-based approach, we will initialize a new `scdrake` project:

#### In R

```r
init_project("~/scdrake_projects/pbmc3k")
```

If not done automatically, change your RStudio project or switch the current working directory to the project's root.

#### On command line

```bash
mkdir ~/scdrake_projects/pbmc3k
cd ~/scdrake_projects/pbmc3k
scdrake init-project
```

#### On command line (Docker)

```bash
mkdir ~/scdrake_projects/pbmc3k
cd ~/scdrake_projects/pbmc3k
docker exec -it -u rstudio -w /home/rstudio/scdrake_projects/pbmc3k <CONTAINER ID or NAME> \
  scdrake init-project
```

#### On command line (Singularity)

```bash
mkdir -p ~/scdrake_singularity
cd ~/scdrake_singularity
mkdir -p home/${USER} scdrake_projects/pbmc3k
singularity exec -e --no-home \
    --bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
    --pwd "/home/${USER}/scdrake_projects/pbmc3k" \
    path/to/scdrake_image.sif \
    scdrake init-project
```

### {-}

Now we will repeat the steps we have already done for the PBMC 1k sample. In `~/scdrake_projects/pbmc3k`:

- Open `config/single_sample/01_input_qc.yaml` and set `path` inside `INPUT_DATA` to `"../pbmc1k/example_data/pbmc3k"` (the example data for PBMC 3k was already downloaded when you initialized the project for the PBMC 1k dataset).
- Open `config/pipeline.yaml` and set `DRAKE_TARGETS` to `["sce_final_norm_clustering"]`.
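As a rough sketch, the two changes could look like the fragments below. Only the keys named in the steps above are shown. The assumption is that every other key in both files, including any other keys under `INPUT_DATA`, is left at the value generated by `init_project()`.

```yaml
# config/single_sample/01_input_qc.yaml
# Only the changed key is shown; other keys are assumed to keep their defaults.
INPUT_DATA:
  path: "../pbmc1k/example_data/pbmc3k"
---
# config/pipeline.yaml
# Build only the final SCE object of the 02_norm_clustering stage.
DRAKE_TARGETS: ["sce_final_norm_clustering"]
```

The relative path is resolved from the `pbmc3k` project root, which is why it climbs one level up before entering `pbmc1k/example_data`.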

The config modifications for the second sample are ready, so let's run the pipeline:

### {.tabset}

#### In R

```r
run_single_sample_r()
```

#### On command line

```bash
scdrake --pipeline-type single_sample run
```

#### On command line (Docker)

```bash
docker exec -it -u rstudio -w /home/rstudio/scdrake_projects/pbmc3k <CONTAINER ID or NAME> \
  scdrake --pipeline-type single_sample run
```

#### On command line (Singularity)

```bash
singularity exec -e --no-home \
    --bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
    --pwd "/home/${USER}/scdrake_projects/pbmc3k" \
    path/to/scdrake_image.sif \
    scdrake --pipeline-type single_sample run
```

### {-}

### Running the integration pipeline {.tabset}

The configuration file for the integration pipeline is located in `config/integration/01_integration.yaml` (see `vignette("stage_integration")`). By default, four integration methods are enabled; you can disable them in the `INTEGRATION_METHODS` parameter. The `uncorrected` method is also enabled and is mandatory, as it is used later in the `cluster_markers` and `contrasts` stages. It only performs batch-specific correction for sequencing depth via `batchelor::multiBatchNorm()`. At least one integration method and `uncorrected` must always be enabled.

First, as we did for the individual samples, we will initialize a new `scdrake` project for the integration analysis:

#### In R

```r
init_project("~/scdrake_projects/pbmc_integration")
```

#### On command line

```bash
mkdir ~/scdrake_projects/pbmc_integration
cd ~/scdrake_projects/pbmc_integration
scdrake init-project
```

#### On command line (Docker)

```bash
mkdir ~/scdrake_projects/pbmc_integration
cd ~/scdrake_projects/pbmc_integration
docker exec -it -u rstudio -w /home/rstudio/scdrake_projects/pbmc_integration <CONTAINER ID or NAME> \
  scdrake init-project
```

#### On command line (Singularity)

```bash
mkdir -p home/${USER} scdrake_projects/pbmc_integration
singularity exec -e --no-home \
    --bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
    --pwd "/home/${USER}/scdrake_projects/pbmc_integration" \
    path/to/scdrake_image.sif \
    scdrake init-project
```

### {-}

Now we modify the configs for the integration pipeline. In `~/scdrake_projects/pbmc_integration`:

- `config/integration/01_integration.yaml`: set `cache_path` to `../pbmc1k/.drake` and `../pbmc3k/.drake` for the `pbmc1k` and `pbmc3k` entries, respectively.
- `config/pipeline.yaml`: set `DRAKE_TARGETS` to `["report_integration"]`. To save time, we only run the final target of the `01_integration` stage.
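The fragments below sketch what these two edits might look like. The enclosing key `INTEGRATION_SOURCES` and the list layout of the sample entries are assumptions made for illustration; only `cache_path`, the entry names `pbmc1k` and `pbmc3k`, and `DRAKE_TARGETS` come from the steps above. Keep whatever layout and additional keys the generated file already contains.

```yaml
# config/integration/01_integration.yaml
# ASSUMPTION: the per-sample entries sit in a list under INTEGRATION_SOURCES.
# Other keys of each entry are omitted and assumed to keep their defaults.
INTEGRATION_SOURCES:
  - pbmc1k:
      cache_path: "../pbmc1k/.drake"
  - pbmc3k:
      cache_path: "../pbmc3k/.drake"
---
# config/pipeline.yaml
# Build only the final report of the 01_integration stage.
DRAKE_TARGETS: ["report_integration"]
```

Both cache paths are relative to the `pbmc_integration` project root, so the three projects are expected to sit side by side in `~/scdrake_projects`.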

And let's run the pipeline.

### {.tabset}

#### In R

```r
run_integration_r()
```

#### On command line

```bash
scdrake --pipeline-type integration run
```

#### On command line (Docker)

```bash
docker exec -it -u rstudio -w /home/rstudio/scdrake_projects/pbmc_integration <CONTAINER ID or NAME> \
  scdrake --pipeline-type integration run
```

### {-}

The output is saved in `output/integration`, as specified by `BASE_OUT_DIR` in `config/integration/00_main.yaml`. For the `01_integration` stage, you can find its final report in `output/integration/01_integration/01_integration.html`.

You can try to load the target `sce_int_dimred_df` (a `tibble` object) containing integrated `SingleCellExperiment` objects with computed reduced dimensions:

```r
drake::loadd(sce_int_dimred_df)
```

### Post-integration clustering and cell annotation

The post-integration clustering stage (see `vignette("stage_int_clustering")`) basically replicates the clustering, cell annotation and visualization parts of the `02_norm_clustering` stage of the single-sample pipeline. It uses a `SingleCellExperiment` object from a selected integration method specified in the `INTEGRATION_FINAL_METHOD` parameter in `config/integration/02_int_clustering.yaml`.

You can also try to run the post-integration clustering stage by setting `DRAKE_TARGETS` to `["report_int_clustering"]`. By default, the result from the `mnn` (mutual nearest neighbors) integration method is used.
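A minimal sketch of the corresponding settings follows. It assumes that `INTEGRATION_FINAL_METHOD` takes the method name as a plain string and that all other keys in both files keep their generated defaults.

```yaml
# config/pipeline.yaml
# Build the final report of the post-integration clustering stage.
DRAKE_TARGETS: ["report_int_clustering"]
---
# config/integration/02_int_clustering.yaml
# "mnn" is the default described above; shown here only to make the choice explicit.
INTEGRATION_FINAL_METHOD: "mnn"
```

Whichever method is chosen here presumably has to be one that was enabled in the `01_integration` stage, since its integrated object is what this stage picks up.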

### Cluster markers and contrasts stages

The usage of these stages is the same as in the single-sample pipeline.

bioinfocz/scdrake documentation built on Sept. 19, 2024, 4:43 p.m.
