In compbiomed/singleCellTK: Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

```{R setup}
knitr::opts_chunk$set(echo = TRUE, eval = TRUE, message = FALSE, warning = FALSE)
```

## Introduction

Single Cell Toolkit (singleCellTK, SCTK) is a package for analyzing single-cell RNA-seq (scRNAseq) datasets. SCTK allows users to import multiple datasets, perform quality control and a series of preprocessing steps, cluster cells and find markers for the clusters, and run various downstream analyses. In addition, SCTK wraps curated workflows for [celda](https://www.camplab.net/celda/) and [Seurat](https://satijalab.org/seurat/).

The SCTK tutorial series starts with importing, QC and filtering. After this, users can go through either the [A La Carte Workflow](02_a_la_carte_workflow.html) or the Curated Workflows ([Seurat](seurat_curated_workflow.html) or [Celda](celda_curated_workflow.html)). These tutorials use a real-world scRNAseq dataset as an example, which consists of 2,700 Peripheral Blood Mononuclear Cells (PBMCs) collected from a healthy donor and is named PBMC3K. This dataset is available from 10X Genomics and can be found on the 10X website.

Before starting the analysis from either the UI or the console, users should have SCTK properly installed. Please refer to the [Installation](installation.html) documentation for details on installing the `singleCellTK` package and its dependencies. Alternatively, users can access the functionality by visiting our <a href="https://sctk.bu.edu/" target="_blank">online web server deployment https://sctk.bu.edu/</a>, or by using a stand-alone [docker image](https://hub.docker.com/r/campbio/sctk_shiny).

## Importing Data

SCTK can import datasets from various types of sources, such as the output of preprocessing tools like cellranger or flat text files. SCTK can also import multiple batches (samples) of data at the same time. Please refer to the [detailed documentation of Importing](import_data.html) for the different ways of importing. This tutorial only shows how to import the example PBMC3K data.

````{=html}
<div class="tabset" id="tabset1">
<div class="tab">
  <button class="tablinks ia" onclick="openTab(event, 'ia', 'tabset1')">Interactive Analysis</button>
  <button class="tablinks console" onclick="openTab(event, 'console', 'tabset1')">Console Analysis</button>
</div>

<div class="tabcontent ia">
````

After starting the UI, the landing page is the import page for scRNAseq data. To import the PBMC3K dataset mentioned above, users should follow these steps:

If you are not already on the import page, go to the Data tab and select the Import Single Cell Data option (2).

Select the relevant way to import data. For this example, we import the filtered PBMC3K data using the CellRanger option. The data can be downloaded from the 10X website.

In the appropriate input boxes, select the matrix, feature and barcode files, and upload each sample individually with a unique sample name (5).

Now either import the samples or choose Add one more sample. You can keep adding samples until all of them have been added, and then import them all together.

After a successful import, the third collapse box opens and shows basic summary statistics of the imported dataset. Users can also set feature display options here. In most cases, the dataset has a default feature ID (usually the row names of the matrix) together with other types of ID (e.g. symbol) in the feature metadata. The first option, "Set feature ID", sets the type of default feature ID, which must be unique and contain no NA values. The second option, "Set feature names to be displayed in downstream visualization", sets the names shown on plots. When features are shown on a plot, such as a heatmap or a volcano plot, gene symbols are usually easier to read than Ensembl IDs.

````{=html}
</div>
<div class="tabcontent console">
````

Here we use `importExampleData()` to load PBMC3k data from the Bioconductor package [TENxPBMCData](https://bioconductor.org/packages/release/data/experiment/html/TENxPBMCData.html).  

```{R import, results='hold'}
library(singleCellTK)
sce <- importExampleData("pbmc3k")
```

A [`SingleCellExperiment`](https://rdrr.io/bioc/SingleCellExperiment/man/SingleCellExperiment.html) (SCE) object is returned as the data container. An SCE object is designed for storing and manipulating expression matrices, gene/cell metadata, low-dimensional representations and unstructured information. All the methods wrapped by SCTK operate on the SCE object.
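To make the four kinds of content concrete, here is a minimal sketch that builds a toy SCE object by hand and looks at each slot. It assumes the `SingleCellExperiment` package is installed. The object `toySce`, its gene and cell names, and all values are made up for illustration and are unrelated to the PBMC3K data.

```{R, eval=FALSE}
library(SingleCellExperiment)

# Toy count matrix: 3 genes x 4 cells (illustrative values only)
toyCounts <- matrix(c(0, 2, 5, 1,
                      3, 0, 0, 4,
                      1, 1, 0, 2),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:3),
                                    paste0("cell", 1:4)))

# Expression matrix plus gene-level and cell-level metadata
toySce <- SingleCellExperiment(
  assays = list(counts = toyCounts),
  rowData = DataFrame(symbol = c("A", "B", "C")),
  colData = DataFrame(sample = rep("toy", 4))
)

# Low-dimensional representation: one row per cell
reducedDim(toySce, "toyDim") <- matrix(seq_len(8), nrow = 4,
                                       dimnames = list(colnames(toySce),
                                                       c("D1", "D2")))

# Unstructured information goes into the metadata list
metadata(toySce)$note <- "hypothetical example"

dim(toySce)              # genes x cells
assayNames(toySce)       # names of the stored expression matrices
colData(toySce)          # cell metadata
rowData(toySce)          # gene metadata
reducedDimNames(toySce)  # names of the stored embeddings
```

The same accessors can be applied to the `sce` object returned by `importExampleData()`.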

**Importing CellRanger Output Data**

Here, we briefly introduce how to import the output of the widely used preprocessing tool `cellranger`. SCTK has a generic function `importCellRanger()` for this purpose, as well as the version-specific functions `importCellRangerV2()` and `importCellRangerV3()` for different versions of `cellranger`. For details, please refer to the reference pages of these functions.

The input arguments tell the function the exact paths of the input data files (i.e. `"matrix.mtx"`, `"features.tsv"`, and `"barcodes.tsv"`). They are `cellRangerDirs`, `sampleDirs`, `cellRangerOuts`, `matrixFileNames`, `featuresFileNames` and `barcodesFileNames`. The function builds each path, for example that of the barcode file, as the combination `{cellRangerDirs}/{sampleDirs}/{cellRangerOuts}/{barcodesFileNames}`. These functions automatically try to recognize a preset substructure of `cellRangerDirs`, so in most cases users only need to specify `cellRangerDirs` to point to the top-level directory of the output. However, sometimes the three essential files are placed or named differently and the default detection will not find them. In that case, users need to check the exact paths and manually specify the correct inputs according to the combination rule above and the error messages.

An example folder structure:

```
./datasets/
    sample1/
        outs/filtered_feature_bc_matrix/
            barcodes.tsv.gz
            features.tsv.gz
            matrix.mtx.gz
    sample2/
        outs/filtered_feature_bc_matrix/
            barcodes.tsv.gz
            features.tsv.gz
            matrix.mtx.gz
./otherCellRangerData/
    barcodes.tsv
    genes.tsv
    matrix.mtx
```

```{R, eval=FALSE}
# Default use case
sce <- importCellRanger(cellRangerDirs = "datasets")
# In case the three files are placed in a different way
sce <- importCellRanger(sampleDirs = "otherCellRangerData",
                        cellRangerOuts = "",
                        barcodesFileNames = "barcodes.tsv",
                        featuresFileNames = "genes.tsv",
                        matrixFileNames = "matrix.mtx")
```
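The combination rule can be sketched in base R to check which file a given set of arguments would point to. `resolveCellRangerPath()` is a hypothetical helper written for this illustration and is not part of singleCellTK. Dropping empty components, so that `cellRangerOuts = ""` does not produce a doubled separator, is an assumption of this sketch and not a statement about how `importCellRanger()` handles them internally.

```{R, eval=FALSE}
# Hypothetical helper (not a singleCellTK function): joins the four
# components of the combination rule into one path, skipping empty ones.
resolveCellRangerPath <- function(cellRangerDir, sampleDir,
                                  cellRangerOut, fileName) {
  parts <- c(cellRangerDir, sampleDir, cellRangerOut, fileName)
  parts <- parts[nzchar(parts)]
  do.call(file.path, as.list(parts))
}

# Standard layout from the example folder structure
standardPath <- resolveCellRangerPath("datasets", "sample1",
                                      "outs/filtered_feature_bc_matrix",
                                      "barcodes.tsv.gz")

# Flat layout where the files sit directly in the sample folder
flatPath <- resolveCellRangerPath("", "otherCellRangerData",
                                  "", "barcodes.tsv")

standardPath
flatPath
file.exists(c(standardPath, flatPath))  # TRUE only if the files are present
```

Running `file.exists()` on the resolved paths before calling the import function is a quick way to narrow down which argument needs to change when the default detection fails.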

````{=html}
</div>
</div>
````

## Quality Control

Quality control of cells is often needed before downstream analyses such as dimension reduction and clustering. Typical filtering procedures include exclusion of poor-quality cells with low numbers of counts/UMIs, estimation and removal of ambient RNA, and identification of potential doublets/multiplets. Many tools and packages are available to perform these operations, and users are free to apply their tool(s) of choice with SCTK.

Below is a quick example of how to perform standard QC before heading to the downstream analyses. If your data is already QC'ed or you decide to skip this step, you can directly move to the workflows (A La Carte, Seurat or Celda). For this tutorial, we will only run one doublet detection algorithm (scDblFinder) and one decontamination algorithm (decontX). We will also quantify the percentage of mitochondrial genes in each cell as this is often used as a measure of cell viability.
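As a rough illustration of what the mitochondrial metric measures, the sketch below computes, for each cell, the percentage of counts that come from mitochondrial genes in a toy matrix. It is written in base R and is not how `runCellQC()` is implemented. It assumes that human mitochondrial genes can be recognized by the `MT-` prefix of their gene symbols, and the matrix values are made up.

```{R, eval=FALSE}
# Toy count matrix: 4 genes x 3 cells (illustrative values only)
toyCounts <- matrix(c(10, 0,  5,
                      20, 8,  5,
                       5, 1,  0,
                       5, 1, 10),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("MT-CO1", "ACTB", "MT-ND1", "CD3E"),
                                    c("cell1", "cell2", "cell3")))

# Assumption: mitochondrial genes are those whose symbol starts with "MT-"
isMito <- grepl("^MT-", rownames(toyCounts))

# Percentage of each cell's total counts that map to mitochondrial genes
mitoPercent <- 100 * colSums(toyCounts[isMito, , drop = FALSE]) /
  colSums(toyCounts)
mitoPercent
```

Cells with an unusually high value are often treated as low-viability candidates, but any cutoff is dataset-dependent.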

````{=html}
<div class="tabset" id="tabset2">
<div class="tab">
````

**Running QC methods**

To perform QC, we suggest using the `runCellQC()` function. This is a wrapper for several methods for calculation of QC metrics, doublet detection, and estimation of ambient RNA. For a full list of algorithms that this function runs by default, see `?runCellQC`.

```{R qc, results='hold'}
# Run QC
sce <- runCellQC(sce, sample = NULL,
                 algorithms = c("QCMetrics", "scDblFinder", "decontX"),
                 mitoRef = "human",
                 mitoIDType = "symbol",
                 mitoGeneLocation = "rownames",
                 seed = 12345)
sce <- runQuickUMAP(sce, reducedDimName = "QC_UMAP", seed = 12345)
```

```{R plotScDblFinder, message=FALSE, warning=FALSE, fig.height=7, fig.width=8}
plotScDblFinderResults(sce, reducedDimName = "QC_UMAP")
```

A comprehensive HTML report can be generated to visualize and explore the QC metrics in greater detail:

```{R reportqc, eval=FALSE}
reportCellQC(sce)
```

````{=html}
</div>
</div>
````
