library(knitr)
opts_chunk$set(echo = TRUE)

Brief introduction

The current implementation of iSMNN encompasses two major steps: one optional cluster harmonizing step and the other batch effect correction step. In the first step, the clusters/cell type labels are matched/harmonized across multiple scRNA-seq batches using unifiedClusterLabelling function from SMNN package (Yang et al. 2020). This entire clustering step can be by-passed by feeding iSMNN cell cluster labels. With cell cluster label information, iSMNN iteratively searches mutual nearest neighbors within each harmonized cell type, and performs batch effect correction using the iSMNN function.

In this tutorial, we will perform batch effect correction using iSMNN in a toy example containing two batches. The first batch contains 400 cells from three cell types, namely fibroblasts, macrophages and endothelial cells. And the second batches has 500 cells from the same three cell types. Both two batches contain 3000 genes.


Installation

iSMNN package can be directly installed from GitHub with:

install.packages("devtools")
devtools::install_github("yycunc/iSMNN")

Set up the library

library("iSMNN")

Load the input expression matrix

data("data_iSMNN")
dim(data_iSMNN$batch1.mat)
data_iSMNN$batch1.mat[1:5, 1:5]
dim(data_iSMNN$batch2.mat)

The optional step of cluster harmonizing

The function unifiedClusterLabelling from SMNN package is used to match/harmonize the clusters/cell type labels across multiple scRNA-seq batches. It takes as input raw expression matrices from two or more batches, a list of marker genes and their corresponding cluster labels, and outputs harmonized cluster label for every single cells across all batches.

Provide cell-type specific marker gene information

Two pieces of information are needed: - Marker genes - Corresponding cell type labels for each marker gene

# Maker genes
markers <- c("Col1a1", "Pdgfra", "Ptprc", "Pecam1")
# Corresponding cell type labels for each marker gene
cluster.info <- c("fibroblast", "fibroblast", "macrophage", "endothelial cells")

Harmonize cluster labels across batches

library(SMNN)
batch.cluster.labels <- unifiedClusterLabelling(data_SMNN$batch1.mat, data_iSMNN$batch2.mat, features.use = markers, cluster.labels = cluster.info, min.perc = 0.3)
# Add cell ID to the harmonized cluster/cell type labels
names(batch.cluster.labels[[1]]) <- colnames(data_iSMNN$batch1.mat) 
names(batch.cluster.labels[[2]]) <- colnames(data_iSMNN$batch2.mat)

batch.cluster.labels is a list where each element represents the harmonized cluster/cell type labels of each batch.

head(batch.cluster.labels[[1]])
head(batch.cluster.labels[[2]])

Batch effect correction using iSMNN function

With harmonized cluster label information for single cells across batches, we implement batch effect correction using iSMNN.

Construct the input object for batches using Seurat

Input object is first constructed following the instruction of Seurat v3 package. See the tutorial from the Seurat website for details.

library(Seurat)
merge <- CreateSeuratObject(counts = cbind(data_iSMNN$batch1.mat, data_iSMNN$batch2.mat), min.cells = 0, min.features = 0)
batch_id <- c(rep("batch1", ncol(data_iSMNN$batch1.mat)), 
              rep("batch2", ncol(data_iSMNN$batch2.mat)))
names(batch_id) <- colnames(merge)
merge <- AddMetaData(object = merge, metadata = batch_id, col.name = "batch_id")
merge.list <- SplitObject(merge, split.by = "batch_id")

merge.list <- lapply(X = merge.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})

Correct batch effect

corrected.results <- iSMNN(object.list = merge.list, batch.cluster.labels = batch.cluster.labels, matched.clusters = c("endothelial cells", "macrophage", "fibroblast"), strategy = "Short.run", iterations = 5, dims = 1:20, npcs = 30, k.filter = 30)

iSMNN function will return a Seurat object that contains the batch-corrected expression matrix for batches.

# Output after correction for batch 
corrected.results

Citation

Yang, Y., Li, G., Xie, Y., Wang, L, Yang, Y., Liu, J., Qian, L., Li., Y. (2020) iSMNN: Batch Effect Correction for Single-cell RNA-seq data via Iterative Supervised Mutual Nearest Neighbor Refinement. biorxiv, 2020.

Credits

Some functions are borrowed from or executed according to the Seurat v3 package (Sturat et al., 2019).




yycunc/iSMNN documentation built on June 11, 2022, 8:37 p.m.