iSMNN: iSMNN


iSMNN R Documentation

iSMNN

Description

iSMNN performs iterative supervised batch effect correction for scRNA-seq data by refining mutual nearest neighbors (MNNs) within corresponding clusters (or cell types) on top of the already corrected data. It takes as input expression data from two or more batches, supplied as a list of Seurat objects, together with a list of unified cluster labels (the output of unifiedClusterLabelling from the SMNN package). It returns a Seurat object that contains the batch-corrected expression matrix for all batches.
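
To make "refining MNNs within corresponding clusters" concrete, the base R sketch below pairs cells across two batches only when they share a matched cluster label, and keeps a pair only if each cell is the other's nearest neighbor. It is a hypothetical illustration, not iSMNN's implementation: the function name supervised_mnn, the cells-in-rows layout, squared Euclidean distance, and the restriction to a single nearest neighbor are all assumptions made for brevity.

```r
# Hypothetical sketch: mutual nearest neighbours (MNNs) between two batches,
# searched only among cells that share a matched cluster label.
# x1, x2: matrices with cells in rows and features (e.g. PCs) in columns.
supervised_mnn <- function(x1, x2, lab1, lab2, matched) {
  pairs <- data.frame(cell1 = integer(0), cell2 = integer(0),
                      cluster = character(0))
  for (cl in matched) {
    i1 <- which(lab1 == cl)
    i2 <- which(lab2 == cl)
    if (length(i1) == 0 || length(i2) == 0) next  # cluster missing in a batch
    a <- x1[i1, , drop = FALSE]
    b <- x2[i2, , drop = FALSE]
    # Squared Euclidean distances: rows = batch-1 cells, cols = batch-2 cells
    d <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
    nn12 <- apply(d, 1, which.min)  # nearest batch-2 cell per batch-1 cell
    nn21 <- apply(d, 2, which.min)  # nearest batch-1 cell per batch-2 cell
    mutual <- which(nn21[nn12] == seq_along(i1))  # keep reciprocal pairs only
    if (length(mutual) > 0) {
      pairs <- rbind(pairs, data.frame(cell1 = i1[mutual],
                                       cell2 = i2[nn12[mutual]],
                                       cluster = cl))
    }
  }
  pairs
}

# Toy data: the batch-1 fibroblast lies closest to a batch-2 macrophage
x1 <- rbind(c(0, 0), c(5, 5))
lab1 <- c("fibroblast", "macrophage")
x2 <- rbind(c(0.5, 0), c(1, 0), c(5, 5.5))
lab2 <- c("macrophage", "fibroblast", "macrophage")
res <- supervised_mnn(x1, x2, lab1, lab2, c("fibroblast", "macrophage"))
res
```

In this toy data the batch-1 fibroblast's nearest batch-2 cell overall is a macrophage, but the label-restricted search pairs it with the batch-2 fibroblast instead (cell1 and cell2 are row indices into x1 and x2).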

Usage

iSMNN(object.list = merge.list, batch.cluster.labels = batch.cluster.labels, matched.clusters = c("Endothelial cells", "Macrophage", "Fibroblast"), strategy = "Short.run", iterations = 5, dims = 1:20, npcs = 30)

Arguments

object.list

A list of Seurat objects between which to find anchors for downstream integration.

assay

A vector of assay names specifying which assay to use when constructing anchors. If NULL, the current default assay for each object is used.

batch.cluster.labels

A list of vectors, one per batch, giving the cluster label of each cell. Cells not belonging to any cluster should be labeled 0.

matched.clusters

The names of the cell clusters that are matched between two or more batches, given with the same labels as used in batch.cluster.labels.
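
The toy input below follows the layout described for batch.cluster.labels: one vector per batch, named by cell (as the Examples do with colnames), with 0 marking unassigned cells. Storing the labels as character strings, so that 0 appears as "0", is an assumption. The hypothetical check tabulates how many cells of each matched cluster fall in each batch, one quick way to confirm that every entry of matched.clusters is actually shared across batches before calling iSMNN.

```r
# Hypothetical toy input in the documented batch.cluster.labels layout:
# one vector per batch, named by cell, with 0 for unassigned cells.
# (Storing labels as character, so 0 becomes "0", is an assumption.)
batch.cluster.labels <- list(
  c(cellA = "fibroblast", cellB = "macrophage", cellC = "0"),
  c(cellD = "fibroblast", cellE = "0", cellF = "endothelial cells")
)
matched.clusters <- c("fibroblast", "macrophage", "endothelial cells")

# Cells per matched cluster in each batch; unassigned ("0") cells are dropped
cluster_counts <- sapply(batch.cluster.labels, function(labels) {
  table(factor(labels[labels != "0"], levels = matched.clusters))
})
colnames(cluster_counts) <- paste0("batch", seq_along(batch.cluster.labels))
cluster_counts

# Matched clusters that actually have cells in every batch
shared <- rownames(cluster_counts)[apply(cluster_counts > 0, 1, all)]
shared
```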

strategy

The iteration strategy used for batch effect correction. With "Short.run", iSMNN runs a fixed number of iterations (default = 5) and keeps the output with the lowest F statistic as the optimal correction result. With "Long.run", once the first local minimum of the F statistic is observed, iSMNN runs an additional number of iterations (default = 3) to capture any further decrease of the F statistic beyond that first local minimum.
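
The two strategies can be illustrated on a precomputed sequence of F statistics. In the sketch below, select_iteration is a hypothetical helper, and two details are assumptions: the first local minimum counts as "observed" at the first iteration whose F statistic is higher than the previous one, and the extra Long.run iterations are counted from that point.

```r
# Hypothetical helper: pick the "optimal" iteration from a sequence of
# F statistics (one per iteration), following the two strategies above.
select_iteration <- function(f.stats, strategy = c("Short.run", "Long.run"),
                             iterations = 5, extra = 3) {
  strategy <- match.arg(strategy)
  if (strategy == "Short.run") {
    # Fixed budget: only the first `iterations` values are computed
    n.run <- min(iterations, length(f.stats))
  } else {
    # Iterations whose F statistic is higher than the previous one; the first
    # of these is where the first local minimum becomes visible (assumption)
    rises <- which(diff(f.stats) > 0) + 1
    if (length(rises) == 0) {
      n.run <- length(f.stats)  # no local minimum observed
    } else {
      n.run <- min(rises[1] + extra, length(f.stats))
    }
  }
  used <- f.stats[seq_len(n.run)]
  best <- which.min(used)  # iteration with the lowest F statistic
  list(best = best, n.run = n.run, f = used[best])
}

f <- c(10, 8, 9, 7, 6, 11, 12, 13)  # made-up F statistics
select_iteration(f, "Short.run", iterations = 3)  # best = 2 (F = 8)
select_iteration(f, "Long.run")                   # best = 5 (F = 6), 6 run
```

With these made-up values, Long.run continues past the first local minimum (F = 8 at iteration 2) and reaches the lower value at iteration 5.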

iterations

The number of iterations to run.

reference

A vector specifying the object(s) to be used as references during integration. If NULL (default), all pairwise anchors are found (no references). If not NULL, the corresponding objects in object.list are used as references. When a set of references is specified, anchors are first found between each query and each reference. The references are then integrated through pairwise integration, and each query is then mapped to the integrated reference.

anchor.features

Can be either:

  • A numeric value. This will call SelectIntegrationFeatures to select the provided number of features to be used in anchor finding.

  • A vector of features to be used as input to the anchor finding process.

scale

Whether or not to scale the features provided. Only set to FALSE if you have previously scaled the features you want to use for each object in object.list.

reduction

Dimensional reduction to perform when finding anchors. Can be one of:

  • cca: Canonical correlation analysis

  • rpca: Reciprocal PCA

l2.norm

Whether to perform L2 normalization on the CCA cell embeddings after dimensional reduction.

dims

Which dimensions to use from the CCA to specify the neighbor search space.

k.anchor

How many neighbors (k) to use when picking anchors.

k.filter

How many neighbors (k) to use when filtering anchors.

k.score

How many neighbors (k) to use when scoring anchors.

max.features

The maximum number of features to use when specifying the neighborhood search space in the anchor filtering.

nn.method

Method for nearest neighbor finding. Options include: rann, annoy

eps

Error bound on the neighbor finding algorithm (from RANN).

k.weight

Number of neighbors to consider when weighting. Default is k.weight = 100.

verbose

Whether to print progress bars and output.

sd.weigth

The bandwidth of the Gaussian smoothing kernel used to compute the correction vector for each cell. Default is sd.weigth = 1.
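
A Gaussian-kernel weighted average is one simple way to turn anchor-level correction vectors into a per-cell correction. The base R sketch below takes the k.weight nearest anchors, weights each by exp(-d^2 / (2 * sd.weigth^2)), normalizes the weights, and averages the anchors' correction vectors. It is a generic illustration under those assumptions: gaussian_correction is a hypothetical name, and the exact weighting iSMNN uses may differ.

```r
# Hypothetical sketch: per-cell correction vector as a Gaussian-kernel
# weighted mean of the correction vectors attached to nearby anchors.
# anchors: anchor positions (rows) in the same space as `cell`;
# vectors: one correction vector per anchor (rows), same feature columns.
gaussian_correction <- function(cell, anchors, vectors,
                                sd.weigth = 1, k.weight = 100) {
  d2 <- colSums((t(anchors) - cell)^2)                   # squared distances
  keep <- order(d2)[seq_len(min(k.weight, length(d2)))]  # k nearest anchors
  w <- exp(-d2[keep] / (2 * sd.weigth^2))                # Gaussian weights
  w <- w / sum(w)                                        # weights sum to 1
  colSums(vectors[keep, , drop = FALSE] * w)             # weighted mean
}

anchors <- rbind(c(0, 0), c(10, 0))
vectors <- rbind(c(1, 1), c(5, 5))
gaussian_correction(c(5, 0), anchors, vectors)                # c(3, 3)
gaussian_correction(c(4, 0), anchors, vectors, k.weight = 1)  # c(1, 1)
```

A smaller sd.weigth makes the correction follow the closest anchors more tightly; if it is very small relative to the distances, all weights can underflow to zero and the result becomes NaN, so a real implementation would need to guard against that.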

Value

iSMNN returns a Seurat object that contains the batch-corrected expression matrix for all batches.

Author(s)

Yuchen Yang <yyuchen@email.unc.edu>, Gang Li <franklee@live.unc.edu>, Li Qian <li_qian@med.unc.edu>, Yun Li <yunli@med.unc.edu>

References

Yuchen Yang, Gang Li, Li Qian, Yun Li. iSMNN 2020

Examples

# Load the iSMNN package and its example data data_iSMNN
library(iSMNN)
data("data_iSMNN")

# Provide the marker genes for cluster matching
markers <- c("Col1a1", "Pdgfra", "Ptprc", "Pecam1")

# Specify the cluster labels for each marker gene
cluster.info <- c("fibroblast", "fibroblast", "macrophage", "endothelial cells")

# Harmonize cluster labels across batches
library(SMNN)
batch.cluster.labels <- unifiedClusterLabelling(batches = list(data_iSMNN$batch1.mat, data_iSMNN$batch2.mat), features.use = markers,
                                                cluster.labels = cluster.info, min.perc = 0.3)
names(batch.cluster.labels[[1]]) <- colnames(data_iSMNN$batch1.mat)
names(batch.cluster.labels[[2]]) <- colnames(data_iSMNN$batch2.mat)

# Construct the input object for batches using Seurat
library(Seurat)
merge <- CreateSeuratObject(counts = cbind(data_iSMNN$batch1.mat, data_iSMNN$batch2.mat), min.cells = 0, min.features = 0)
batch_id <- c(rep("batch1", ncol(data_iSMNN$batch1.mat)), rep("batch2", ncol(data_iSMNN$batch2.mat)))
names(batch_id) <- colnames(merge)
merge <- AddMetaData(object = merge, metadata = batch_id, col.name = "batch_id")
merge.list <- SplitObject(merge, split.by = "batch_id")

# Normalize each batch and select 2000 highly variable features
merge.list <- lapply(X = merge.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
  x
})

# Correct batch effect
corrected.results <- iSMNN(object.list = merge.list, batch.cluster.labels = batch.cluster.labels,
                           matched.clusters = c("endothelial cells", "macrophage", "fibroblast"),
                           strategy = "Short.run", iterations = 5, dims = 1:20, npcs = 30, k.filter = 30)


yycunc/iSMNN documentation built on June 11, 2022, 8:37 p.m.