SMNNcorrect: SMNN


View source: R/SMNNcorrect.R

Description

SMNNcorrect performs supervised batch effect correction for scRNA-seq data. It first identifies nearest neighbors (NNs) within corresponding clusters (or cell types) and then uses information from these NNs to correct the batch effect. It takes as input raw expression matrices from two or more batches and a list of the unified cluster labels (the output of unifiedClusterLabelling), and it outputs a batch-corrected expression matrix for each batch.
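The idea of searching for neighbors only within a matched cluster can be sketched in base R. This is a toy illustration under stated assumptions, not SMNN's implementation: the helper `mnn.within.cluster`, the matrices `batch1` and `batch2`, and the label vectors are all made up here, and Euclidean distance is assumed for simplicity.

```r
# Toy data (made-up values): 2 genes x 3 cells per batch
batch1 <- matrix(c(0, 0,   1, 0,   5, 5), nrow = 2)
batch2 <- matrix(c(0.2, 0, 5, 6,   7, 5), nrow = 2)
labels1 <- c(1, 1, 2)   # cluster label of each cell in batch 1
labels2 <- c(1, 2, 2)   # cluster label of each cell in batch 2

# Hypothetical helper: mutual nearest neighbors restricted to one cluster
mnn.within.cluster <- function(b1, b2, l1, l2, cluster, k = 1) {
  i1 <- which(l1 == cluster)
  i2 <- which(l2 == cluster)
  # Euclidean distances (rows: batch-1 cells, columns: batch-2 cells)
  d <- matrix(0, length(i1), length(i2))
  for (i in seq_along(i1)) for (j in seq_along(i2)) {
    d[i, j] <- sqrt(sum((b1[, i1[i]] - b2[, i2[j]])^2))
  }
  # Mark the k nearest batch-2 cells of every batch-1 cell, and vice versa
  near12 <- matrix(FALSE, nrow(d), ncol(d))
  near21 <- matrix(FALSE, nrow(d), ncol(d))
  for (i in seq_len(nrow(d))) near12[i, order(d[i, ])[seq_len(min(k, ncol(d)))]] <- TRUE
  for (j in seq_len(ncol(d))) near21[order(d[, j])[seq_len(min(k, nrow(d)))], j] <- TRUE
  # A pair is "mutual" when each cell is among the other's nearest neighbors
  pairs <- which(near12 & near21, arr.ind = TRUE)
  data.frame(cell1 = i1[pairs[, "row"]], cell2 = i2[pairs[, "col"]])
}

res <- mnn.within.cluster(batch1, batch2, labels1, labels2, cluster = 1)
```

Here only cells labelled 1 are compared, so a cell from cluster 2 can never be paired with a cell from cluster 1, which is the point of supplying cluster labels.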

Usage

SMNNcorrect(batches, batch.cluster.labels, matched.clusters=c(1,2,3), correct.others=FALSE, k=20, sigma=1, cos.norm.in=TRUE, cos.norm.out=TRUE, var.adj=TRUE, subset.genes=NULL, order=NULL, n.jobs=NULL)

Arguments

batches

is a list of two or more expression matrices, each corresponding to one batch, where each row corresponds to a gene and each column corresponds to a cell. The number and order of rows should be identical across all matrices (i.e., all batches should have the exact same gene set, in the same order).
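As a minimal sketch of preparing such a list, assuming gene names are stored as row names, the matrices can be restricted to their shared genes and put in a common order with base R. The objects `mat1` and `mat2` are made-up toy data, not part of the package.

```r
# Toy matrices (hypothetical data): genes in rows, cells in columns
mat1 <- matrix(1:12, nrow = 4,
               dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                               c("cell1", "cell2", "cell3")))
mat2 <- matrix(1:6, nrow = 3,
               dimnames = list(c("GeneD", "GeneB", "GeneA"),
                               c("cell4", "cell5")))

# Keep only genes present in both batches, in the order of the first batch
shared.genes <- intersect(rownames(mat1), rownames(mat2))
batches <- list(mat1[shared.genes, , drop = FALSE],
                mat2[shared.genes, , drop = FALSE])

# Check that every batch has the same gene set in the same order
stopifnot(identical(rownames(batches[[1]]), rownames(batches[[2]])))
```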

batch.cluster.labels

is a list of vectors specifying the cluster label of each cell in each batch. Cells not belonging to any cluster should be set to 0. If batch.cluster.labels = NULL, SMNN performs batch effect correction without any prior knowledge of cell clusters.
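In practice this list would normally come from unifiedClusterLabelling. For illustration only, a hand-built version might look like the sketch below; the label vectors are made up, and it is an assumption here that the vectors are listed in the same order as the matrices in batches.

```r
# Hypothetical cluster assignments for two batches with 5 and 4 cells
labels1 <- c(1, 1, 2, 3, NA)   # NA marks a cell with no cluster assignment
labels2 <- c(2, NA, 1, 3)

# Recode unassigned cells as 0
labels1[is.na(labels1)] <- 0
labels2[is.na(labels2)] <- 0

# One label vector per batch (assumed to follow the order of `batches`)
batch.cluster.labels <- list(labels1, labels2)
```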

matched.clusters

specifies the cell clusters matched between two or more batches.

correct.others

is a Boolean variable that defines whether to search nearest neighbors among the cells not belonging to any clusters. Default is FALSE, that is, cells not belonging to any clusters will not be considered as candidate nearest neighbors.

k

defines the maximum number of nearest neighbors to be identified. Default is 20.

sigma

defines the bandwidth of the Gaussian smoothing kernel used to compute the correction vector for each cell. Default is 1.
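To give a feel for what the bandwidth does, the sketch below computes Gaussian kernel weights for a few neighbor pairs and uses them to average the pairs' batch-effect vectors. It is an illustration under assumptions, not SMNN's code: the helper `gaussian.weights`, the standard kernel form exp(-d^2 / (2 * sigma^2)), and all numbers are chosen here, and the exact kernel used by the package may differ.

```r
# Hypothetical helper: normalized Gaussian kernel weights for distances
gaussian.weights <- function(dist, sigma = 1) {
  w <- exp(-dist^2 / (2 * sigma^2))
  w / sum(w)   # normalize so the weights sum to 1
}

# Distances from one cell to three neighbor pairs (made-up values)
d <- c(0.5, 1, 2)

# Batch-effect vectors of those pairs: 2 genes x 3 pairs (made-up values)
pair.vectors <- matrix(c(1, 0,
                         2, 1,
                         4, 3), nrow = 2)

# A small sigma concentrates the weight on the closest pair,
# a large sigma spreads it almost evenly
w.narrow <- gaussian.weights(d, sigma = 0.5)
w.wide   <- gaussian.weights(d, sigma = 5)

# Correction vector for the cell = weighted average of the pair vectors
correction.narrow <- as.vector(pair.vectors %*% w.narrow)
correction.wide   <- as.vector(pair.vectors %*% w.wide)
```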

cos.norm.in

is a Boolean variable that defines whether to apply cosine normalization to the input data before computing distances between cells. Default is TRUE.

cos.norm.out

is a Boolean variable that defines whether to apply cosine normalization to the output data before computing the corrected expression results. Default is TRUE.
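Cosine normalization, as referred to by cos.norm.in and cos.norm.out, generally means scaling each cell's expression vector to unit Euclidean length. A minimal base R sketch follows; the helper `cosine.normalize` and the toy matrix are assumptions for illustration and are not taken from the package.

```r
# Hypothetical helper: scale each cell (column) to unit Euclidean length
cosine.normalize <- function(mat) {
  norms <- sqrt(colSums(mat^2))
  norms[norms == 0] <- 1   # leave all-zero cells unchanged instead of dividing by 0
  sweep(mat, 2, norms, "/")
}

# Toy expression matrix: 3 genes x 2 cells (made-up values)
expr <- matrix(c(3, 4, 0,
                 1, 2, 2), nrow = 3)
normed <- cosine.normalize(expr)

# Squared length of each normalized cell
sq.len <- colSums(normed^2)
```

After this scaling, Euclidean distances between cells depend only on the direction of their expression profiles, not on their overall magnitude.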

var.adj

is a Boolean variable that indicates whether to apply variance adjustment to the correction vectors. Default is TRUE.

subset.genes

is a vector specifying the gene set used for computing correction vectors. Default is subset.genes = NULL, which means that all genes are used to compute correction vectors.

order

is a vector defining the reference batch and the order in which the other batches are corrected.

n.jobs

specifies the number of parallel jobs. If n.jobs = NULL, it is set to the number of cores.

Value

SMNNcorrect returns a batch-corrected expression matrix for each batch.

Author(s)

Yuchen Yang <yyuchen@email.unc.edu>, Gang Li <franklee@live.unc.edu>, Huijun Qian <hjqian@live.unc.edu>, Yun Li <yunli@med.unc.edu>

References

Yuchen Yang, Gang Li, Huijun Qian, Yun Li. SMNNcorrect 2018

Examples

# Load the SMNN package and the example data data_SMNN
library(SMNN)
data("data_SMNN")

# Provide the marker genes for cluster matching
markers <- c("Col1a1", "Pdgfra", "Ptprc", "Pecam1")

# Specify the cluster labels for each marker gene
cluster.info <- c(1, 1, 2, 3)

# Call function unifiedClusterLabelling to identify the corresponding clusters between two batches
matched_clusters <- unifiedClusterLabelling(data_SMNN$batch1.mat, data_SMNN$batch2.mat, features.use = markers, cluster.labels = cluster.info, min.perc = 0.3)

# Set the Python interpreter used by SMNNcorrect
# (the path below is site-specific; replace it with your own Python 3 installation)
library(reticulate)
use_python("/nas/longleaf/apps/python/3.5.1/bin/python3")

# Perform batch effect correction using SMNNcorrect
corrected.results <- SMNNcorrect(batches = list(data_SMNN$batch1.mat, data_SMNN$batch2.mat), batch.cluster.labels = matched_clusters, matched.clusters = c(1,2,3), k=20, sigma=1, cos.norm.in=TRUE, cos.norm.out=TRUE)

yycunc/SMNN documentation built on Dec. 29, 2021, 12:17 p.m.