Description
SMNNcorrect performs supervised batch effect correction for scRNA-seq data. It first identifies nearest neighbors (NNs) within corresponding clusters (or cell types) and then uses these NNs to compute correction vectors. It takes as input the raw expression matrices of two or more batches and a list of unified cluster labels (the output of unifiedClusterLabelling), and returns a batch-corrected expression matrix for each batch.
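The cluster-restricted neighbor search described above can be illustrated with a minimal base-R sketch. It assumes a mutual-nearest-neighbor (MNN) criterion on Euclidean distances, the usual choice in MNN-style correction, although the exact criterion and distance used by SMNNcorrect are not stated here. The function find_cluster_mnn is hypothetical and is not part of the SMNN API; it only shows how restricting the search to cells with the same non-zero label keeps cells of different types from being paired.

```r
# Hypothetical helper (not part of the SMNN package): mutual nearest neighbor
# (MNN) pairs between two batches, searched only among cells that share the
# same non-zero cluster label. Cells labelled 0 are never paired.
find_cluster_mnn <- function(ref, cur, ref.labels, cur.labels, k = 20) {
  pairs <- list()
  shared <- setdiff(intersect(ref.labels, cur.labels), 0)
  for (cl in shared) {
    r.idx <- which(ref.labels == cl)
    c.idx <- which(cur.labels == cl)
    # Euclidean distances: rows = current-batch cells, columns = reference cells
    all.d <- as.matrix(dist(t(cbind(cur[, c.idx, drop = FALSE],
                                    ref[, r.idx, drop = FALSE]))))
    d <- all.d[seq_along(c.idx), length(c.idx) + seq_along(r.idx), drop = FALSE]
    kc <- min(k, length(r.idx))  # neighbors sought in the reference batch
    kr <- min(k, length(c.idx))  # neighbors sought in the current batch
    nn.of.cur <- matrix(FALSE, nrow(d), ncol(d))
    for (i in seq_len(nrow(d))) nn.of.cur[i, order(d[i, ])[seq_len(kc)]] <- TRUE
    nn.of.ref <- matrix(FALSE, nrow(d), ncol(d))
    for (j in seq_len(ncol(d))) nn.of.ref[order(d[, j])[seq_len(kr)], j] <- TRUE
    # keep a pair only if each cell is among the other's k nearest neighbors
    hit <- which(nn.of.cur & nn.of.ref, arr.ind = TRUE)
    pairs[[length(pairs) + 1]] <- cbind(cur = c.idx[hit[, 1]], ref = r.idx[hit[, 2]])
  }
  if (length(pairs) == 0) {
    return(matrix(integer(0), ncol = 2, dimnames = list(NULL, c("cur", "ref"))))
  }
  do.call(rbind, pairs)
}

# Toy usage: the second batch is a tiny shift of the first
set.seed(1)
ref <- matrix(rnorm(5 * 6), nrow = 5)   # 5 genes x 6 cells
cur <- ref + 0.01
labels <- c(1, 1, 1, 2, 2, 0)           # the last cell belongs to no cluster
res <- find_cluster_mnn(ref, cur, labels, labels, k = 1)
res  # each cell is paired with its counterpart; cell 6 (label 0) is excluded
```

Setting correct.others = TRUE would correspond, in this sketch, to also searching among the cells labelled 0.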
Arguments
batches: a list of two or more expression matrices, each corresponding to one batch, in which each row corresponds to a gene and each column corresponds to a cell. The number and order of rows must be identical across all matrices (i.e., all batches must have exactly the same gene set, in the same order).
batch.cluster.labels: a list of vectors specifying the cluster label of each cell in each batch. Cells not belonging to any cluster should be labelled 0. If batch.cluster.labels = NULL, SMNN performs batch effect correction without any prior knowledge of cell clusters.
matched.clusters: a vector of the cluster labels matched between two or more batches (e.g., c(1, 2, 3) in the example below).
correct.others: a Boolean value that defines whether to search for nearest neighbors among the cells not belonging to any cluster. Default is FALSE, i.e., cells not belonging to any cluster are not considered as candidate nearest neighbors.
k: the maximum number of nearest neighbors to be identified. Default is 20.
sigma: the bandwidth of the Gaussian smoothing kernel used to compute the correction vector for each cell. Default is 1.
cos.norm.in: a Boolean value that defines whether to perform cosine normalization on the input data before computing distances between cells. Default is TRUE.
cos.norm.out: a Boolean value that defines whether to perform cosine normalization on the output data before computing the corrected expression values. Default is TRUE.
var.adj: a Boolean value that indicates whether to perform variance adjustment on the correction vectors. Default is TRUE.
subset.genes: a vector specifying the genes used to compute the correction vectors. Default is NULL, meaning that all genes are used.
order: a vector defining the reference batch and the order in which the other batches are corrected.
n.jobs: the number of parallel jobs. It would be set to the number of cores when
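The roles of cos.norm.in and sigma can be illustrated with a minimal base-R sketch. It assumes an MNN-style scheme in which each matched pair contributes one correction vector (reference cell minus current cell) and each cell's correction is a Gaussian-weighted average of these vectors. The helper names cosine_norm and smooth_correction and the kernel form exp(-d^2 / (2 * sigma^2)) are assumptions for illustration, not taken from the SMNN source; variance adjustment (var.adj) is not modelled.

```r
# Hypothetical sketch (not SMNN's source code) of two steps controlled by
# cos.norm.in and sigma. Kernel form and helper names are assumptions.

# Cosine normalization: divide each cell (column) by its L2 norm.
cosine_norm <- function(x) {
  norms <- sqrt(colSums(x^2))
  sweep(x, 2, pmax(norms, .Machine$double.eps), "/")
}

# Gaussian-kernel smoothing of pair-specific correction vectors.
#   cur:   genes x cells matrix of the batch being corrected
#   ref:   genes x cells matrix of the reference batch
#   pairs: two-column matrix of matched (cur, ref) cell indices
# Returns a genes x cells matrix of correction vectors for `cur`.
smooth_correction <- function(cur, ref, pairs, sigma = 1) {
  # one correction vector per pair: reference cell minus current cell
  vec <- ref[, pairs[, 2], drop = FALSE] - cur[, pairs[, 1], drop = FALSE]
  anchors <- cur[, pairs[, 1], drop = FALSE]
  # squared Euclidean distance from every current cell to every anchor cell
  d2 <- outer(colSums(cur^2), colSums(anchors^2), "+") -
    2 * crossprod(cur, anchors)
  d2 <- pmax(d2, 0)
  # subtracting each row's minimum leaves the normalized weights unchanged
  # but keeps at least one weight per cell equal to 1 (avoids 0/0)
  d2 <- d2 - apply(d2, 1, min)
  w <- exp(-d2 / (2 * sigma^2))   # cells x pairs
  w <- w / rowSums(w)             # weights of each cell sum to 1
  vec %*% t(w)
}

# Toy example: the second batch is the reference shifted by -1 in every gene
set.seed(2)
ref <- matrix(abs(rnorm(20)), nrow = 4)   # 4 genes x 5 cells
cur <- ref - 1
pairs <- cbind(cur = 1:5, ref = 1:5)
corrected <- cur + smooth_correction(cur, ref, pairs, sigma = 1)
all.equal(corrected, ref)   # TRUE: every correction vector is (1, 1, 1, 1)
normed <- cosine_norm(ref)
colSums(normed^2)           # all 1, up to rounding
```

Under this kernel, a smaller sigma concentrates each cell's correction on its closest matched cells, while a larger sigma averages the correction vectors more broadly.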
Value
SMNNcorrect returns the corrected expression matrix for each batch, together with information on the NNs identified between the batch currently under correction and the reference batch.
Author(s)
Yuchen Yang <yyuchen@email.unc.edu>, Gang Li <franklee@live.unc.edu>, Huijun Qian <hjqian@live.unc.edu>, Yun Li <yunli@med.unc.edu>
References
Yuchen Yang, Gang Li, Huijun Qian, Yun Li (2018). SMNNcorrect.
Examples
library(SMNN)
# Load the example data set data_SMNN
data("data_SMNN")
# Provide the marker genes for cluster matching
markers <- c("Col1a1", "Pdgfra", "Ptprc", "Pecam1")
# Specify the cluster labels for each marker gene
cluster.info <- c(1, 1, 2, 3)
# Call function unifiedClusterLabelling to identify the corresponding clusters between two batches
matched_clusters <- unifiedClusterLabelling(data_SMNN$batch1.mat, data_SMNN$batch2.mat, features.use = markers, cluster.labels = cluster.info, min.perc = 0.3)
# Set the Python interpreter used by SMNNcorrect
# (the path below is site-specific; replace it with the path to python3 on your system)
library(reticulate)
use_python("/nas/longleaf/apps/python/3.5.1/bin/python3")
# Perform batch effect correction using SMNNcorrect
corrected.results <- SMNNcorrect(batches = list(data_SMNN$batch1.mat, data_SMNN$batch2.mat), batch.cluster.labels = matched_clusters, matched.clusters = c(1, 2, 3), k = 20, sigma = 1, cos.norm.in = TRUE, cos.norm.out = TRUE)