Description Usage Arguments Details Value Author(s) References See Also Examples
Perform MNN correction based on cluster centroids, using the corrected centroid coordinates to correct the per-cell expression values with a variable bandwidth Gaussian kernel.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
... |
One or more log-expression matrices where genes correspond to rows and cells correspond to columns.
Alternatively, one or more SingleCellExperiment objects can be supplied containing a log-expression matrix in the If multiple objects are supplied, each object is assumed to contain all and only cells from a single batch.
If a single object is supplied, it is assumed to contain cells from all batches, so Alternatively, one or more lists of matrices or SingleCellExperiments can be provided;
this is flattened as if the objects inside each list were passed directly to |
batch |
A factor specifying the batch of origin for all cells when only a single object is supplied in |
restrict |
A list of length equal to the number of objects in |
clusters |
A list of length equal to |
cluster.d |
Integer scalar indicating how many PCs should be used in clustering.
Only used if |
cos.norm |
A logical scalar indicating whether cosine normalization should be performed on the input data prior to PCA. |
merge.order |
An integer vector containing the linear merge order of batches in |
auto.merge |
Logical scalar indicating whether to automatically identify the “best” merge order. |
min.batch.skip |
Numeric scalar specifying the minimum relative magnitude of the batch effect, below which no correction will be performed at a given merge step. |
subset.row |
A vector specifying which features to use for correction. |
correct.all |
Logical scalar indicating whether corrected expression values should be computed for genes not in |
assay.type |
A string or integer scalar specifying the assay containing the log-expression values. Only used for SingleCellExperiment inputs. |
BSPARAM |
A BiocSingularParam object specifying the algorithm to use for PCA.
Only used if |
BNPARAM |
A BiocNeighborParam object specifying the nearest neighbor algorithm. |
BPPARAM |
A BiocParallelParam object specifying whether the PCA and nearest-neighbor searches should be parallelized. |
These functions are motivated by the scenario where each batch has been clustered separately
and each cluster has already been annotated with some meaningful biological state.
We want to identify which biological states match to each other across batches;
this is achieved by identifying mutual nearest neighbors based on the cluster centroids with reducedMNN
.
MNN pairs are identified with k=1
to ensure that each cluster has no more than one match in another batch.
This reduces the risk of inadvertently merging together different clusters from the same batch.
By comparison, higher values of k
may result in many-to-one mappings between batches
such that the correction will implicitly force different clusters together.
Using this guarantee of no-more-than-one mappings across batches, we can form meta-clusters by identifying all components of the resulting MNN graph. Each meta-cluster can be considered to represent some biological state (e.g., cell type), and all of its constituents are the matching clusters within each batch.
As an extra courtesy, clusterMNN
will also compute corrected values for each cell.
This is done by applying a Gaussian kernel to the correction vectors for the centroids,
where the bandwidth is proportional to the distance between that cell and the closest cluster centroid.
This yields a smooth correction function that avoids edge effects at cluster boundaries.
If clusters
is set to a BlusterParam object (see the bluster package),
a PCA is performed in each batch with the specified BSPARAM
.
The PCs are then used in clustering with clusterRows
to obtain a list of clusters.
This can be used to mimic per-cell batch correction in the absence of a priori clusters.
A SingleCellExperiment containing per-cell expression values where each row is a gene and each column is a cell.
This has the same format as the output of fastMNN
but with an additional cluster
field in the colData
containing the cluster identity of each cell.
The metadata
contains:
merge.info
, a DataFrame with the same format as the output of fastMNN
.
However, the pairs
and lost.var
refer to the cluster centroids, not the cells.
clusters
, a DataFrame with one row for each cluster across all batches in ...
.
This can be row-indexed by the values in pairs
to determine the identity of the clusters in each MNN pair.
An additional meta
column is provided that describes the meta-cluster to which each cluster belongs.
Aaron Lun
Lun ATL (2019). Cluster-based mutual nearest neighbors correction https://marionilab.github.io/FurtherMNN2018/theory/clusters.html
Lun ATL (2019). A discussion of the known failure points of the fastMNN algorithm. https://marionilab.github.io/FurtherMNN2018/theory/failure.html
reducedMNN
, which is used internally to perform the correction.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | # Mocking up some data for multiple batches:
means <- matrix(rnorm(3000), ncol=3)
colnames(means) <- LETTERS[1:3]
B1 <- means[,sample(LETTERS[1:3], 500, replace=TRUE)]
B1 <- B1 + rnorm(length(B1))
B2 <- means[,sample(LETTERS[1:3], 500, replace=TRUE)]
B2 <- B2 + rnorm(length(B2)) + rnorm(nrow(B2)) # batch effect.
# Applying the correction with some made-up clusters:
cluster1 <- kmeans(t(B1), centers=10)$cluster
cluster2 <- kmeans(t(B2), centers=10)$cluster
out <- clusterMNN(B1, B2, clusters=list(cluster1, cluster2))
rd <- reducedDim(out, "corrected")
plot(rd[,1], rd[,2], col=out$batch)
# Obtaining the clusters internally.
out2 <- clusterMNN(B1, B2, clusters=bluster::NNGraphParam())
rd2 <- reducedDim(out2, "corrected")
plot(rd2[,1], rd2[,2], col=out$batch)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.