In single-cell sequencing analysis, we sometimes benefit from combining cells into 'meta-cells' (by averaging their expression). For example, the number of meta-cells is much smaller than the number of original single cells (the same benefit as downsampling); meta-cells should also have fewer drop-outs, thus giving a more stable estimate of single-cell features. There are reports (https://www.nature.com/articles/s41467-021-25960-2) that 'pseudo-bulk' strategies outperform 'single-cell' ones in differential gene detection. Thus, I think a tool for creating 'meta-cells' would be very beneficial.
Such tools already exist, for example MetaCell (https://doi.org/10.1186/s13059-019-1812-2) and VISION (https://doi.org/10.1038/s41467-019-12235-0, micro-clustering). Here, another tool is proposed, which aims to be simpler and more convenient for users.
An optimal tool of this kind should be:
SRAVG is designed following the above ideas:
- First, SRAVG splits cells into meta-cells based on their distances in a dimension-reduction space (close neighbors are grouped, so that heterogeneity is retained);
- Second, each meta-cell is averaged from the same number of cells (defined by the user; thanks to the balanced_clustering() function in the anticlust package (https://cran.r-project.org/web/packages/anticlust/index.html));
- Third, each meta-cell is averaged only from cells within a predefined group (defined by the user);
- Fourth, SRAVG can be seen as a Seurat wrapper: it takes a Seurat object as input and generates a new Seurat object as output.
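The idea can be illustrated with a minimal base-R sketch on simulated data. This is not the SRAVG implementation: the toy matrices, the variable names, and the grouping heuristic (sorting cells along the first embedding axis and cutting them into consecutive chunks) are assumptions made for illustration, standing in for the balanced clustering that SRAVG delegates to anticlust.

```r
set.seed(1)

# Toy data (hypothetical): 20 genes x 40 cells, a 2-D embedding, two clusters
n_genes <- 20
n_cells <- 40
expr <- matrix(rpois(n_genes * n_cells, lambda = 2), nrow = n_genes,
               dimnames = list(paste0("gene", 1:n_genes),
                               paste0("cell", 1:n_cells)))
coords  <- matrix(rnorm(n_cells * 2), ncol = 2)   # stand-in for PCA coordinates
cluster <- rep(c("A", "B"), each = n_cells / 2)   # predefined group
group_size <- 5                                   # cells per meta-cell

# Assign cells to equal-sized meta-cells, never mixing clusters.
# Crude stand-in for balanced clustering: order by the first embedding
# axis and cut into consecutive chunks of `group_size` cells.
meta_id <- character(n_cells)
for (cl in unique(cluster)) {
  idx   <- which(cluster == cl)
  ord   <- idx[order(coords[idx, 1])]
  chunk <- ceiling(seq_along(ord) / group_size)
  meta_id[ord] <- paste(cl, chunk, sep = "_")
}

# Average expression within each meta-cell
sums      <- rowsum(t(expr), group = meta_id)     # meta-cells x genes (sums)
sizes     <- table(meta_id)[rownames(sums)]       # cells per meta-cell
meta_expr <- t(sums / as.vector(sizes))           # genes x meta-cells (means)

dim(meta_expr)
```

Each column of meta_expr is the mean profile of group_size cells drawn from a single cluster, which is the property the second and third points above describe.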
The effect of averaging looks like this (pbmc3k data):
From the 'shape' of clusters we can tell that the heterogeneity is retained to some extent.
It is also worth mentioning that the current version of SRAVG supports averaging a 'chromatin assay' (Signac) together with an 'RNA assay'. As far as I know, creating multiomic meta-cells has not been described in current publications.
# install.packages("remotes")
#Turn off warning-error-conversion, because the tiniest warning stops installation
Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "true")
#install from github
remotes::install_github("https://github.com/qingnanl/SRAVG")
For more details please refer to https://github.com/qingnanl/SRAVG/tree/master/vignettes
A quick look:
library(Seurat)
library(SRAVG)
library(dplyr)
library(Matrix)
library(SeuratData)
# InstallData("pbmc3k")  # run once if the dataset is not installed yet
data("pbmc3k")
# Preprocess: sravg() requires one set of dimension-reduction coordinates
# and one predefined grouping (cell type / cluster / donor / sample)
pbmc3k <- pbmc3k %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA(verbose = FALSE) %>%
  FindNeighbors(dims = 1:10) %>%
  FindClusters(resolution = 0.5) %>%
  RunUMAP(dims = 1:10, verbose = FALSE)
# Run sravg(): average every 10 neighboring cells within each Seurat cluster
pbmc_avg <- sravg(object = pbmc3k, dr_key = "pca", dr_dims = 1:10,
                  group_size = 10, group_within = "seurat_clusters",
                  extra_meta = c("nCount_RNA", "nFeature_RNA"))
Averaging single-cell multiomics data is now supported. Users can set 'peak_assay' and 'peak_slot', so that the same groups of cells are also merged (averaged) in the peak-by-cell matrix. For details, please refer to https://github.com/qingnanl/SRAVG/tree/master/vignettes.
data_avg <- sravg(object = data, dr_key = "pca", dr_dims = 1:10, group_size = 10,
                  group_within = "seurat_clusters",
                  peak_assay = "peaks", peak_slot = "data",
                  extra_meta = c("nCount_RNA", "nFeature_RNA",
                                 "nCount_ATAC", "nFeature_ATAC"))
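To make "the same group of cells is merged in both matrices" concrete, here is a small base-R sketch on toy data. The helper average_by_group() and the toy matrices are hypothetical and are not part of SRAVG. The sketch only shows that one shared cell-to-meta-cell assignment can be applied to an RNA matrix and a peak matrix, so that the resulting columns stay aligned.

```r
# Hypothetical helper: average the columns of `mat` (features x cells)
# within the groups given by `groups` (one label per cell).
average_by_group <- function(mat, groups) {
  f   <- factor(groups)
  ind <- model.matrix(~ 0 + f)        # cells x groups, 0/1 indicator matrix
  colnames(ind) <- levels(f)
  sums <- mat %*% ind                 # features x groups, summed over cells
  sweep(sums, 2, colSums(ind), "/")   # divide by group size to get means
}

set.seed(2)
groups <- rep(paste0("m", 1:3), each = 4)            # 12 cells -> 3 meta-cells
rna    <- matrix(rpois(5 * 12, 3), nrow = 5)         # toy gene-by-cell counts
peaks  <- matrix(rbinom(8 * 12, 1, 0.2), nrow = 8)   # toy peak-by-cell matrix

# The same grouping is applied to both modalities
rna_avg  <- average_by_group(rna, groups)
peak_avg <- average_by_group(peaks, groups)

colnames(rna_avg)
colnames(peak_avg)
```

Because both calls use the same groups vector, column j of rna_avg and column j of peak_avg describe the same set of cells.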