knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
Dropout events make lowly expressed genes indistinguishable from genes with true zero expression, and make their measured values differ from the low expression observed in other cells of the same type. This complicates any downstream analysis. ccImpute[@malec2022ccimpute] is an imputation tool that uses cell similarity established by consensus clustering to impute the most probable dropout events in scRNA-seq datasets. ccImpute outperforms existing imputation approaches while introducing the least amount of new noise, as measured by clustering performance on datasets with known cell identities.
To install this package, start R (version "4.2") and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ccImpute")
ccImpute is an imputation tool and does not provide functions for pre-processing the data. Users are expected to pre-process the data before running it, and the input is expected to be log-normalized. This manual includes a minimal pre-processing example that uses a dataset from the scRNAseq package and the scater tool.
library(scRNAseq)
library(scater)
library(ccImpute)
library(SingleCellExperiment)
library(stats)
library(mclust)
The following code loads the Darmanis dataset[@darmanis2015survey] and computes log-transformed normalized counts:
sce <- DarmanisBrainData()
sce <- logNormCounts(sce)
Consider performing feature selection before running the imputation. ccImpute imputes only the most probable dropout events, so it is unlikely to benefit from the presence of scarcely expressed genes or to make any corrections to their expression.
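One simple way to drop scarcely expressed genes is to keep only those detected in a minimum fraction of cells. The sketch below is an illustration, not part of ccImpute: the helper name `select_expressed_genes` and the `min_frac` threshold are assumptions, and a suitable threshold depends on the dataset.

```r
# Hypothetical helper (not part of ccImpute): flag genes that are detected
# (value > 0) in at least `min_frac` of cells. The default threshold is an
# illustrative assumption.
select_expressed_genes <- function(log_expr, min_frac = 0.05) {
  detected_frac <- rowMeans(log_expr > 0)  # fraction of cells expressing each gene
  detected_frac >= min_frac
}

# Toy genes-by-cells matrix of log-normalized values
toy <- rbind(
  geneA = c(0.0, 0.0, 0.0, 0.0),  # never detected
  geneB = c(1.2, 0.0, 0.0, 0.0),  # detected in 1 of 4 cells
  geneC = c(2.1, 1.7, 0.0, 3.0)   # detected in 3 of 4 cells
)

keep <- select_expressed_genes(toy, min_frac = 0.5)
filtered <- toy[keep, , drop = FALSE]  # only geneC remains
```

Assuming `logcounts(sce)` returns an ordinary dense matrix, the same logical vector could be used to subset the object, for example `sce <- sce[select_expressed_genes(logcounts(sce)), ]`.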
The Adjusted Rand Index (ARI) measures the similarity between two clusterings of the same data, corrected for the agreement expected by chance. Here it is used to evaluate clustering performance by comparing the k-means assignments with the reference assignments derived from the cell labels.
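To make the chance adjustment concrete, the index can be computed by hand from the contingency table of the two label vectors. The function below is a minimal base-R sketch of the standard pair-counting formula. The name `ari_by_hand` is hypothetical, and the analysis in this manual uses `adjustedRandIndex()` from mclust instead.

```r
# Hypothetical helper: Adjusted Rand Index from a contingency table.
# It does not handle the degenerate case where the denominator is zero
# (for example, both clusterings place every cell in one cluster).
ari_by_hand <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  pairs_both <- sum(choose(tab, 2))        # pairs grouped together in both clusterings
  pairs_x <- sum(choose(rowSums(tab), 2))  # pairs grouped together in x
  pairs_y <- sum(choose(colSums(tab), 2))  # pairs grouped together in y
  expected <- pairs_x * pairs_y / choose(n, 2)  # agreement expected by chance
  max_index <- (pairs_x + pairs_y) / 2
  (pairs_both - expected) / (max_index - expected)
}

# Identical partitions with different label names give an ARI of 1
same <- ari_by_hand(c(1, 1, 2, 2), c("a", "a", "b", "b"))

# Partitions that split every pair differently give a negative ARI (-0.5 here)
crossed <- ari_by_hand(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Values near 0 indicate agreement no better than chance, and 1 indicates identical partitions regardless of how the clusters are named.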
# Set seed for reproducibility purposes.
set.seed(0)

# Compute PCA reduction of the dataset
reducedDims(sce) <- list(PCA = prcomp(t(logcounts(sce)))$x)

# Get an actual number of cell types
k <- length(unique(colData(sce)$cell.type))

# Cluster the PCA reduced dataset and store the assignments
set.seed(0)
assgmts <- kmeans(reducedDim(sce, "PCA"), centers = k, iter.max = 1e+09, nstart = 1000)$cluster

# Use ARI to compare the k-means assignments to label assignments
adjustedRandIndex(assgmts, colData(sce)$cell.type)
assay(sce, "imputed") <- ccImpute(logcounts(sce), k = k)
# Recompute PCA reduction of the dataset
reducedDim(sce, "PCA_imputed") <- prcomp(t(assay(sce, "imputed")))$x

# Cluster the PCA reduced dataset and store the assignments
assgmts <- kmeans(reducedDim(sce, "PCA_imputed"), centers = k, iter.max = 1e+09, nstart = 1000)$cluster

# Use ARI to compare the k-means assignments to label assignments
adjustedRandIndex(assgmts, colData(sce)$cell.type)
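Because the PCA and k-means steps are run twice, once before and once after imputation, it may be convenient to wrap them in a small function so that both runs use identical settings. The following is a sketch under stated assumptions: `cluster_on_pca` is a hypothetical helper, and its smaller `nstart` and `iter.max` defaults are chosen only to keep the toy example fast.

```r
# Hypothetical helper: PCA on cells followed by k-means on the PC scores.
# `log_expr` is a genes-by-cells matrix. Note that set.seed() changes the
# global RNG state. The nstart and iter.max defaults are illustrative.
cluster_on_pca <- function(log_expr, k, nstart = 50, iter.max = 100, seed = 0) {
  set.seed(seed)
  pcs <- prcomp(t(log_expr))$x
  kmeans(pcs, centers = k, iter.max = iter.max, nstart = nstart)$cluster
}

# Toy data: 10 genes, two well-separated groups of 5 cells each
set.seed(1)
toy <- cbind(
  matrix(rnorm(50, mean = 0), nrow = 10),
  matrix(rnorm(50, mean = 10), nrow = 10)
)
toy_clusters <- cluster_on_pca(toy, k = 2)
```

With the objects defined earlier, the same helper could be applied to both `logcounts(sce)` and `assay(sce, "imputed")`, passing each result to `adjustedRandIndex()` together with `colData(sce)$cell.type`.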
## Session info

library("sessioninfo")
options(width = 120)
session_info()