knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Introduction

Dropout events make the lowly expressed genes indistinguishable from true zero expression and different than the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult. ccImpute is an imputation tool that uses cell similarity established by consensus clustering to impute the most probable dropout events in the scRNA-seq datasets. ccImpute demonstrates performance which exceeds the performance of existing imputation approaches while introducing the least amount of new noise as measured by clustering performance characteristics on datasets with known cell identities.

Data Pre-processing

ccImpute is an imputation tool and it does not provide functions for the pre-processing the data. This tool expects the user to preprocess the data prior to using it. The input data is expected to be in log-normalized format. This manual includes sample minimal pre-processing of dataset from scRNAseq database using the scater tool.

Sample Usage

Required libraries

library(scRNAseq)
library(scater)
library(ccImpute)
library(SingleCellExperiment)
library(stats)
library(mclust)

Input Data.

The following code loads Darmanis dataset(Darmanis et al. "A survey of human brain transcriptome diversity at the single cell level."(2015)) and computes log-transformed normalized counts:

data <- DarmanisBrainData()
data <- logNormCounts(data)

Compute Adjusted Rand Index (ARI) without imputation.

# Compute PCA reduction of the dataset
reducedDims(data) <- list(PCA=prcomp(t(logcounts(data)))$x)

# Get an actual number of cell types
k <- length(unique(colData(data)$cell.type))

# Cluster the PCA reduced dataset and store the assignments
assgmts <- kmeans(reducedDim(data, "PCA"), centers = k, iter.max = 1e+09,
    nstart = 1000)$cluster

# Use ARI to compare the k-means assignments to label assignments
adjustedRandIndex(assgmts, colData(data)$cell.type)

Perform the imputation and update the logcounts assay.

logcounts(data) <- impute(assays(data)$logcounts, k = k, nCores = 2)

Recompute Adjusted Rand Index (ARI) with imputation.

# Recompute PCA reduction of the dataset
reducedDims(data) <- list(PCA=prcomp(t(logcounts(data)))$x)

# Cluster the PCA reduced dataset and store the assignments
assgmts <- kmeans(reducedDim(data, "PCA"), centers = k, iter.max = 1e+09,
    nstart = 1000)$cluster

# Use ARI to compare the k-means assignments to label assignments
adjustedRandIndex(assgmts, colData(data)$cell.type)

R session information.

## Session info
library("sessioninfo")
options(width = 120)
session_info()


khazum/ccImpute_exp documentation built on May 25, 2022, 6:15 a.m.