ccImpute: Performs imputation of dropout values in scRNA-seq data using...

View source: R/Methods.R

ccImpute    R Documentation

Description

Imputes dropout values in scRNA-seq data using the ccImpute algorithm, as described in "ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data" (DOI: https://doi.org/10.1186/s12859-022-04814-8).

Usage

ccImpute(
    logX,
    useRanks = TRUE,
    pcaMin,
    pcaMax,
    k,
    consMin = 0.65,
    kmNStart,
    kmMax = 1000,
    BPPARAM = bpparam()
)

Arguments

logX

A normalized and log transformed scRNA-seq expression matrix.
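As a rough sketch of what "normalized and log transformed" can look like, the base R snippet below scales a simulated count matrix by library size and applies log2(x + 1). The counts-per-million scaling, the pseudocount of 1, the genes-in-rows orientation, and the variable names are assumptions made for illustration; this page does not prescribe a particular normalization.

```r
set.seed(1)
# Simulated raw counts: 50 rows (genes) x 20 columns (cells); orientation is assumed
counts <- matrix(rpois(50 * 20, lambda = 5), nrow = 50, ncol = 20)

# Scale each column to counts per million (one common choice, assumed here)
libSize <- colSums(counts)
cpm <- sweep(counts, 2, libSize, "/") * 1e6

# Log transform with a pseudocount of 1 so that zero counts map to zero
logX <- log2(cpm + 1)
```

A matrix prepared this way has the same dimensions as the raw counts and contains no negative values.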

useRanks

A Boolean specifying whether the non-parametric version of the weighted Pearson correlation should be used. Keeping this as TRUE is recommended, since it performed better in experiments. FALSE also gives decent results, with the benefit of a faster runtime.
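To illustrate the difference between the two settings, here is a minimal base R sketch of a weighted Pearson correlation with an optional rank transform. The helper name weightedCor, the toy data, and the weights are hypothetical, and the package's internal implementation may differ.

```r
# Weighted Pearson correlation computed with stats::cov.wt
weightedCor <- function(x, y, w, useRanks = TRUE) {
    if (useRanks) {
        # Non-parametric variant: correlate ranks instead of raw values
        x <- rank(x)
        y <- rank(y)
    }
    cov.wt(cbind(x, y), wt = w, cor = TRUE)$cor[1, 2]
}

set.seed(1)
x <- rnorm(30)
y <- exp(3 * x)   # monotone but strongly nonlinear in x
w <- runif(30)    # hypothetical per-observation weights

rankCor <- weightedCor(x, y, w, useRanks = TRUE)
rawCor  <- weightedCor(x, y, w, useRanks = FALSE)
```

Because y is a monotone function of x, the rank-based correlation is 1 up to floating-point error, while the correlation on raw values is lower. The rank transform is also the extra work that FALSE avoids.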

pcaMin

Sets the minimum number of PCA features used for generating subsets. For small datasets (up to 500 cells), the minimum is pcaMin*n features, where n is the number of cells. For larger datasets, it corresponds to the feature count that has a proportion of variance less than pcaMin. pcaMin and pcaMax are only taken into account when both are specified. It is best to keep the default unless a better value has been found experimentally.

pcaMax

Sets the maximum number of PCA features used for generating subsets. For small datasets (up to 500 cells), the maximum is pcaMax*n features, where n is the number of cells. For larger datasets, it corresponds to the feature count that has a proportion of variance less than pcaMax. pcaMin and pcaMax are only taken into account when both are specified. It is best to keep the default unless a better value has been found experimentally.
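The small-dataset rule for pcaMin and pcaMax can be sketched with a little arithmetic. The values 0.04 and 0.07, the cell count, and the rounding step are assumptions chosen for illustration; the package defaults are not stated on this page.

```r
# Hypothetical values; not the documented defaults
pcaMin <- 0.04
pcaMax <- 0.07
nCells <- 200   # a "small" dataset: at most 500 cells

# Small-dataset rule: feature counts are fractions of the cell count
# (rounding to whole features is an assumption of this sketch)
minFeatures <- round(pcaMin * nCells)
maxFeatures <- round(pcaMax * nCells)

# Range of PCA feature counts between the two bounds
featureRange <- seq(minFeatures, maxFeatures)
```

With these numbers the bounds are 8 and 14 features, so seven different feature counts fall inside the range.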

k

The centers parameter passed to the kmeans function. It corresponds to the number of distinct cell groups in the data and can be estimated by a number of methods. If it is not provided, the approach from the SIMLR package is used (https://www.bioconductor.org/packages/release/bioc/html/SIMLR.html).

consMin

The low-pass filter threshold for processing the consensus matrix. It eliminates noise from unlikely clustering assignments. It is recommended to keep this value at 0.5 or above.
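A toy example may help show what thresholding a consensus matrix does. The matrix values below are made up, and zeroing entries that fall below consMin is an assumed reading of the filter, not a confirmed description of the package internals.

```r
consMin <- 0.65   # default shown in the Usage section

# Toy consensus matrix: fraction of clusterings in which two cells co-cluster
cons <- matrix(c(1.0, 0.9, 0.3,
                 0.9, 1.0, 0.6,
                 0.3, 0.6, 1.0), nrow = 3, byrow = TRUE)

# Assumed filtering rule: entries below the threshold are treated as noise
filtered <- cons
filtered[filtered < consMin] <- 0
```

Here the weak links between cell 3 and the other two cells (0.3 and 0.6) are dropped, while the strong link between cells 1 and 2 (0.9) is kept.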

kmNStart

The nstart parameter passed to the kmeans function. It can be set manually. By default it is 1000 for up to 2000 cells and 50 for more than 2000 cells.
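The default described above amounts to a simple rule on the number of cells. The helper name defaultKmNStart is hypothetical and only mirrors the documented behaviour.

```r
# Hypothetical helper mirroring the documented default for kmNStart
defaultKmNStart <- function(nCells) {
    if (nCells <= 2000) 1000 else 50
}

defaultKmNStart(500)    # 1000
defaultKmNStart(2000)   # 1000
defaultKmNStart(2001)   # 50
```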

kmMax

The iter.max parameter passed to the kmeans function. ccImpute is a stochastic method, and setting the rand_seed allows reproducibility.
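The sketch below shows how k, kmNStart and kmMax line up with the centers, nstart and iter.max arguments of base R's kmeans, and how fixing the random seed makes a plain kmeans call repeatable. The toy data and the use of set.seed are assumptions for illustration; they do not describe how ccImpute seeds its own computations.

```r
# Toy data: 60 cells described by 2 features, drawn from two separated groups
set.seed(42)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 5), ncol = 2))

k <- 2          # maps to centers
kmNStart <- 10  # maps to nstart (kept small here for speed)
kmMax <- 1000   # maps to iter.max

# Fixing the RNG seed before each call makes the random starts repeatable
set.seed(123)
fit1 <- kmeans(x, centers = k, nstart = kmNStart, iter.max = kmMax)
set.seed(123)
fit2 <- kmeans(x, centers = k, nstart = kmNStart, iter.max = kmMax)

sameClusters <- identical(fit1$cluster, fit2$cluster)
```

With the same seed, both runs return identical cluster assignments.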

BPPARAM

BiocParallel parameters for parallelization.

Value

A normalized and log transformed scRNA-seq expression matrix with imputed missing values.

Examples

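library(ccImpute)
# Simulated log-transformed expression matrix with 10000 rows and 100 columns
exp_matrix <- log(abs(matrix(rnorm(1000000), nrow = 10000)) + 1)
ccImpute(exp_matrix, k = 2)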

khazum/ccImpute documentation built on Nov. 28, 2022, 7:27 a.m.