KSoftImpute: KNN-based imputation

Description Usage Arguments Value See Also Examples

View source: R/KSoftImpute.R

Description

KSoftImpute is an ultra-fast method for imputing missing gene expression values in single cell data. KSoftImpute uses k-nearest neighbors to impute the expression of each gene by the weighted average of itself and it's first-degree neighbors. Weights for imputation are determined by the number of detected genes. This method works for large data sets (>100,000 cells) in under a minute.

Usage

1
KSoftImpute(E, dM = NULL, genes.to.use = NULL, verbose = FALSE)

Arguments

E

A gene-by-sample count matrix (sparse matrix or matrix) with genes identified by their HUGO symbols.

dM

see ?CID.GetDistMat

genes.to.use

a character vector of genes to impute. Default is NULL.

verbose

If TRUE, code reports outputs. Default is FALSE.

Value

An expression matrix (sparse matrix) with imputed values.

See Also

Signac and SignacFast

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
## Not run: 
# download single cell data for classification
file.dir = "https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/"
file = "pbmc_1k_v3_filtered_feature_bc_matrix.h5"
download.file(paste0(file.dir, file), "Ex.h5")

# load data, process with Seurat
library(Seurat)
E = Read10X_h5(filename = "Ex.h5")
pbmc <- CreateSeuratObject(counts = E, project = "pbmc")

# run Seurat pipeline
pbmc <- SCTransform(pbmc, verbose = FALSE)
pbmc <- RunPCA(pbmc, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)

# get edges from default assay from Seurat object
default.assay <- Seurat::DefaultAssay(pbmc)
edges = pbmc@graphs[[which(grepl(paste0(default.assay, "_nn"), names(pbmc@graphs)))]]

# get distance matrix
dM = CID.GetDistMat(edges)

# run imputation
Z = KSoftImpute(E = E, dM = dM, verbose = TRUE)

## End(Not run)

SignacX documentation built on July 22, 2021, 9:10 a.m.