EM_impute: Monte Carlo EM algorithm for imputation and clustering
In xyz111131/SIMPLEs: SIMPLEs: single-cell RNA sequencing imputation and cell clustering methods by modeling gene module variation


View source: R/EM_impute.R

Monte Carlo EM algorithm to sample the imputed values, cluster the cells and learn the correlation structure of genes in each cluster.

EM_impute(Y, Y0, pg, M0, K0, cutoff, iter, beta, sigma, lambda, pi, z,
  mu = NULL, celltype = NULL, penl = 1, est_z = 2,
  max_lambda = T, est_lam = 2, impt_it = 5, sigma0 = 100,
  pi_alpha = 1, verbose = F, num_mc = 3, lower = -Inf,
  upper = Inf)

`Y`	An initial imputed gene expression matrix.
`Y0`	Original scRNASeq data matrix.
`pg`	A matrix of dropout rates for each cell type. Each row is a gene and each column gives the dropout rates for one cell type. The columns should be ordered as the cell type label in clus.
`M0`	Number of clusters.
`K0`	Number of latent gene modules.
`cutoff`	Values below cutoff are treated as no expression.
`iter`	Number of EM steps.
`beta`	A G by K0 matrix. Initial values for factor loadings (B). See details.
`sigma`	A G by M0 matrix. Initial values for the variance of idiosyncratic noises. Each column is for a cell cluster. See details.
`lambda`	An M0 by K0 matrix. Initial values for the variances of factors. Each column is for a cell cluster. See details.
`pi`	A vector of initial probabilities of cells belonging to each cluster.
`z`	An n by M0 matrix of the probability of each cell belonging to each cluster. Can be initialized as the one-hot encoding of the cluster membership of cells. If NULL, z will be updated in the first iteration.
`mu`	A G by M0 matrix. Initial values for the gene expression mean of each cluster. Each column is for a cell cluster. If NULL, it will take the sample mean of cells weighted by the probability in each cluster. See details.
`celltype`	A numeric vector of cell type labels for the cells in the scRNASeq data. Each cell type has a different dropout rate. If bulk RNASeq data are supplied, each cell type has a corresponding mean expression in the bulk RNASeq data. The labels must run from 1 to the number of types. If NULL, all cells are treated as a single cell type.
`penl`	L1 penalty for the factor loadings.
`est_z`	The iteration at which updating z starts.
`max_lambda`	Whether to maximize over lambda.
`est_lam`	The iteration at which estimating lambda starts.
`impt_it`	The iteration at which sampling new imputed values starts.
`sigma0`	The variance of the prior distribution of μ.
`pi_alpha`	The hyperparameter of the prior distribution of π. See details.
`verbose`	Whether to show some intermediate results. Default is FALSE.
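
The description of `z` and `pi` above suggests starting from hard cluster labels. A minimal sketch in base R, assuming a hypothetical vector `init_labels` of initial cluster assignments (for example from k-means); the names `init_labels` and `pi_init` are illustrative and not part of the package:

```r
# Hypothetical initial hard cluster labels for n = 6 cells and M0 = 3 clusters.
init_labels <- c(1, 3, 2, 1, 2, 3)
M0 <- 3
n <- length(init_labels)

# One-hot encode the labels: z[i, m] = 1 if cell i starts in cluster m, else 0.
z <- matrix(0, nrow = n, ncol = M0)
z[cbind(seq_len(n), init_labels)] <- 1

# One possible choice for the initial cluster probabilities:
# the fraction of cells assigned to each cluster.
pi_init <- colMeans(z)
```

The resulting `z` and `pi_init` could then be supplied as the `z` and `pi` arguments; whether this is the best initialization for a given data set is not something the documentation states.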

Suppose there are G genes and n cells. Within each cell cluster m, the gene expression follows Y|Z=m ~ MVN(μ_m, BΛ_m B^T + Σ_m), where B is a G by K0 matrix, Σ_m is a G by G diagonal matrix whose diagonal entries are specified by sigma, and Λ_m is a K0 by K0 diagonal matrix whose diagonal entries are specified by lambda. The cluster probabilities are P(Z=m) = π_m, where π ~ Dir(α).

The overall mean of each gene is removed before running the algorithm, and all parameters are estimated from the normalized gene expression matrix. The overall mean is returned as geneM.
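
To make the covariance structure concrete, here is a small base R sketch that builds BΛ_m B^T + Σ_m for one cluster and draws a few cells from the corresponding factor model. All dimensions and parameter values are made up for illustration, and the cells-by-genes orientation of `Y_sim` is a choice made here, not something taken from the package:

```r
set.seed(1)
G <- 5    # genes
K0 <- 2   # latent gene modules
n <- 4    # cells drawn from a single cluster m

B <- matrix(rnorm(G * K0), G, K0)  # factor loadings (G by K0)
lambda_m <- c(2, 0.5)              # factor variances, diagonal of Lambda_m
sigma_m <- rep(0.3, G)             # idiosyncratic variances, diagonal of Sigma_m
mu_m <- rep(0, G)                  # cluster mean (zero after centering)

# Implied gene-gene covariance for cluster m: B Lambda_m B^T + Sigma_m.
Sigma_y <- B %*% diag(lambda_m, K0) %*% t(B) + diag(sigma_m, G)

# Equivalent generative form: y_i = mu_m + B f_i + e_i.
# Columns are filled first, so column k of f uses sd sqrt(lambda_m[k]).
f <- matrix(rnorm(n * K0, sd = rep(sqrt(lambda_m), each = n)), n, K0)
e <- matrix(rnorm(n * G, sd = rep(sqrt(sigma_m), each = n)), n, G)
Y_sim <- sweep(f %*% t(B) + e, 2, mu_m, "+")  # n by G
```

Because Σ_m has strictly positive diagonal entries, `Sigma_y` is positive definite even when K0 is much smaller than G.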

EM_impute returns a list of results in the following order.

`loglik`	The log-likelihood of the imputed gene expression at each iteration.
`pi`	Probabilities of cells belonging to each cluster.
`mu`	Mean expression for each cluster.
`sigma`	Variances of idiosyncratic noises for each cluster.
`beta`	Factor loadings.
`lambda`	Variances of factors for each cluster.
`z`	The probability of each cell belonging to each cluster.
`Ef`	Conditional expectation of the factors for each cluster, E(f_i|z_i = m). A list of length M0; each element is an n by K0 matrix.
`Varf`	Conditional covariance of the factors for each cluster, Var(f_i|z_i = m). A list of length M0; each element is a K0 by K0 matrix.
`Y`	Last sample of the imputed matrix.
`geneM`	Overall mean of each gene's expression. See details.
`geneSd`	Equal to 1 for each gene.
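
The returned `z` and `Ef` can be post-processed in base R. The sketch below uses a hand-made mock list `res` with the documented shapes (n = 4 cells, M0 = 2 clusters, K0 = 3 factors) instead of a real EM_impute result; the hard labels and the posterior-weighted factor average are illustrative summaries, not outputs the package itself documents:

```r
# Mock result with the shapes described above (values are made up).
res <- list(
  z = matrix(c(0.90, 0.10,
               0.20, 0.80,
               0.60, 0.40,
               0.05, 0.95), ncol = 2, byrow = TRUE),
  Ef = list(matrix(1, 4, 3), matrix(2, 4, 3))
)

# Hard cluster label per cell: the cluster with the largest probability.
hard_labels <- max.col(res$z, ties.method = "first")

# Factor scores averaged over clusters, weighting E(f_i | z_i = m) by z[i, m].
# Multiplying a length-n vector by an n by K0 matrix scales row i by element i.
f_avg <- Reduce(`+`, lapply(seq_along(res$Ef),
                            function(m) res$z[, m] * res$Ef[[m]]))
```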

Zhirui Hu, zhiruihu@g.harvard.edu

Songpeng Zu, songpengzu@g.harvard.edu

xyz111131/SIMPLEs documentation built on Jan. 8, 2020, 2:48 a.m.
