Monte Carlo EM algorithm to sample the imputed values, cluster the cells and learn the correlation structure of genes in each cluster.
Y |
An initial imputed gene expression matrix. |
Y0 |
Original scRNASeq data matrix. |
pg |
A matrix of dropout rates for each cell type. Each row is a gene and each column is a cell type. The columns should be ordered to match the cell type labels in clus. |
M0 |
Number of clusters. |
K0 |
Number of latent gene modules. |
cutoff |
Values below cutoff are treated as no expression. |
iter |
Number of EM steps. |
beta |
A G by K0 matrix. Initial values for factor loadings (B). See details. |
sigma |
A G by M0 matrix. Initial values for the variance of idiosyncratic noises. Each column is for a cell cluster. See details. |
lambda |
An M0 by K0 matrix. Initial values for the variances of factors. Each row is for a cell cluster. See details. |
pi |
A vector of initial probabilities of cells belonging to each cluster. |
z |
An n by M0 matrix of the probability of each cell belonging to each cluster. Can be initialized as the one-hot encoding of the cluster membership of cells. If NULL, z will be updated in the first iteration. |
mu |
A G by M0 matrix. Initial values for the mean gene expression of each cluster. Each column is for a cell cluster. If NULL, the sample mean of cells, weighted by their probability of belonging to each cluster, is used. See details. |
celltype |
A numeric vector of cell type labels for the cells in the scRNASeq data. Each cell type has a different dropout rate. If bulk RNASeq data are supplied, each cell type has a corresponding mean expression in the bulk RNASeq data. Labels must run from 1 to the number of types. If NULL, all cells are treated as a single cell type. |
penl |
L1 penalty for the factor loadings. |
est_z |
The iteration at which updating of z starts. |
max_lambda |
Whether to maximize over lambda. |
est_lam |
The iteration at which estimation of lambda starts. |
impt_it |
The iteration at which sampling of new imputed values starts. |
sigma0 |
The variance of the prior distribution of μ. |
pi_alpha |
The hyperparameter of the prior distribution of π. See details. |
verbose |
Whether to show some intermediate results. Default is FALSE. |
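The shapes described above can be sketched in base R. This is only an illustration of how starting values with the documented dimensions might be built; the sizes, the object names `init_clus` and `pi_init`, and the placeholder values are assumptions made for the example, not recommended defaults.

```r
# Hypothetical sizes, chosen only for illustration.
G  <- 6   # genes
n  <- 8   # cells
M0 <- 2   # clusters
K0 <- 3   # latent gene modules

# Hypothetical starting cluster labels for the n cells (values in 1..M0).
init_clus <- c(1, 1, 2, 1, 2, 2, 1, 2)

# One-hot encoding: z[i, m] is 1 if cell i starts in cluster m, else 0.
z <- matrix(0, nrow = n, ncol = M0)
z[cbind(seq_len(n), init_clus)] <- 1

# Initial cluster probabilities taken as the empirical cluster frequencies.
pi_init <- colMeans(z)

# Placeholder starting values with the documented shapes.
set.seed(1)
beta   <- matrix(rnorm(G * K0, sd = 0.1), nrow = G, ncol = K0)  # G by K0
sigma  <- matrix(1, nrow = G, ncol = M0)                        # G by M0
lambda <- matrix(1, nrow = M0, ncol = K0)                       # M0 by K0
```

These objects could then be supplied through the z, pi, beta, sigma and lambda arguments.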
Suppose there are G genes and n cells. Within each cell cluster m, the gene expression follows Y | Z = m ~ MVN(μ_m, B Λ_m B^T + Σ_m), where B is a G by K0 matrix of factor loadings, Σ_m is a G by G diagonal matrix whose diagonal entries are given by sigma, and Λ_m is a K0 by K0 diagonal matrix whose diagonal entries are given by lambda. The cluster membership has prior P(Z = m) = π_m, where π ~ Dir(α). The overall mean of each gene is removed before the algorithm runs, and all parameters are estimated from the normalized gene expression matrix. The overall mean is returned as geneM.
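To make the generative model concrete, the following base R sketch draws data from it using the equivalent factor form y_i = μ_m + B f_i + e_i, with f_i ~ N(0, Λ_m) and e_i ~ N(0, Σ_m). All sizes and parameter values are made up for illustration, and storing genes in rows of Y is an assumption of this example rather than something the function is known to require.

```r
set.seed(42)
G <- 5; n <- 200; M0 <- 2; K0 <- 2   # hypothetical sizes

B      <- matrix(rnorm(G * K0), G, K0)             # factor loadings, G by K0
mu     <- matrix(rnorm(G * M0), G, M0)             # cluster means, G by M0
sigma  <- matrix(runif(G * M0, 0.5, 1), G, M0)     # noise variances, G by M0
lambda <- matrix(runif(M0 * K0, 0.5, 2), M0, K0)   # factor variances, M0 by K0
pi_m   <- c(0.3, 0.7)                              # cluster probabilities

# Draw a cluster label for each cell with P(Z = m) = pi_m.
Z <- sample(seq_len(M0), n, replace = TRUE, prob = pi_m)

# Draw each cell from its cluster: y_i = mu_m + B f_i + e_i.
Y <- matrix(0, G, n)   # genes in rows (assumption of this sketch)
for (i in seq_len(n)) {
  m <- Z[i]
  f <- rnorm(K0, sd = sqrt(lambda[m, ]))   # f_i ~ N(0, Lambda_m)
  e <- rnorm(G,  sd = sqrt(sigma[, m]))    # e_i ~ N(0, Sigma_m)
  Y[, i] <- mu[, m] + drop(B %*% f) + e
}
```

Marginalizing over f_i gives the covariance B Λ_m B^T + Σ_m stated in the model.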
EM_impute returns a list of results in the following order.
loglik |
The log-likelihood of the imputed gene expression at each iteration. |
pi |
Probabilities of cells belonging to each cluster. |
mu |
Mean expression for each cluster. |
sigma |
Variances of idiosyncratic noises for each cluster. |
beta |
Factor loadings. |
lambda |
Variances of factors for each cluster. |
z |
The probability of each cell belonging to each cluster. |
Ef |
Conditional expectation of the factors for each cluster, E(f_i|z_i = m). A list of length M0; each element is an n by K0 matrix. |
Varf |
Conditional covariance of the factors for each cluster, Var(f_i|z_i = m). A list of length M0; each element is a K0 by K0 matrix. |
Y |
Last sample of the imputed matrix. |
geneM |
Overall mean of each gene's expression. See details. |
geneSd |
Equal to 1 for each gene. |
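As a rough illustration of how the returned components might be used, the base R sketch below works on a mock list `res` that only mimics the documented shapes of beta, lambda, sigma and z. The values are invented, and `implied_cor` is a hypothetical helper, not part of the package. It derives a hard cluster label per cell from z and the model-implied gene correlation B Λ_m B^T + Σ_m for one cluster.

```r
# Mock result list with the documented field shapes (values are made up).
set.seed(7)
G <- 4; n <- 6; M0 <- 2; K0 <- 2
res <- list(
  beta   = matrix(rnorm(G * K0), G, K0),             # G by K0
  lambda = matrix(c(1, 2, 0.5, 1.5), M0, K0),        # M0 by K0
  sigma  = matrix(runif(G * M0, 0.5, 1), G, M0),     # G by M0
  z      = matrix(c(0.9, 0.2, 0.6, 0.1, 0.8, 0.3,
                    0.1, 0.8, 0.4, 0.9, 0.2, 0.7), n, M0)  # n by M0
)

# Hard cluster label per cell: the column of z with the largest probability.
hard_clus <- max.col(res$z, ties.method = "first")

# Hypothetical helper: model-implied gene correlation matrix for cluster m,
# computed from B Lambda_m B^T + Sigma_m.
implied_cor <- function(res, m) {
  G  <- nrow(res$beta)
  K0 <- ncol(res$beta)
  cov_m <- res$beta %*% diag(res$lambda[m, ], K0) %*% t(res$beta) +
    diag(res$sigma[, m], G)
  cov2cor(cov_m)
}

cor_1 <- implied_cor(res, 1)
```

The row indexing of lambda here follows the documented M0 by K0 shape, with one row per cluster.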
Zhirui Hu, zhiruihu@g.harvard.edu
Songpeng Zu, songpengzu@g.harvard.edu