This function simultaneously learns a mixture of causal networks and clusters of a cell population from single-cell perturbation data (e.g. log odds of fold change) with a multi-trait readout, for example pooled CRISPR scRNA-Seq data (Perturb-Seq, Dixit et al., 2016; Crop-Seq, Datlinger et al., 2017).
mnem(
D,
inference = "em",
search = "greedy",
phi = NULL,
theta = NULL,
mw = NULL,
method = "llr",
parallel = NULL,
reduce = FALSE,
runs = 1,
starts = 3,
type = "networks",
complete = FALSE,
p = NULL,
k = NULL,
kmax = 10,
verbose = FALSE,
max_iter = 100,
parallel2 = NULL,
converged = -Inf,
redSpace = NULL,
affinity = 0,
evolution = FALSE,
lambda = 1,
subtopoX = NULL,
ratio = TRUE,
logtype = 2,
domean = TRUE,
modulesize = 5,
compress = FALSE,
increase = TRUE,
fpfn = c(0.1, 0.1),
Rho = NULL,
ksel = c("kmeans", "silhouette", "cor")
)
D |
data with cells indexing the columns and features (E-genes) indexing the rows |
inference |
inference method "em" for expectation maximization |
search |
search method for single network inference "greedy", "exhaustive" or "modules" (also possible: "small", which is greedy with only one edge change per M-step to make for a smooth convergence) |
phi |
a list of n lists of k networks for n starts of the EM and k components |
theta |
a list of n lists of k attachment vectors for the E-genes for n starts of the EM and k components |
mw |
mixture weights; if NULL estimated or uniform |
method |
"llr" for log ratios or foldchanges as input (see ratio) |
parallel |
number of threads for parallelization of the number of em runs |
reduce |
logical - reduce search space for exhaustive search to unique networks |
runs |
number of runs for greedy search |
starts |
number of starts for the em |
type |
initialize with responsibilities either by "random", "cluster" (each S-gene is clustered, and the different S-gene clusterings are combined for several starts), "cluster2" (clustNEM is used to infer reasonable phis, which are then used as a start for one EM run), "cluster3" (global clustering as a start), or "networks" (initialize with random phis) |
complete |
if TRUE, optimizes the expected complete log likelihood of the model, otherwise the log likelihood of the observed data |
p |
initial probabilities as a k (components) times l (cells) matrix |
k |
number of components |
kmax |
maximum number of components considered when k = NULL and k is inferred |
verbose |
verbose output |
max_iter |
maximum number of iterations if the likelihood does not converge |
parallel2 |
if parallel=NULL, number of threads for single component optimization |
converged |
absolute distance for convergence between new and old log likelihood; if set to -Inf, the EM stops if neither the phis nor thetas were changed in the most recent iteration |
redSpace |
space for "exhaustive" search |
affinity |
0 is default for soft clustering, 1 is for hard clustering |
evolution |
logical. If TRUE, components are penalized for being different from each other. |
lambda |
smoothness value for the prior put on the components if evolution is set to TRUE |
subtopoX |
hard prior on theta as a vector with entry i equal to j, if E-gene i is attached to S-gene j |
ratio |
logical; if TRUE the data are log ratios, if FALSE fold changes |
logtype |
logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural) |
domean |
average the data when calculating a single NEM (speed improvement) |
modulesize |
max number of S-genes per module in module search |
compress |
compress networks after search (warning: penalized likelihood not interpretable) |
increase |
if set to FALSE, the algorithm will not stop if the likelihood decreases |
fpfn |
numeric vector of length two with false positive and false negative rates for discrete data |
Rho |
perturbation matrix with dimensions n x l with n S-genes and l samples; either as probabilities with the sum of probabilities for a sample less than or equal to 1, or discrete with 1s and 0s |
ksel |
character vector of methods for the inference of k; can combine "hc" (hierarchical clustering) or "kmeans" with "silhouette", "BIC" or "AIC"; can also include "cor" for correlation distance (preferred) instead of euclidean |
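The shapes expected for D and Rho can be illustrated with a minimal base R sketch. The toy values, the object names (`sgenes`, `perturbed`) and the convention of labelling the columns of D by the perturbed S-gene are assumptions made for illustration only; the call to mnem is left as a comment and uses only arguments listed above.

```r
# Toy layout only: random numbers stand in for real log ratios.
set.seed(1)
sgenes   <- c("S1", "S2", "S3")   # hypothetical S-gene names
n_cells  <- 12
n_egenes <- 6

# One perturbed S-gene per cell (assumed single-knockout design).
perturbed <- sample(sgenes, n_cells, replace = TRUE)

# D: E-genes index the rows, cells index the columns.
D <- matrix(rnorm(n_egenes * n_cells),
            nrow = n_egenes, ncol = n_cells,
            dimnames = list(paste0("E", seq_len(n_egenes)), perturbed))

# Rho: n x l discrete perturbation matrix (S-genes x cells) with 1s and 0s.
Rho <- sapply(perturbed, function(s) as.numeric(sgenes == s))
rownames(Rho) <- sgenes

dim(D)    # E-genes x cells
dim(Rho)  # S-genes x cells

# Not run here; requires the mnem package:
# result <- mnem::mnem(D, k = 2, starts = 3)
```

In the probabilistic form of Rho, each column would instead hold probabilities summing to at most 1.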
object of class mnem
comp |
list of the components, with each component being a list of the causal network phi and the E-gene attachment theta |
data |
input data matrix |
limits |
list of results for all independent searches |
ll |
log likelihood of the best model |
lls |
log likelihood ascent of the best model search |
mw |
vector with mixture weights |
probs |
kxl matrix containing the cell log likelihoods of the model |
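As a rough illustration of how mw and probs relate to a soft clustering of cells, the sketch below applies the standard mixture-model formula to turn a k x l matrix of cell log likelihoods into responsibilities. The helper name `responsibilities` is hypothetical, the toy numbers are made up, and treating the log likelihoods as base-`logtype` logarithms is an assumption; the package's own computation may differ.

```r
# Hypothetical helper, not part of the package: standard mixture responsibilities.
responsibilities <- function(probs, mw, logtype = 2) {
  # Convert to natural log and add the log mixture weights
  # (mw has length k and is recycled down each column of the k x l matrix).
  logw <- probs * log(logtype) + log(mw)
  # Subtract the column maximum for numerical stability (log-sum-exp trick).
  logw <- sweep(logw, 2, apply(logw, 2, max))
  w <- exp(logw)
  # Normalize so that each cell's responsibilities sum to 1.
  sweep(w, 2, colSums(w), "/")
}

# Toy example: k = 2 components, l = 3 cells.
probs <- matrix(c(-1, -3,
                  -2, -2,
                  -5, -1), nrow = 2)
mw <- c(0.5, 0.5)

resp <- responsibilities(probs, mw, logtype = 2)
hard <- apply(resp, 2, which.max)  # hard assignment; ties go to the first component
```

Each column of `resp` sums to 1, and `hard` assigns every cell to its most likely component.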
Martin Pirkl