mnem | R Documentation |
This function simultaneously learns a mixture of causal networks and clusters of a cell population from single cell perturbation data (e.g. log odds of fold change) with a multi-trait readout. E.g. Pooled CRISPR scRNA-Seq data (Perturb-Seq. Dixit et al., 2016, Crop-Seq. Datlinger et al., 2017).
mnem(
D,
inference = "em",
search = "greedy",
phi = NULL,
theta = NULL,
mw = NULL,
method = "llr",
marginal = FALSE,
parallel = NULL,
reduce = FALSE,
runs = 1,
starts = 3,
type = "networks",
complete = FALSE,
p = NULL,
k = NULL,
kmax = 10,
verbose = FALSE,
max_iter = 100,
parallel2 = NULL,
converged = -Inf,
redSpace = NULL,
affinity = 0,
evolution = FALSE,
lambda = 1,
subtopoX = NULL,
ratio = TRUE,
logtype = 2,
domean = TRUE,
modulesize = 5,
compress = FALSE,
increase = TRUE,
fpfn = c(0.1, 0.1),
Rho = NULL,
ksel = c("kmeans", "silhouette", "cor"),
nullcomp = FALSE,
tree = FALSE,
burnin = 10,
hastings = TRUE,
nodeswitch = TRUE,
postgaps = 10,
penalized = FALSE,
accept_range = 1,
...
)
D |
data with cells indexing the columns and features (E-genes) indexing the rows |
inference |
inference method "em" for expectation maximization or "mcmc" for markov chain monte carlo sampling |
search |
search method for single network inference "greedy", "exhaustive" or "modules" (also possible: "small", which is greedy with only one edge change per M-step to make for a smooth convergence) |
phi |
a list of n lists of k networks for n starts of the EM and k components |
theta |
a list of n lists of k attachment vector for the E-genes for n starts of the EM and k components |
mw |
mixture weights; if NULL estimated or uniform |
method |
"llr" for log ratios or foldchanges as input (see ratio) |
marginal |
logical to compute the marginal likelihood (TRUE) |
parallel |
number of threads for parallelization of the number of em runs |
reduce |
logical - reduce search space for exhaustive search to unique networks |
runs |
number of runs for greedy search |
starts |
number of starts for the em or mcmc |
type |
initialize with responsibilities either by "random", "cluster" (each S-gene is clustered and the different S-gene clustered differently combined for several starts), "cluster2" (clustNEM is used to infer reasonable phis, which are then used as a start for one EM run), "cluster3" (global clustering as a start), or "networks" (initialize with random phis), inference='mcmc' only supports 'networks' and 'empty' for unconncected networks phi |
complete |
if TRUE, optimizes the expected complete log likelihood of the model, otherwise the log likelihood of the observed data |
p |
initial probabilities as a k (components) times l (cells) matrix |
k |
number of components |
kmax |
maximum number of components when k=NULL is inferred |
verbose |
verbose output |
max_iter |
maximum iterations (moves for inference='mcmc'. adjust parameter burnin) |
parallel2 |
if parallel=NULL, number of threads for single component optimization |
converged |
absolute distance for convergence between new and old log likelihood; if set to -Inf, the EM stops if neither the phis nor thetas were changed in the most recent iteration |
redSpace |
space for "exhaustive" search |
affinity |
0 is default for soft clustering, 1 is for hard clustering |
evolution |
logical. If TRUE components are penelized for being different from each other. |
lambda |
smoothness value for the prior put on the components, if evolution set to TRUE |
subtopoX |
hard prior on theta as a vector with entry i equal to j, if E-gene i is attached to S-gene j |
ratio |
logical, if true data is log ratios, if false foldchanges |
logtype |
logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural) |
domean |
average the data, when calculating a single NEM (speed improvment) |
modulesize |
max number of S-genes per module in module search |
compress |
compress networks after search (warning: penelized likelihood not interpretable) |
increase |
if set to FALSE, the algorithm will not stop if the likelihood decreases |
fpfn |
numeric vector of length two with false positive and false negative rates for discrete data |
Rho |
perturbation matrix with dimensions nxl with n S-genes and l samples; either as probabilities with the sum of probabilities for a sample less or equal to 1 or discrete with 1s and 0s |
ksel |
character vector of methods for the inference of k; can combine as the first two vlues "hc" (hierarchical clustering) or "kmeans" with "silhouette", "BIC" or "AIC"; the third value is either "cor" for correlation distance or any method accepted by the function 'dist' |
nullcomp |
if TRUE, adds a null component (k+1) |
tree |
if TRUE, restrict inference on trees (MCMC not included) |
burnin |
number of iterations to be discarded prior to analyzing the posterior distribution of the mcmc |
hastings |
if set to TRUE, the Hastings ratio is calculated |
nodeswitch |
if set to TRUE, node switching is allowed as a move, additional to the edge moves |
postgaps |
can be set to numeric. Determines after how many iterations the next Phi mixture is added to the Phi edge Frequency tracker in the mcmc |
penalized |
if set to TRUE, the penalized likelihood will be used for the mcmc. Per default this is FALSE, since no component learning is involved and sparcity is hence not enforced |
accept_range |
the random probability the acceptance probability is compared to (default: 1) |
... |
arguments to function nem |
object of class mnem
comp |
list of the component with each component being a list of the causal network phi and the E-gene attachment theta |
data |
input data matrix |
limits |
list of results for all indpendent searches |
ll |
log likelihood of the best model |
lls |
log likelihood ascent of the best model search |
mw |
vector with mixture weights |
probs |
kxl matrix containing the cell log likelihoods of the model |
Martin Pirkl
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6))
data <- (sim$data - 0.5)/0.5
data <- data + rnorm(length(data), 0, 1)
result <- mnem(data, k = 2, starts = 1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.