mnem: Mixture NEMs - main function.
In mnem: Mixture Nested Effects Models

This function simultaneously learns a mixture of causal networks and clusters of a cell population from single-cell perturbation data (e.g. log odds of fold change) with a multi-trait readout, for example pooled CRISPR scRNA-Seq data (Perturb-Seq, Dixit et al., 2016; Crop-Seq, Datlinger et al., 2017).

mnem(
  D,
  inference = "em",
  search = "greedy",
  phi = NULL,
  theta = NULL,
  mw = NULL,
  method = "llr",
  parallel = NULL,
  reduce = FALSE,
  runs = 1,
  starts = 3,
  type = "networks",
  complete = FALSE,
  p = NULL,
  k = NULL,
  kmax = 10,
  verbose = FALSE,
  max_iter = 100,
  parallel2 = NULL,
  converged = -Inf,
  redSpace = NULL,
  affinity = 0,
  evolution = FALSE,
  lambda = 1,
  subtopoX = NULL,
  ratio = TRUE,
  logtype = 2,
  domean = TRUE,
  modulesize = 5,
  compress = FALSE,
  increase = TRUE,
  fpfn = c(0.1, 0.1),
  Rho = NULL,
  ksel = c("kmeans", "silhouette", "cor")
)

`D`	data with cells indexing the columns and features (E-genes) indexing the rows
`inference`	inference method "em" for expectation maximization
`search`	search method for single network inference "greedy", "exhaustive" or "modules" (also possible: "small", which is greedy with only one edge change per M-step to make for a smooth convergence)
`phi`	a list of n lists of k networks for n starts of the EM and k components
`theta`	a list of n lists of k attachment vectors for the E-genes for n starts of the EM and k components
`mw`	mixture weights; if NULL, they are estimated or set to uniform
`method`	"llr" for log ratios or fold changes as input (see ratio)
`parallel`	number of threads for parallelization of the EM runs
`reduce`	logical - reduce search space for exhaustive search to unique networks
`runs`	number of runs for greedy search
`starts`	number of starts for the EM
`type`	initialization of the responsibilities: "random", "cluster" (each S-gene is clustered and the different S-gene clusterings are combined for several starts), "cluster2" (clustNEM is used to infer reasonable phis, which are then used as a start for one EM run), "cluster3" (global clustering as a start), or "networks" (initialize with random phis)
`complete`	if TRUE, optimizes the expected complete log likelihood of the model, otherwise the log likelihood of the observed data
`p`	initial probabilities as a k (components) times l (cells) matrix
`k`	number of components
`kmax`	maximum number of components considered when k = NULL and k is inferred
`verbose`	verbose output
`max_iter`	maximum number of iterations if the likelihood does not converge
`parallel2`	if parallel=NULL, number of threads for single component optimization
`converged`	absolute distance for convergence between new and old log likelihood; if set to -Inf, the EM stops if neither the phis nor thetas were changed in the most recent iteration
`redSpace`	space for "exhaustive" search
`affinity`	0 is default for soft clustering, 1 is for hard clustering
`evolution`	logical; if TRUE, components are penalized for being different from each other
`lambda`	smoothness value for the prior put on the components, if evolution set to TRUE
`subtopoX`	hard prior on theta as a vector with entry i equal to j, if E-gene i is attached to S-gene j
`ratio`	logical; if TRUE, the data are log ratios, if FALSE, fold changes
`logtype`	logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural)
`domean`	average the data when calculating a single NEM (speed improvement)
`modulesize`	max number of S-genes per module in module search
`compress`	compress networks after search (warning: penalized likelihood not interpretable)
`increase`	if set to FALSE, the algorithm will not stop if the likelihood decreases
`fpfn`	numeric vector of length two with false positive and false negative rates for discrete data
`Rho`	perturbation matrix with dimensions n x l with n S-genes and l samples; either as probabilities with the sum of probabilities for a sample less than or equal to 1, or discrete with 1s and 0s
`ksel`	character vector of methods for the inference of k; can combine "hc" (hierarchical clustering) or "kmeans" with "silhouette", "BIC" or "AIC"; can also include "cor" for correlation distance (preferred) instead of euclidean
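
The discrete form of `Rho` described above can be illustrated with a small base R sketch. The per-cell labels and the convention that "_" separates several perturbed S-genes in one cell are assumptions made for this illustration only; they are not taken from this page.

```r
# Hypothetical per-cell perturbation labels (illustration only).
# Assumption: "_" separates multiple perturbed S-genes in one cell.
labels <- c("A", "B", "A_C", "C", "B")

# All S-genes that occur in the labels
sgenes <- sort(unique(unlist(strsplit(labels, "_"))))

# n x l matrix: n S-genes in the rows, l samples (cells) in the columns
Rho <- matrix(0, nrow = length(sgenes), ncol = length(labels),
              dimnames = list(sgenes, labels))

# Set entry (s, i) to 1 if S-gene s is perturbed in sample i
for (i in seq_along(labels)) {
  Rho[strsplit(labels[i], "_")[[1]], i] <- 1
}

print(Rho)
```

A matrix of this shape is what the `Rho` argument expects in its discrete variant; the probabilistic variant would instead hold values whose column sums do not exceed 1.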

object of class mnem

`comp`	list of components, each component being a list of the causal network phi and the E-gene attachment theta
`data`	input data matrix
`limits`	list of results for all independent searches
`ll`	log likelihood of the best model
`lls`	log likelihood ascent of the best model search
`mw`	vector with mixture weights
`probs`	k x l matrix (k components, l cells) containing the cell log likelihoods of the model
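
As a rough illustration of how per-cell log likelihoods and mixture weights relate to a soft or hard clustering, the following base R sketch turns a toy `probs`-like matrix into responsibilities. The numbers are made up, the helper `responsibilities` is hypothetical, and it assumes the log likelihoods use base `logtype`; the package may compute these quantities differently.

```r
# Toy values, not output of mnem(): 2 components (rows) x 4 cells (columns)
probs <- matrix(c(-10, -12,
                  -11, -11,
                  -15,  -9,
                   -8, -14), nrow = 2)
mw <- c(0.4, 0.6)

# Hypothetical helper: posterior component probabilities per cell
responsibilities <- function(probs, mw, logtype = 2) {
  # add the log mixture weight of component i to row i
  weighted <- probs + log(mw, base = logtype)
  # subtract the column maximum for numerical stability
  shifted <- sweep(weighted, 2, apply(weighted, 2, max))
  dens <- logtype^shifted
  # normalize so that every column (cell) sums to 1
  sweep(dens, 2, colSums(dens), "/")
}

gamma <- responsibilities(probs, mw)
hard <- apply(gamma, 2, which.max)  # most likely component per cell
print(round(gamma, 3))
print(hard)
```

Taking the column-wise maximum corresponds to a hard assignment of cells to components, while the normalized values keep the soft assignment.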

Martin Pirkl

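library(mnem)
# simulate data from a mixture of two NEMs with 3 S-genes
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6))
# rescale the simulated data: (x - 0.5)/0.5 maps 0 to -1 and 1 to 1
data <- (sim$data - 0.5)/0.5
# add standard Gaussian noise
data <- data + rnorm(length(data), 0, 1)
# fit a mixture with two components and a single EM start
result <- mnem(data, k = 2, starts = 1)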
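
The elements of the returned object listed in the Value section could be inspected along the following lines. This is a sketch: the seed is arbitrary, and the element names `phi` and `theta` inside each component are assumed from the description of `comp` rather than verified against a particular package version.

```r
library(mnem)

set.seed(42)  # arbitrary seed so the noisy simulation is repeatable
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4, 0.6))
data <- (sim$data - 0.5) / 0.5
data <- data + rnorm(length(data), 0, 1)
result <- mnem(data, k = 2, starts = 1)

result$mw               # mixture weights
result$ll               # log likelihood of the best model
result$lls              # log likelihood values over the best search
result$comp[[1]]$phi    # causal network of the first component (assumed name)
result$comp[[1]]$theta  # E-gene attachment of the first component (assumed name)
dim(result$probs)       # components x cells
```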