mnem: Mixture NEMs - main function.


View source: R/mnems.r

Description

This function simultaneously learns a mixture of causal networks and a clustering of a cell population from single-cell perturbation data (e.g. log odds of fold changes) with a multi-trait readout, for example pooled CRISPR scRNA-seq data (Perturb-Seq, Dixit et al., 2016; Crop-Seq, Datlinger et al., 2017).

Usage

mnem(
  D,
  inference = "em",
  search = "greedy",
  phi = NULL,
  theta = NULL,
  mw = NULL,
  method = "llr",
  parallel = NULL,
  reduce = FALSE,
  runs = 1,
  starts = 3,
  type = "networks",
  complete = FALSE,
  p = NULL,
  k = NULL,
  kmax = 10,
  verbose = FALSE,
  max_iter = 100,
  parallel2 = NULL,
  converged = -Inf,
  redSpace = NULL,
  affinity = 0,
  evolution = FALSE,
  lambda = 1,
  subtopoX = NULL,
  ratio = TRUE,
  logtype = 2,
  domean = TRUE,
  modulesize = 5,
  compress = FALSE,
  increase = TRUE,
  fpfn = c(0.1, 0.1),
  Rho = NULL,
  ksel = c("kmeans", "silhouette", "cor")
)

Arguments

D

data with cells indexing the columns and features (E-genes) indexing the rows

inference

inference method; "em" for expectation maximization

search

search method for single network inference: "greedy", "exhaustive" or "modules"; "small" is also possible, which is a greedy search with only one edge change per M-step for a smoother convergence

phi

a list of n lists of k networks for n starts of the EM and k components
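A minimal sketch of the nested shape described for phi, assuming each network is a square 0/1 adjacency matrix over the S-genes with the S-gene names as dimnames. The helper random_phi_starts is hypothetical and not part of mnem, and whether mnem expects self-loops on the diagonal is not stated here.

```r
# Hypothetical helper (not part of mnem): n_starts lists, each holding k
# random S-gene networks as 0/1 adjacency matrices.
random_phi_starts <- function(sgenes, n_starts, k) {
  s <- length(sgenes)
  lapply(seq_len(n_starts), function(start) {
    lapply(seq_len(k), function(comp) {
      adj <- matrix(0L, s, s, dimnames = list(sgenes, sgenes))
      # edges only above the diagonal, so every network is acyclic
      adj[upper.tri(adj)] <- sample(0:1, s * (s - 1) / 2, replace = TRUE)
      adj
    })
  })
}

set.seed(1)
phi_init <- random_phi_starts(c("A", "B", "C"), n_starts = 3, k = 2)
length(phi_init)       # number of starts
length(phi_init[[1]])  # number of components per start
phi_init[[1]][[2]]     # network of component 2 in start 1
```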

theta

a list of n lists of k attachment vectors for the E-genes, for n starts of the EM and k components

mw

mixture weights; if NULL, they are estimated or set to uniform
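In a textbook EM for mixture models, estimated mixture weights are the average responsibility of each component over all cells. A minimal sketch of that standard update, assuming the responsibilities are held in a k x l matrix whose columns sum to 1; the helper name and the matrix gamma are hypothetical, and mnem's own estimator may differ in detail.

```r
# Standard EM update: weight of component c = mean responsibility of c over cells.
update_mixture_weights <- function(gamma) {
  rowSums(gamma) / ncol(gamma)
}

gamma <- matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.7, 0.3,
                  0.4, 0.6), nrow = 2)  # 2 components x 4 cells, filled column-wise
update_mixture_weights(gamma)           # 0.55 0.45
```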

method

"llr" for log ratios or fold changes as input (see ratio)

parallel

number of threads for parallelizing the EM runs (starts)

reduce

logical - reduce search space for exhaustive search to unique networks

runs

number of runs for greedy search

starts

number of starts for the EM

type

how the EM is initialized: with responsibilities obtained by "random", "cluster" (each S-gene is clustered separately and the per-S-gene clusterings are combined differently for several starts), "cluster2" (clustNEM is used to infer reasonable phis, which are then used as the start for one EM run) or "cluster3" (a global clustering is used as the start), or with "networks" (random phis)

complete

if TRUE, optimizes the expected complete log likelihood of the model, otherwise the log likelihood of the observed data
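To make the distinction concrete: given a k x l matrix of per-cell log likelihoods under each component (compare the probs value) and mixture weights, the observed-data log likelihood sums, over cells, the log of the weighted mixture density, while the expected complete log likelihood weights each component's joint log density by the cell's responsibility. The latter equals the former minus the entropy of the responsibilities, so it is never larger. A base-R sketch with hypothetical helper names; mnem's internal computation may differ.

```r
# Numerically stable log(sum(exp(.))) over each column of a matrix.
log_sum_exp_cols <- function(x) {
  m <- apply(x, 2, max)
  m + log(colSums(exp(sweep(x, 2, m))))
}

# L: k x l matrix of per-cell log likelihoods under each component.
# mw: mixture weights of length k.
observed_loglik <- function(L, mw) {
  joint <- L + log(mw)  # log(mw_c) + L[c, j]; log(mw) is recycled down each column
  sum(log_sum_exp_cols(joint))
}

expected_complete_loglik <- function(L, mw) {
  joint <- L + log(mw)
  gamma <- exp(sweep(joint, 2, log_sum_exp_cols(joint)))  # responsibilities
  sum(gamma * joint)
}

L <- log(matrix(c(0.2, 0.5, 0.1, 0.4), nrow = 2))  # 2 components x 2 cells
mw <- c(0.5, 0.5)
observed_loglik(L, mw)           # log(0.35) + log(0.25)
expected_complete_loglik(L, mw)  # smaller: observed minus entropy of gamma
```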

p

initial probabilities as a k (components) times l (cells) matrix
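A sketch of one way to build such a starting matrix, assuming each cell's column should sum to 1 across the k components; the helper random_init_probs is hypothetical.

```r
# Hypothetical helper: a random k x l matrix of initial probabilities in
# which each cell's column sums to 1.
random_init_probs <- function(k, l) {
  p <- matrix(runif(k * l), nrow = k)
  sweep(p, 2, colSums(p), "/")
}

set.seed(42)
p0 <- random_init_probs(k = 2, l = 5)
colSums(p0)  # each column sums to 1
```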

k

number of components

kmax

maximum number of components when k is inferred (k = NULL)

verbose

verbose output

max_iter

maximum number of iterations if the likelihood does not converge

parallel2

if parallel=NULL, number of threads for single component optimization

converged

absolute difference between the new and old log likelihood below which the EM is considered converged; if set to -Inf, the EM stops if neither the phis nor the thetas changed in the most recent iteration
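The stopping logic described for converged and max_iter can be sketched as a small predicate. This illustrates the documented behaviour rather than mnem's source; the function em_should_stop and its arguments are hypothetical, and the increase option is left out.

```r
# Hypothetical predicate (not mnem's code) following the documented meaning
# of `converged` and `max_iter`.
em_should_stop <- function(ll_new, ll_old, phi_changed, theta_changed,
                           iter, converged = -Inf, max_iter = 100) {
  if (iter >= max_iter) return(TRUE)        # iteration budget used up
  if (converged == -Inf) {
    return(!phi_changed && !theta_changed)  # no change in networks or attachments
  }
  abs(ll_new - ll_old) < converged          # likelihood change small enough
}

em_should_stop(-120.4, -120.9, phi_changed = FALSE, theta_changed = FALSE, iter = 7)  # TRUE
em_should_stop(-120.4, -120.9, phi_changed = TRUE, theta_changed = FALSE, iter = 7)   # FALSE
em_should_stop(-120.4, -120.9, TRUE, TRUE, iter = 7, converged = 1)                    # TRUE
```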

redSpace

search space for the "exhaustive" search

affinity

0 (default) for soft clustering, 1 for hard clustering
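The difference between the two affinity settings can be sketched from a k x l matrix of per-cell log likelihoods and the mixture weights: soft clustering keeps the posterior probability of each component for each cell, while hard clustering puts all of a cell's weight on its most likely component. A minimal base-R sketch with a hypothetical helper; it illustrates the standard E-step rather than reproducing mnem's code.

```r
# Responsibilities from a k x l matrix `L` of per-cell log likelihoods and
# mixture weights `mw`. affinity = 0: soft posterior probabilities;
# affinity = 1: each cell is assigned wholly to its most likely component.
responsibilities <- function(L, mw, affinity = 0) {
  joint <- L + log(mw)                                # log(mw_c) + L[c, j]
  post <- exp(sweep(joint, 2, apply(joint, 2, max)))  # stabilised exponentials
  post <- sweep(post, 2, colSums(post), "/")          # columns sum to 1
  if (affinity == 1) {
    best <- max.col(t(post), ties.method = "first")   # winning component per cell
    post <- matrix(0, nrow(post), ncol(post))
    post[cbind(best, seq_along(best))] <- 1
  }
  post
}

L <- log(matrix(c(0.2, 0.6, 0.3, 0.1), nrow = 2))  # 2 components x 2 cells
responsibilities(L, mw = c(0.5, 0.5))               # soft: 0.25/0.75, 0.75/0.25
responsibilities(L, mw = c(0.5, 0.5), affinity = 1) # hard: 0/1, 1/0
```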

evolution

logical; if TRUE, components are penalized for being different from each other

lambda

smoothness value for the prior placed on the components if evolution is set to TRUE

subtopoX

hard prior on theta as a vector with entry i equal to j, if E-gene i is attached to S-gene j
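A small example of the attachment-vector format described for subtopoX (and, per start and component, for theta): entry i gives the index j of the S-gene that E-gene i is attached to. The helper that expands it into an S-gene x E-gene 0/1 matrix is hypothetical, and any convention mnem uses for unattached E-genes is not covered.

```r
# Attachment vector: E-genes e1 and e2 hang below S-gene 1, e3 below 2, e4 below 3.
subtopo <- c(e1 = 1, e2 = 1, e3 = 2, e4 = 3)
sgenes <- c("A", "B", "C")

# Hypothetical helper: expand the vector into an S-gene x E-gene 0/1 matrix.
attachment_matrix <- function(attach, sgenes) {
  m <- matrix(0L, length(sgenes), length(attach),
              dimnames = list(sgenes, names(attach)))
  m[cbind(attach, seq_along(attach))] <- 1L  # one 1 per E-gene column
  m
}

attachment_matrix(subtopo, sgenes)
```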

ratio

logical; if TRUE, the data are log ratios, if FALSE, fold changes

logtype

logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural)
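As background for ratio and logtype: fold changes and log ratios are related by taking the logarithm in the base given by logtype, and changing the base only rescales the values. This is plain base-R arithmetic, not a description of how mnem transforms its input internally.

```r
# Fold changes and their log ratios in base 2 (logtype = 2).
fold_change <- c(0.5, 1, 2, 4)
log2_ratio <- log(fold_change, base = 2)  # -1 0 1 2

# The same data as natural-log ratios (logtype = exp(1)).
ln_ratio <- log(fold_change)

# Switching the base only rescales: ln(x) = log2(x) * ln(2).
all.equal(log2_ratio * log(2), ln_ratio)  # TRUE
```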

domean

average the data when calculating a single NEM (speed improvement)
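A sketch of what such averaging can look like: the E-gene x cell matrix is collapsed into an E-gene x S-gene matrix of means over cells that share a perturbation, optionally weighted by one component's responsibilities. The helper and the weighting scheme are assumptions for illustration, not mnem's implementation.

```r
# Hypothetical helper: (weighted) mean of each E-gene over the cells that
# share a perturbed S-gene; `weights` could be one component's responsibilities.
average_by_perturbation <- function(D, perturbed, weights = rep(1, ncol(D))) {
  groups <- sort(unique(perturbed))
  sapply(groups, function(s) {
    idx <- perturbed == s
    drop(D[, idx, drop = FALSE] %*% weights[idx]) / sum(weights[idx])
  })
}

D <- matrix(1:8, nrow = 2, dimnames = list(c("e1", "e2"), NULL))  # 2 E-genes x 4 cells
perturbed <- c("A", "A", "B", "B")
average_by_perturbation(D, perturbed)  # column A = (2, 3), column B = (6, 7)
```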

modulesize

max number of S-genes per module in module search

compress

compress networks after search (warning: the penalized likelihood is then not interpretable)

increase

if set to FALSE, the algorithm will not stop if the likelihood decreases

fpfn

numeric vector of length two with false positive and false negative rates for discrete data
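For discrete (0/1) data, a false positive rate and a false negative rate define the classic binary error model of nested effects models: an effect predicted by the network is missed with probability fn, and an effect not predicted is observed with probability fp. A sketch assuming fpfn is ordered as (false positive, false negative), following the argument description; the function is hypothetical, and mnem may combine these rates with the network differently.

```r
# Log likelihood of binary observations given the effects a network predicts.
discrete_loglik <- function(observed, expected, fpfn = c(0.1, 0.1)) {
  fp <- fpfn[1]  # P(observed 1 | expected 0)
  fn <- fpfn[2]  # P(observed 0 | expected 1)
  prob <- ifelse(expected == 1,
                 ifelse(observed == 1, 1 - fn, fn),
                 ifelse(observed == 1, fp, 1 - fp))
  sum(log(prob))
}

observed <- c(1, 0, 1, 0)
expected <- c(1, 1, 0, 0)  # effects predicted by a network
discrete_loglik(observed, expected, fpfn = c(0.1, 0.1))  # 2*log(0.9) + 2*log(0.1)
```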

Rho

perturbation matrix of dimension n x l for n S-genes and l samples; either probabilities, with the probabilities of each sample summing to at most 1, or discrete with 1s and 0s
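A sketch of both forms of Rho: a discrete matrix built from a hypothetical per-cell list of perturbed S-genes (a cell may carry several perturbations, so a discrete column can hold more than one 1), and a check of the constraint stated for the probabilistic form. The helper names are hypothetical.

```r
# Hypothetical helper: discrete n x l perturbation matrix from a named list
# giving the perturbed S-gene(s) of each cell.
perturbation_matrix <- function(perturbed, sgenes) {
  Rho <- matrix(0L, length(sgenes), length(perturbed),
                dimnames = list(sgenes, names(perturbed)))
  for (j in seq_along(perturbed)) {
    Rho[perturbed[[j]], j] <- 1L
  }
  Rho
}

# Hypothetical check for the probabilistic form: entries in [0, 1] and
# each sample's probabilities summing to at most 1 (up to rounding).
valid_probabilistic_rho <- function(Rho, tol = 1e-8) {
  all(Rho >= 0 & Rho <= 1) && all(colSums(Rho) <= 1 + tol)
}

cells <- list(c1 = "A", c2 = c("A", "B"), c3 = "C")  # c2 is a double perturbation
perturbation_matrix(cells, sgenes = c("A", "B", "C"))

P <- matrix(c(0.7, 0.2, 0.1,
              0.0, 0.5, 0.5), nrow = 3)  # 3 S-genes x 2 samples
valid_probabilistic_rho(P)  # TRUE
```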

ksel

character vector of methods for inferring k; can combine "hc" (hierarchical clustering) or "kmeans" with "silhouette", "BIC" or "AIC", and can include "cor" to use correlation distance (preferred) instead of Euclidean distance
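A sketch of the "hc" + "silhouette" + "cor" combination: cells are clustered hierarchically on the correlation distance 1 - cor, and the k in 2..kmax with the largest mean silhouette width is chosen. The helper names are hypothetical, the silhouette is computed by hand to avoid extra packages, and mnem's exact procedure may differ.

```r
# Mean silhouette width of clustering `cl` under distance object `d`.
mean_silhouette <- function(d, cl) {
  d <- as.matrix(d)
  s <- sapply(seq_along(cl), function(i) {
    same <- cl == cl[i]
    if (sum(same) == 1) return(0)           # convention: singletons score 0
    a <- sum(d[i, same]) / (sum(same) - 1)  # own-cluster mean; d[i, i] is 0
    others <- setdiff(unique(cl), cl[i])
    b <- min(sapply(others, function(g) mean(d[i, cl == g])))
    (b - a) / max(a, b)
  })
  mean(s)
}

# Choose k by hierarchical clustering on correlation distance between
# cells (the columns of D) and maximal mean silhouette width.
select_k <- function(D, kmax = 10) {
  d <- as.dist(1 - cor(D))
  tree <- hclust(d)
  ks <- 2:kmax
  scores <- sapply(ks, function(k) mean_silhouette(d, cutree(tree, k = k)))
  ks[which.max(scores)]
}

# Two groups of cells with opposite expression patterns.
set.seed(3)
base <- rnorm(20)
D <- cbind(sapply(1:5, function(i) base + rnorm(20, sd = 0.1)),
           sapply(1:5, function(i) -base + rnorm(20, sd = 0.1)))
select_k(D, kmax = 4)
```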

Value

object of class mnem

comp

list of the components, each being a list of the causal network phi and the E-gene attachment theta

data

input data matrix

limits

list of results for all independent searches

ll

log likelihood of the best model

lls

log likelihood ascent of the best model search

mw

vector with mixture weights

probs

k x l matrix containing the log likelihoods of the cells under each component of the model

Author(s)

Martin Pirkl

Examples

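library(mnem)
# simulate data from a mixture of two networks over 3 S-genes (weights 0.4 and 0.6)
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6))
# rescale the simulated 0/1 data to -1/1 and add standard Gaussian noise
data <- (sim$data - 0.5)/0.5
data <- data + rnorm(length(data), 0, 1)
# fit a mixture of k = 2 networks with a single EM start
result <- mnem(data, k = 2, starts = 1)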

mnem documentation built on Nov. 18, 2020, 2 a.m.