fitModel: Fit (zero-inflated) negative binomial model to data


Description

This function fits a ZINB or NB model with variable input. It is the wrapper for the individual fits of the full and alternative models in LineagePulse.

Usage

fitModel(matCounts, dfAnnotation, vecConfoundersMu = NULL,
  vecConfoundersDisp = NULL, vecNormConst, scaDFSplinesMu = NULL,
  scaDFSplinesDisp = NULL, matWeights = NULL, matPiConstPredictors = NULL,
  lsDropModel = NULL, matMuModelInit = NULL, lsmatBatchModelInitMu = NULL,
  matDispModelInit = NULL, lsmatBatchModelInitDisp = NULL, strMuModel,
  strDispModel, strDropModel = "logistic_ofMu", strDropFitGroup = "PerCell",
  scaMaxEstimationCycles = 20, boolVerbose = TRUE,
  boolSuperVerbose = TRUE)

Arguments

matCounts

(matrix genes x cells) Count data of all cells, unobserved entries are NA.

dfAnnotation

(data frame cells x meta characteristics) Annotation table which contains meta data on cells.

vecConfoundersMu

(vector of strings number of confounders on mean) [Default NULL] Confounders to correct for in mu batch correction model, must be subset of column names of dfAnnotation which describe confounding variables.

vecConfoundersDisp

(vector of strings number of confounders on dispersion) [Default NULL] Confounders to correct for in dispersion batch correction model, must be subset of column names of dfAnnotation which describe confounding variables.

vecNormConst

(numeric vector number of cells) Model scaling factors, one per cell. These factors linearly scale the mean model for evaluation of the loglikelihood.

scaDFSplinesMu

(sca) [Default NULL] If strMuModel=="splines", the degrees of freedom of the natural cubic spline to be used as a mean parameter model.

scaDFSplinesDisp

(sca) [Default NULL] If strDispModel=="splines", the degrees of freedom of the natural cubic spline to be used as a dispersion parameter model.

matWeights

(numeric matrix cells x mixtures) [Default NULL] Assignments of cells to mixtures (for strMuModel="MM").

matPiConstPredictors

(numeric matrix genes x number of constant gene-wise drop-out predictors) [Default NULL] Predictors for logistic drop-out fit other than offset and mean parameter (i.e. predictors which are constant for all observations in a gene and externally supplied). Is NULL if no constant predictors are supplied.

lsDropModel

(list) [Default NULL] Object containing description of cell-wise drop-out parameter models.

matMuModelInit

(numeric matrix genes x mu model parameters) [Default NULL] Contains initialisation of mean model parameters according to the used model.

lsmatBatchModelInitMu

(list) [Default NULL] Initialisation of batch correction models for mean parameter.

matDispModelInit

(numeric matrix genes x disp model parameters) [Default NULL] Contains initialisation of dispersion model parameters according to the used model.

lsmatBatchModelInitDisp

(list) [Default NULL] Initialisation of batch correction models for dispersion parameter.

strMuModel

(str) "constant", "groups", "MM", "splines", "impulse" [Default "impulse"] Model according to which the mean parameter is fit to each gene as a function of population structure in the alternative model (H1).

strDispModel

(str) "constant", "groups", "splines" [Default "constant"] Model according to which dispersion parameter is fit to each gene as a function of population structure in the given model.

strDropModel

(str) "logistic_ofMu", "logistic", "none" [Default "logistic_ofMu"] Definition of drop-out model. "logistic_ofMu" - include the fitted mean in the linear model of the drop-out rate and use offset and matPiConstPredictors. "logistic" - only use offset and matPiConstPredictors. "none" - negative binomial noise model without zero-inflation.

strDropFitGroup

(str) "PerCell", "AllCells" [Default "PerCell"] Definition of groups of cells on which separate drop-out model parameterisations are fit. "PerCell" - one parameterisation (fit) per cell. "AllCells" - one parameterisation (fit) for all cells.

scaMaxEstimationCycles

(integer) [Default 20] Maximum number of estimation cycles performed in fitZINB(). One cycle contains one estimation of each parameter of the zero-inflated negative binomial model as coordinate ascent.

boolVerbose

(bool) [Default TRUE] Whether to follow convergence of the iterative parameter estimation with one report per cycle.

boolSuperVerbose

(bool) [Default TRUE] Whether to follow convergence of the iterative parameter estimation in high detail with local convergence flags and step-by-step loglikelihood computation.
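The shapes described in the arguments above can be illustrated with toy inputs. This is a minimal sketch in base R, not a tested call to fitModel: all object names ending in "Toy", the annotation column names "cell" and "batch", and the library-size based scaling factors are assumptions made for illustration only.

```r
# Toy inputs with the shapes described in the argument list.
# All names and values here are hypothetical illustrations.
set.seed(42)
scaNGenes <- 4
scaNCells <- 6

# Count matrix (genes x cells); unobserved entries are NA.
matCountsToy <- matrix(
  rnbinom(scaNGenes * scaNCells, mu = 10, size = 2),
  nrow = scaNGenes, ncol = scaNCells,
  dimnames = list(paste0("gene_", seq_len(scaNGenes)),
                  paste0("cell_", seq_len(scaNCells))))
matCountsToy[1, 2] <- NA

# Annotation table (cells x meta characteristics).
# The column names "cell" and "batch" are assumed for this sketch.
dfAnnotationToy <- data.frame(
  cell = colnames(matCountsToy),
  batch = rep(c("b1", "b2"), each = 3),
  stringsAsFactors = FALSE)
rownames(dfAnnotationToy) <- dfAnnotationToy$cell

# Confounders must be a subset of the annotation column names.
vecConfoundersMuToy <- "batch"
stopifnot(all(vecConfoundersMuToy %in% colnames(dfAnnotationToy)))

# One scaling factor per cell; relative library size is an assumed choice.
vecDepthToy <- colSums(matCountsToy, na.rm = TRUE)
vecNormConstToy <- vecDepthToy / mean(vecDepthToy)
stopifnot(length(vecNormConstToy) == ncol(matCountsToy))
```

Checking dimensions and confounder names before fitting is cheap and catches the most common input mismatches early.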

Details

For ZINB models with drop-out model estimation: The estimation is iterative coordinate ascent over gene-wise and cell-wise models if the drop-out model is not set a priori. If the drop-out model is given, the estimation is a single M-like step of the iterative coordinate ascent.
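To make the model being fit concrete, the following base R sketch evaluates a zero-inflated negative binomial loglikelihood for one gene with a "logistic_ofMu"-style drop-out rate. It is an illustration under assumptions, not the package implementation: the function name evalLogLikZINBToy is hypothetical, and the use of log(mu) as the mean-dependent predictor in the logistic model is an assumption.

```r
# Toy ZINB loglikelihood for one gene (hypothetical helper, not package code).
# vecPiParam: c(offset, coefficient of log(mu)) of the logistic drop-out model.
evalLogLikZINBToy <- function(vecCounts, vecMu, scaDisp, vecPiParam) {
  # Drop-out rate per cell; log(mu) as predictor is an assumption.
  vecPi <- plogis(vecPiParam[1] + vecPiParam[2] * log(vecMu))
  # Negative binomial likelihood of each observation.
  vecLikNB <- dnbinom(vecCounts, mu = vecMu, size = scaDisp)
  # Zeros arise from drop-out or from the NB component; non-zeros only from NB.
  vecLik <- ifelse(vecCounts == 0,
                   vecPi + (1 - vecPi) * vecLikNB,
                   (1 - vecPi) * vecLikNB)
  # Unobserved entries (NA) do not contribute to the loglikelihood.
  vecboolObserved <- !is.na(vecCounts)
  sum(log(vecLik[vecboolObserved]))
}

vecCountsToy <- c(0, 0, 3, NA, 7, 1)
vecNormConstToy <- c(1, 0.5, 2, 1, 1, 1.5)
# Scaling factors linearly scale the mean model.
vecMuToy <- 4 * vecNormConstToy
scaLogLikZINB <- evalLogLikZINBToy(vecCountsToy, vecMuToy,
                                   scaDisp = 2, vecPiParam = c(0, -1))
```

With a drop-out rate of zero this expression reduces to the plain negative binomial loglikelihood, which corresponds to strDropModel="none".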

Convergence of iterative coordinate ascent is tracked with the loglikelihood of the entire data matrix. Every step is a maximum likelihood estimation of the target parameters conditioned on the remaining parameter estimates. Therefore, convergence to a local optimum is guaranteed if the algorithm is run until convergence. Each estimation step is parallelised where conditional independences between parameter estimations allow it.
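The coordinate ascent scheme can be sketched on a much smaller problem: a plain negative binomial model with a single mean and a single dispersion parameter, fit with base R. This is a toy illustration and not the estimation code of the package. The names evalLogLikNBToy and scaMaxEstimationCyclesToy, the search intervals and the convergence threshold are all assumptions.

```r
# Toy coordinate ascent: alternate conditional ML updates of mean and
# dispersion and track the loglikelihood after each cycle.
set.seed(1)
vecCountsToy <- rnbinom(n = 200, mu = 5, size = 2)

evalLogLikNBToy <- function(scaMu, scaDisp, vecCounts) {
  sum(dnbinom(vecCounts, mu = scaMu, size = scaDisp, log = TRUE))
}

scaMu <- 1
scaDisp <- 1
vecLogLik <- evalLogLikNBToy(scaMu, scaDisp, vecCountsToy)
scaMaxEstimationCyclesToy <- 20

for (scaCycle in seq_len(scaMaxEstimationCyclesToy)) {
  # Step 1: maximise over the mean, dispersion held fixed.
  scaMu <- optimize(
    f = function(scaMuTry) evalLogLikNBToy(scaMuTry, scaDisp, vecCountsToy),
    interval = c(1e-3, 1e2), maximum = TRUE)$maximum
  # Step 2: maximise over the dispersion, mean held fixed.
  scaDisp <- optimize(
    f = function(scaDispTry) evalLogLikNBToy(scaMu, scaDispTry, vecCountsToy),
    interval = c(1e-3, 1e2), maximum = TRUE)$maximum
  # Track convergence with the loglikelihood of the whole data set.
  vecLogLik <- c(vecLogLik, evalLogLikNBToy(scaMu, scaDisp, vecCountsToy))
  # Stop once a full cycle no longer improves the loglikelihood.
  if (diff(tail(vecLogLik, 2)) < 1e-6) break
}
```

Because each step only moves one parameter to its conditional optimum, vecLogLik is non-decreasing up to the numerical tolerance of optimize().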

Convergence can be followed at each iteration (boolVerbose=TRUE) or at each step (boolSuperVerbose=TRUE).

To save memory, not the entire parameter matrix (genes x cells) but the parameter models are stored in the objects lsMuModel, lsDispModel and lsDropModel. In short, these objects contain the gene/cell-wise parameters of the model used to constrain the parameter in question and the predictors necessary to evaluate the parameter model to obtain the observation-wise parameter values.
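The idea of storing a parameter model instead of the full parameter matrix can be sketched with a "groups"-style mean model. The list layout below is a simplified, hypothetical stand-in and does not reproduce the actual fields of lsMuModel. The helper decompressMeansByGeneToy is likewise an assumed name.

```r
# Hypothetical compact mean model: one mean per gene and group,
# plus the predictors needed to expand it to observation-wise values.
vecGroupsToy <- c("A", "A", "B", "B", "B")  # group label of each cell
lsMuModelToy <- list(
  # Gene-wise parameters (genes x groups) instead of genes x cells.
  matMuModel = matrix(c(1, 2,
                        3, 4,
                        5, 6),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(paste0("gene_", 1:3), c("A", "B"))),
  # Predictors: group index and scaling factor of each cell.
  vecidxGroups = match(vecGroupsToy, c("A", "B")),
  vecNormConst = c(1, 0.5, 2, 1, 1))

# Evaluate the model for one gene to obtain observation-wise means.
decompressMeansByGeneToy <- function(lsModel, idxGene) {
  vecMu <- lsModel$matMuModel[idxGene, lsModel$vecidxGroups]
  unname(vecMu * lsModel$vecNormConst)
}

vecMuGene2 <- decompressMeansByGeneToy(lsMuModelToy, idxGene = 2)
```

In this sketch the stored model grows with genes x groups rather than genes x cells, and the observation-wise values are only materialised for the gene being evaluated.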

Value

list

Author(s)

David Sebastian Fischer

See Also

Called by fitContinuousModels. Calls parameter estimation wrappers: fitPiZINB, fitZINBMuDisp. Calls evalLogLikMatrix to follow convergence.


YosefLab/LineagePulse documentation built on May 6, 2019, 2:19 p.m.