fitGLMM: Perform differential abundance testing using a NB-generalised...

View source: R/glmm.R

fitGLMMR Documentation

Perform differential abundance testing using a NB-generalised linear mixed model

Description

This function will perform DA testing per-nhood using a negative binomial generalised linear mixed model

Usage

fitGLMM(
  X,
  Z,
  y,
  offsets,
  init.theta = NULL,
  Kin = NULL,
  random.levels = NULL,
  REML = FALSE,
  glmm.control = list(theta.tol = 1e-06, max.iter = 100, init.sigma = NULL, init.beta =
    NULL, init.u = NULL, solver = NULL),
  dispersion = 1,
  geno.only = FALSE,
  solver = NULL
)

Arguments

X

A matrix containing the fixed effects of the model.

Z

A matrix containing the random effects of the model.

y

A matrix containing the observed phenotype over each neighborhood.

offsets

A vector containing the (log) offsets to apply normalisation for different numbers of cells across samples.

init.theta

A column vector (m X 1 matrix) of initial estimates of fixed and random effect coefficients

Kin

A n x n covariance matrix to explicitly model variation between observations

random.levels

A list describing the random effects of the model, and for each, the different unique levels.

REML

A logical value denoting whether REML (Restricted Maximum Likelihood) should be run. Default is TRUE.

glmm.control

A list containing parameter values specifying the theta tolerance of the model, the maximum number of iterations to be run, initial parameter values for the fixed (init.beta) and random effects (init.u), and glmm solver (see details).

dispersion

A scalar value for the initial dispersion of the negative binomial.

geno.only

A logical value that flags the model to use either just the matrix 'Kin' or the supplied random effects.

solver

a character value that determines which optimisation algorithm is used for the variance components. Must be either HE (Haseman-Elston regression) or Fisher (Fisher scoring).

Details

This function runs a negative binomial generalised linear mixed effects model. If mixed effects are detected in testNhoods, this function is run to solve the model. The solver defaults to the Fisher optimiser, and in the case of negative variance estimates it will switch to the non-negative least squares (NNLS) Haseman-Elston solver. This behaviour can be pre-set by passing glmm.control$solver="HE" for Haseman-Elston regression, which is the recommended solver when a covariance matrix is provided, or glmm.control$solver="HE-NNLS" which is the constrained HE optimisation algorithm.

Value

A list containing the GLMM output, including inference results. The list elements are as follows:

FE:

numeric vector of fixed effect parameter estimates.

RE:

list of the same length as the number of random effect variables. Each slot contains the best linear unbiased predictors (BLUPs) for the levels of the corresponding RE variable.

Sigma:

numeric vector of variance component estimates, 1 per random effect variable.

converged:

logical scalar of whether the model has reached the convergence tolerance or not.

Iters:

numeric scalar with the number of iterations that the model ran for. Is strictly <= max.iter.

Dispersion:

numeric scalar of the dispersion estimate computed off-line

Hessian:

matrix of 2nd derivative elements from the fixed and random effect parameter inference.

SE:

matrix of standard error estimates, derived from the hessian, i.e. the square roots of the diagonal elements.

t:

numeric vector containing the compute t-score for each fixed effect variable.

COEFF:

matrix containing the coefficient matrix from the mixed model equations.

P:

matrix containing the elements of the REML projection matrix.

Vpartial:

list containing the partial derivatives of the (pseudo)variance matrix with respect to each variance component.

Ginv:

matrix of the inverse variance components broadcast to the full Z matrix.

Vsinv:

matrix of the inverse pseudovariance.

Winv:

matrix of the inverse elements of W = D^-1 V D^-1

VCOV:

matrix of the variance-covariance for all model fixed and random effect variable parameter estimates. This is required to compute the degrees of freedom for the fixed effect parameter inference.

DF:

numeric vector of the number of inferred degrees of freedom. For details see Satterthwaite_df.

PVALS:

numeric vector of the compute p-values from a t-distribution with the inferred number of degrees of freedom.

ERROR:

list containing Rcpp error messages - used for internal checking.

Author(s)

Mike Morgan

Examples

data(sim_nbglmm)
random.levels <- list("RE1"=paste("RE1", levels(as.factor(sim_nbglmm$RE1)), sep="_"),
                      "RE2"=paste("RE2", levels(as.factor(sim_nbglmm$RE2)), sep="_"))
X <- as.matrix(data.frame("Intercept"=rep(1, nrow(sim_nbglmm)), "FE2"=as.numeric(sim_nbglmm$FE2)))
Z <- as.matrix(data.frame("RE1"=paste("RE1", as.numeric(sim_nbglmm$RE1), sep="_"),
                          "RE2"=paste("RE2", as.numeric(sim_nbglmm$RE2), sep="_")))
y <- sim_nbglmm$Mean.Count
dispersion <- 0.5

glmm.control <- glmmControl.defaults()
glmm.control$theta.tol <- 1e-6
glmm.control$max.iter <- 15
model.list <- fitGLMM(X=X, Z=Z, y=y, offsets=rep(0, nrow(X)), random.levels=random.levels,
                      REML = TRUE, glmm.control=glmm.control, dispersion=dispersion, solver="Fisher")
model.list


MikeDMorgan/miloR documentation built on Feb. 29, 2024, 6:20 p.m.