lmm: Fitting Linear Mixed-effects Models

View source: R/lmm.R

lmmR Documentation

Fitting Linear Mixed-effects Models

Description

lmm is used to fit a linear mixed-effects model (LMM) based on summary-level data. The LMM parameters are estimated by either restricted maximum likelihood (REML) or maximum likelihood (ML) method with Fisher scoring (FS) gradient descent algorithm.

Usage

lmm(
  XX,
  XY,
  ZX,
  ZY,
  ZZ,
  Ynorm,
  n,
  d = ncol(ZZ),
  method = c("REML", "ML"),
  theta0 = NULL,
  max.iter = 50,
  epsilon = 1e-05,
  output.cov = TRUE,
  output.FIM = FALSE,
  output.RE = FALSE,
  output.SS = FALSE,
  verbose = FALSE
)

Arguments

XX

= t(X)%*%X, where X is a design matrix for fixed effects.

XY

= t(Y%*%X), where Y is a features-by-samples matrix of observed responses (genes-by-cells expression matrix for scRNA-seq).

ZX

= t(Z)%*%X, where Z = [Z1, ..., Zk] is a design matrix for k random components (factors).

ZY

= t(Y%*%Z).

ZZ

= t(Z)%*%Z.

Ynorm

= rowSums(Y*Y), norms for features (each row).

n

= nrow(X), number of samples (cells in scRNA-seq).

d

= (d1,...,dk), where di = ncol(Zi), the number of columns in Zi. sum(d) = ncol(Z), the number of columns in Z. For the model with only one random component, d = ncol(Z).

method

a character string, either "REML" or "ML". Defaults to "REML". If ‘REML’, the model is fit using REML; otherwise, using ML.

theta0

a vector of initial values of the variance components, (s1, ...,sk, s_(k+1)), si = sigma_i^2, the i-th variance component. s_(k+1) = sigma^2, the variance component of model residual error.

max.iter

the maximal number of iterations for the iterative algorithm.

epsilon

positive convergence tolerance. If the absolute value of the first partial derivative of log likelihood is less than epsilon, the iterations converge.

output.cov

a logical scalar. If TRUE, output an array of the covariance matrices for the estimated coefficients, cov.beta.

output.FIM

a logical scalar. If TRUE, output an array of the Fisher information matrices (FIM) for fixed effects, FIM.beta[,,i] = X^T(Cov(Y[i,])^{-1}X for the i-th feature, and variance components, FIM.theta[,,i] = Var(\partial(logL_i)/\partial(theta)) = - E[\partial^2(logL_i)/{\partial(theta)\partial(theta^T)}] for the i-th feature.

output.RE

a logical scalar. If TRUE, output the best linear unbiased prediction (BLUP) of the random effects.

output.SS

a logical scalar. If TURE, output list(XX, XY, ZX, ZY, ZZ, Ynorm, n), a list of the summary-level data (summary statistics), which can also be computed using the sslmm function.

verbose

a logical scalar. If TRUE, print the number of features for which LMM fitting did not converge (abs(dlogL) > epsilon).

Value

A list containing the following components:

method

the method, either REML or ML, for estimating the LMM parameters (fixed effects and variance components).

epsilon

convergence tolerance.

dlogL

first partial derivatives of log-likelihoods for each feature.

logLik

maximum log-likelihoods for ML method or maximum log-restricted-likelihood for REML method.

niter

numbers of iterations for each feature.

coef

a matrix of estimated coefficients (fixed effects or beta), each column corresponds to a feature and each row one covariate.

se

a matrix of standard errors of the estimated coefficients.

t

a matrix of t-values for the fixed effects, equal to coef/se.

p

a matrix of two-sided p-values for the t-tests of the fixed effects.

df

degrees of freedom for the t-statistics (values).

cov

a array of covariance matrices of the estimated coefficients (fixed effects).

theta

a matrix of the estimated variance components, each column corresponds to a feature and each row one variance component. The last row is the variance component of the residual error.

se.theta

standard errors of the estimated theta.

FIM.beta

the Fisher information matrices (FIM) for fixed effects.

FIM.theta

the Fisher information matrices (FIM) for variance components.

RE

a matrix of the best linear unbiased prediction (BLUP) of random effects.

summary_data

a list of the summary-level data (summary statistics).

Examples

#Generate data: X, Y, and Z.
set.seed(2024)

n <- 1e3
m <- 10
Y <- matrix(rnorm(m*n), m, n)
rownames(Y) <- paste0("Gene", 1:nrow(Y))

trt <- sample(c("A", "B"), n, replace = TRUE)
X <- model.matrix(~ 0 + trt)

q <- 20
sam <- rep(NA, n)
sam[trt == "A"] <- paste0("A", sample.int(q/2, sum(trt == "A"), replace = TRUE))
sam[trt == "B"] <- paste0("B", sample.int(q/2, sum(trt == "B"), replace = TRUE))
Z <- model.matrix(~ 0 + sam)
d <- ncol(Z)

#Fit LMM by summary-level data
#Compute and store the summary-level data:
n <- nrow(X)
XX <- t(X)%*%X
XY <- t(Y%*%X)
ZX <- t(Z)%*%X
ZY <- t(Y%*%Z)
ZZ <- t(Z)%*%Z
Ynorm <- rowSums(Y*Y)
fit <- lmm(XX, XY, ZX, ZY, ZZ, Ynorm = Ynorm, n = n, d = d)
str(fit)


FLASHMM documentation built on April 8, 2026, 1:06 a.m.