gammi: Fit a Generalized Additive Mixed Model

View source: R/gammi.R

gammiR Documentation

Fit a Generalized Additive Mixed Model

Description

Fits generalized additive models (GAMs) and generalized additive mixed model (GAMMs) using lme4 as the tuning engine. Predictor groups can be manually input (default S3 method) or inferred from the model (S3 "formula" method). Smoothing parameters are treated as variance components and estimated using REML/ML (gaussian) or Laplace approximation to ML (others).

Usage

gammi(x, ...)

## Default S3 method:
gammi(x,
      y,
      group,
      family = gaussian,
      fixed = NULL,
      random = NULL,
      data = NULL,
      REML = TRUE,
      control = NULL,
      start = NULL,
      verbose = 0L,
      nAGQ = 10L,
      subset, 
      weights, 
      na.action, 
      offset, 
      mustart, 
      etastart,
      ...)

## S3 method for class 'formula'
gammi(formula, 
      data, 
      family = gaussian,
      fixed = NULL, 
      random = NULL, 
      REML = TRUE,
      control = NULL,
      start = NULL,
      verbose = 0L,
      nAGQ = 10L,
      subset, 
      weights, 
      na.action, 
      offset, 
      mustart, 
      etastart,
      ...)

Arguments

x

Model (design) matrix of dimension nobs by nvars (n \times p).

y

Response vector of length n.

group

Group label vector (factor, character, or integer) of length p. Predictors with the same label are assumed to have the same variance parameter.

formula

Model formula: a symbolic description of the model to be fitted. Uses the same syntax as lm and glm.

family

Assumed exponential family (and link function) for the response variable.

fixed

For default method: a character vector specifying which group labels should be treated as fixed effects. For formula method: a one-sided formula specifying the fixed effects model structure.

random

A one-sided formula specifying the random effects structure using lme4 syntax. See Note.

data

Optional data frame containing the variables referenced in formula, fixed, and/or random.

REML

Logical indicating whether REML versus ML should be used to tune the smoothing parameters and variance components.

control

List containing the control parameters (output from lmerControl or glmerControl).

start

List (with names) of starting parameter values for model parameters.

verbose

Postive integer that controls the level of output displayed during optimization.

nAGQ

Numer of adaptive Gaussian quadrature points. Only used for non-Gaussian responses with a single variance component.

subset

Optional expression indicating the subset of rows to use for the fitting (defaults to all rows).

weights

Optional vector indicating prior observations weights for the fitting (defaults to all ones).

na.action

Function that indicates how NA data should be dealt with. Default (of na.omit) will omit any observations with missing data on any variable.

offset

Optional vector indicating each observation's offset for the fitting (defaults to all zeros).

mustart

Optional starting values for the mean (fitted values).

etastart

Optional starting values for the linear predictors.

...

Optional arguments passed to the spline.model.matrix function, e.g., spline knots or df for each term.

Details

Fits a generalized additive mixed model (GAMM) of the form

g(\mu) = f(\mathbf{X}, \mathbf{Z}) + \mathbf{X}^\top \boldsymbol\beta + \mathbf{Z}^\top \boldsymbol\alpha

where

  • \mu = E(Y | \mathbf{X}, \mathbf{Z}) is the conditional expectation of the response Y given the predictor vectors \mathbf{X} = (X_1, \ldots, X_p)^\top and \mathbf{Z} = (Z_1, \ldots, Z_q)^\top

  • the function g(\cdot) is a user-specified (invertible) link function

  • the function f(\cdot) is an unknown smooth function of the predictors (specified by formula)

  • the vector \mathbf{X} is the fixed effects component of the design (specified by fixed)

  • the vector \mathbf{Z} is the random effects component of the design (specified by random)

  • the vector \boldsymbol\beta contains the unknown fixed effects coefficients

  • the vector \boldsymbol\alpha contains the unknown Gaussian random effects

Note that the mean function f(\cdot) can include main and/or interaction effects between any number of predictors. Furthermore, note that the fixed effects in \mathbf{X}^\top \boldsymbol\beta and the random effects in \mathbf{Z}^\top \boldsymbol\alpha are both optional.

Value

An object of class "gammi" with the following elements:

fitted.values

model predictions on the data scale

linear.predictors

model predictions on the link scale

coefficients

coefficients used to make the predictions

random.coefficients

coefficients corresponding to the random argument, i.e., the BLUPs.

term.labels

labels for the terms included in the coefficients

dispersion

estimated dispersion parameter = deviance/df.residual when is.null(random)

vcovchol

Cholesky factor of covariance matrix such that tcrossprod(vcovchol) gives the covariance matrix for the combined coefficient vector c(coefficients, random.coefficients)

family

exponential family distribution (same as input)

logLik

log-likelihood for the solution

aic

AIC for the solution

deviance

model deviance, i.e., two times the negative log-likelihood

null.deviance

deviance of the null model, i.e., intercept only. Will be NA if the random argument is used.

r.squared

proportion of null deviance explained = 1 - deviance/null.deviance. Will be NA if the random argument is used; see Note.

nobs

number of observations used in fit

leverages

leverage scores for each observation

edf

effective degrees of freedom = sum(leverages)

df.random

degress of freedom corresponding to random formula, i.e., number of co/variance parameters

df.residual

residual degrees of freedom = nobs - edf

x

input x matrix (default method only)

group

character vector indicating which columns of x belong to which model terms

scale

numeric vector giving the scale parameter used to z-score each term's data

fixed

fixed effects terms (default method) or formula (formula method); will be NULL if no fixed terms are included

random

random effects formula

mer

object of class "merMod", such as output by lmer, with model fit information on a standardized scale

VarCorr

data frame with variance and covariance parameter estimates from mer transformed back to the original scale

call

function call

data

input data

contrasts

list of contrasts applied to fixed terms; will be NULL if no fixed terms are included

spline.info

list of spline parameters for terms in x or formula

formula

input model formula

Random Syntax

The random argument uses standard lmer syntax:

  • (1 | g) for a random intercept for each level of g

  • (1 | g1) + (1 | g2) for random intercepts for g1 and g2

  • (1 | g1/g2) = (1 | g1) + (1 | g1:g2) for random intercepts for g1 and g2 nested within g1

  • (x | g) = (1 + x | g) for a correlated random intercept and slope of x for each level of g

  • (x || g) = (1 | g) + (0 + x | g) for an uncorrelated random intercept and slope of x for each level of g

Warning

For stable computation, any terms entered through x (default method) or formula and/or fixed (formula method) are z-scored prior to fitting the model. Note that terms entered through random are not standardized.

The "mer" component of the output contains the model fitting results for a z-scored version of the original data (i.e., this fit is on a different scale). Consequently, the "mer" component should not be used for prediction and/or inference purposes. All prediction and inference should be conducted using the plot, predict, and summary methods mentioned in the ‘See Also’ section.

The "VarCorr" component contains the estimated variance/covariance parameters transformed back to the original scale.

Note

The model R-squared is the proportion of the null deviance that is explained by the model, i.e.,

r.squared = 1 - deviance / null.deviance

where deviance is the deviance of the model, and null.deviance is the deviance of the null model.

When the random argument is used, null.deviance and r.squared will be NA. This is because there is not an obvious null model when random effects are included, e.g., should the null model include or exclude the random effects? Assuming that is it possible to define a reasonable null.deviance in such cases, the above formula can be applied to calculate the model R-squared for models that contain random effects.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v067.i01")}

Helwig, N. E. (2024). Precise tensor product smoothing via spectral splines. Stats, 7(1), 34-53, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.3390/stats7010003")}

See Also

plot.gammi for plotting effects from gammi objects

predict.gammi for predicting from gammi objects

summary.gammi for summarizing results from gammi objects

Examples

##############***##############   EXAM EXAMPLE   ##############***##############

# load 'gammi' package
library(gammi)

# load 'exam' help file
?exam

# load data
data(exam)

# header of data
head(exam)

# fit model
mod <- gammi(Exam.score ~ VRQ.score, data = exam,
             random = ~ (1 | Primary.school) + (1 | Secondary.school))
             
# plot results
plot(mod)

# summarize results
summary(mod)





#############***#############   GAUSSIAN EXAMPLE   #############***#############

#~~~Example 1:  Single Predictor ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# generate data
n <- 1000
x <- seq(0, 1, length.out = n)
fx <- sin(2 * pi * x)
set.seed(1)
y <- fx + rnorm(n)

# fit model via formula method
mod <- gammi(y ~ x)
mod

# fit model via default method
modmat <- spline.model.matrix(y ~ 0 + x)
tlabels <- attr(modmat, "term.labels")
tassign <- attr(modmat, "assign")
g <- factor(tlabels[tassign], levels = tlabels)
mod0 <- gammi(modmat, y, g)
mod0

# summarize fit model
summary(mod)

# plot function estimate
plot(mod)



#~~~Example 2:  Additive Model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# mean function
eta <- function(x, z, additive = TRUE){
  mx1 <- cos(2 * pi * (x - pi))
  mx2 <- 30 * (z - 0.6)^5
  mx12 <- 0
  if(!additive) mx12 <- sin(pi * (x - z))
  mx1 + mx2 + mx12
}

# generate data
set.seed(1)
n <- 1000
x <- runif(n)
z <- runif(n)
fx <- eta(x, z)
y <- fx + rnorm(n)

# fit model via formula method
mod <- gammi(y ~ x + z)
mod

# fit model via default method
modmat <- spline.model.matrix(y ~ 0 + x + z)
tlabels <- attr(modmat, "term.labels")
tassign <- attr(modmat, "assign")
g <- factor(tlabels[tassign], levels = tlabels)
mod0 <- gammi(modmat, y, g)
mod0

# summarize fit model
summary(mod)

# plot function estimate
plot(mod)



#~~~Example 3:  Interaction Model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# mean function
eta <- function(x, z, additive = TRUE){
  mx1 <- cos(2 * pi * (x - pi))
  mx2 <- 30 * (z - 0.6)^5
  mx12 <- 0
  if(!additive) mx12 <- sin(pi * (x - z))
  mx1 + mx2 + mx12
}

# generate data
set.seed(1)
n <- 1000
x <- runif(n)
z <- runif(n)
fx <- eta(x, z, additive = FALSE)
y <- fx + rnorm(n)

# fit model via formula method
mod <- gammi(y ~ x * z)
mod

# fit model via default method
modmat <- spline.model.matrix(y ~ 0 + x * z)
tlabels <- attr(modmat, "term.labels")
tassign <- attr(modmat, "assign")
g <- factor(tlabels[tassign], levels = tlabels)
mod0 <- gammi(modmat, y, g)
mod0

# summarize fit model
summary(mod)

# plot function estimate
plot(mod)



#~~~Example 4:  Random Intercept ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# mean function
eta <- function(x, z, additive = TRUE){
  mx1 <- cos(2 * pi * (x - pi))
  mx2 <- 30 * (z - 0.6)^5
  mx12 <- 0
  if(!additive) mx12 <- sin(pi * (x - z))
  mx1 + mx2 + mx12
}

# generate mean function
set.seed(1)
n <- 1000
nsub <- 50
x <- runif(n)
z <- runif(n)
fx <- eta(x, z)

# generate random intercepts
subid <- factor(rep(paste0("sub", 1:nsub), n / nsub),
                levels = paste0("sub", 1:nsub))
u <- rnorm(nsub, sd = sqrt(1/2))

# generate responses
y <- fx + u[subid] + rnorm(n, sd = sqrt(1/2))

# fit model via formula method
mod <- gammi(y ~ x + z, random = ~ (1 | subid))
mod

# fit model via default method
modmat <- spline.model.matrix(y ~ 0 + x + z)
tlabels <- attr(modmat, "term.labels")
tassign <- attr(modmat, "assign")
g <- factor(tlabels[tassign], levels = tlabels)
mod0 <- gammi(modmat, y, g, random = ~ (1 | subid))
mod0

# summarize fit model
summary(mod)

# plot function estimate
plot(mod)




#############***#############   BINOMIAL EXAMPLE   #############***#############

#~~~Example 1:  Single Predictor ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# generate data
n <- 1000
x <- seq(0, 1, length.out = n)
fx <- sin(2 * pi * x)
set.seed(1)
y <- rbinom(n = n, size = 1, prob = 1 / (1 + exp(-fx)))

# fit model
mod <- gammi(y ~ x, family = binomial)
mod

# summarize fit model
summary(mod)

# plot function estimate
plot(mod)



#~~~Example 2:  Additive Model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# mean function
eta <- function(x, z, additive = TRUE){
  mx1 <- cos(2 * pi * (x - pi))
  mx2 <- 30 * (z - 0.6)^5
  mx12 <- 0
  if(!additive) mx12 <- sin(pi * (x - z))
  mx1 + mx2 + mx12
}

# generate data
set.seed(1)
n <- 1000
x <- runif(n)
z <- runif(n)
fx <- 1 + eta(x, z)
y <- rbinom(n = n, size = 1, prob = 1 / (1 + exp(-fx)))

# fit model
mod <- gammi(y ~ x + z, family = binomial)
mod

# summarize fit model
summary(mod)

# plot function estimate
plot(mod)



#~~~Example 3:  Interaction Model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# mean function
eta <- function(x, z, additive = TRUE){
  mx1 <- cos(2 * pi * (x - pi))
  mx2 <- 30 * (z - 0.6)^5
  mx12 <- 0
  if(!additive) mx12 <- sin(pi * (x - z))
  mx1 + mx2 + mx12
}

# generate data
set.seed(1)
n <- 1000
x <- runif(n)
z <- runif(n)
fx <- eta(x, z, additive = FALSE)
y <- rbinom(n = n, size = 1, prob = 1 / (1 + exp(-fx)))

# fit model
mod <- gammi(y ~ x * z, family = binomial)
mod

# summarize fit model
summary(mod)

# plot function estimate
plot(mod)



#~~~Example 4:  Random Intercept ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# mean function
eta <- function(x, z, additive = TRUE){
  mx1 <- cos(2 * pi * (x - pi))
  mx2 <- 30 * (z - 0.6)^5
  mx12 <- 0
  if(!additive) mx12 <- sin(pi * (x - z))
  mx1 + mx2 + mx12
}

# generate mean function
set.seed(1)
n <- 1000
nsub <- 50
x <- runif(n)
z <- runif(n)
fx <- 1 + eta(x, z)

# generate random intercepts
subid <- factor(rep(paste0("sub", 1:nsub), n / nsub),
                levels = paste0("sub", 1:nsub))
u <- rnorm(nsub, sd = sqrt(1/2))

# generate responses
y <- rbinom(n = n, size = 1, prob = 1 / (1 + exp(-(fx+u[subid]))))

# fit model
mod <- gammi(y ~ x + z, random = ~ (1 | subid), family = binomial)
mod

# summarize fit model
summary(mod)

# plot function estimate
plot(mod)



gammi documentation built on April 4, 2025, 4:48 a.m.

Related to gammi in gammi...