glmermboost: Component-wise Gradient Boosting for Generalised Mixed Models

View source: R/mermboost_functions.R


Component-wise Gradient Boosting for Generalised Mixed Models

Description

Gradient boosting for optimizing negative log-likelihoods as loss functions, where component-wise linear models are used as base-learners and the random components are estimated via a maximum likelihood approach.

Usage

glmermboost(formula, data = list(), weights = NULL,
          offset = NULL, family = gaussian,
          na.action = na.omit, contrasts.arg = NULL,
          center = TRUE, control = boost_control(), oobweights = NULL, ...)

Arguments

formula

a symbolic description of the model to be fit in the lme4 format, including random effects.

data

a data frame containing the variables in the model.

weights

an optional vector of weights to be used in the fitting process.

offset

a numeric vector to be used as offset (optional).

family

a family object describing the error distribution and link function. Note that, in contrast to usual mboost, only family objects in the sense of stats::family are possible here, with the exception of mboost's NBinomial().

na.action

a function which indicates what should happen when the data contain NAs.

contrasts.arg

a list, whose entries are contrasts suitable for input to the contrasts replacement function and whose names are the names of columns of data containing factors. See model.matrix.default.

center

logical, indicating whether the predictor variables are centered before fitting.

control

a list of parameters controlling the algorithm. For more details see boost_control.

oobweights

an additional vector of out-of-bag weights, which is used for the out-of-bag risk (i.e., if boost_control(risk = "oobag")); see the sketch after this list.

...

additional arguments passed to mboost_fit; currently none.
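As a minimal sketch of the out-of-bag mechanics (assuming the standard mboost behaviour of zero weights carries over to the mixed-model step; the split variable in_bag is hypothetical):

data(Orthodont)
set.seed(1)
in_bag <- rbinom(nrow(Orthodont), 1, 0.75)  # hypothetical 75/25 in/out-of-bag split
mod_oob <- glmermboost(distance ~ age + Sex + (1 | Subject),
                       data = Orthodont, family = gaussian,
                       weights = in_bag,
                       control = boost_control(mstop = 100, risk = "oobag"))
risk(mod_oob)  # out-of-bag risk path; for grouped data prefer mer_cvrisk()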

Details

The warning "model with centered covariates does not contain intercept" is correctly given; the intercept is estimated via the mixed model.

A (generalized) linear mixed model is fitted by a boosting algorithm based on component-wise univariate linear models. Additionally, a mixed model is estimated in every iteration and added to the current fit. The fit, i.e., the regression coefficients and random effects, can be interpreted in the usual way. This methodology is described in Knieper et al. (2025).
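For instance (a minimal sketch, assuming a model mod fitted as in the Examples below), the components of the fit can be inspected with the usual accessors:

coef(mod)   # boosted regression coefficients, as known from mboost
ranef(mod)  # estimated random effects (ranef.mermboost, see Value)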

Value

The description of glmboost holds, while some methods are newly implemented, such as predict.mermboost, plot.mer_cv and mstop.mer_cv; only the first of these requires a further argument. Additionally, the methods VarCorr.mermboost and ranef.mermboost are implemented specifically.

See Also

See mermboost for the same approach using additive models.

See mer_cvrisk for a cluster-sensitive cross-validation.

Examples

data(Orthodont)

# are there cluster-constant covariates?
find_ccc(Orthodont, "Subject")


# fit initial model
mod <- glmermboost(distance ~ age + Sex + (1 | Subject),
                   data = Orthodont, family = gaussian,
                   control = boost_control(mstop = 100))

# let mermboost do the cluster-sensitive cross-validation for you
norm_cv <- mer_cvrisk(mod, no_of_folds = 10)
opt_m <- mstop(norm_cv)
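
# visualise the cross-validated risk (plot.mer_cv, see Value)
plot(norm_cv)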

# fit model with optimal stopping iteration
mod_opt <- glmermboost(distance ~ age + Sex + (1 | Subject),
                   data = Orthodont, family = gaussian,
                   control = boost_control(mstop = opt_m))

# use the model as known from mboost
# in addition, there are some methods known from lme4
ranef(mod_opt)
VarCorr(mod_opt)
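
# coefficients are accessible as known from mboost
coef(mod_opt)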


#######################

set.seed(123)

# Parameters
n_groups <- 10        # Number of groups
n_per_group <- 50     # Number of observations per group
beta_fixed <- c(0.5, -0.3, 0.7)  # Fixed effects for intercept, covariate1, covariate2
sigma_random <- 1     # Random effect standard deviation

# Simulate random effects (group-specific)
group_effects <- rnorm(n_groups, mean = 0, sd = sigma_random)

# Simulate covariates
covariate1 <- rnorm(n_groups * n_per_group)
covariate2 <- rnorm(n_groups * n_per_group)

# Simulate data
group <- rep(1:n_groups, each = n_per_group)
random_effect <- group_effects[group]

# Linear predictor including fixed effects and random effects
linear_predictor <- beta_fixed[1] + beta_fixed[2] * covariate1 +
                  beta_fixed[3] * covariate2 + random_effect
prob <- plogis(linear_predictor)  # Convert to probabilities

# Simulate binomial outcomes
y <- rbinom(n_groups * n_per_group, size = 1, prob = prob)

# Combine into a data frame
sim_data <- data.frame(group = group, y = y,
                       covariate1 = covariate1,
                       covariate2 = covariate2)
sim_data$group <- as.factor(sim_data$group)



mod3 <- glmermboost(y ~ covariate1 + covariate2 + (1 | group),
                    data = sim_data, family = binomial())
bin_cv <- mer_cvrisk(mod3, no_of_folds = 10)
mstop(bin_cv)
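
# refit at the cross-validated stopping iteration, analogous to the Gaussian example
mod3_opt <- glmermboost(y ~ covariate1 + covariate2 + (1 | group),
                        data = sim_data, family = binomial(),
                        control = boost_control(mstop = mstop(bin_cv)))
VarCorr(mod3_opt)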

