MEGB: Mixed Effect Gradient Boosting (MEGB) Algorithm

MEGB R Documentation


Description

MEGB is an adaptation of the gradient boosting regression method to longitudinal data, similar to the Mixed Effect Random Forest (MERF) developed by Hajjem et al. (2014) <doi:10.1080/00949655.2012.741599> and implemented by Capitaine et al. (2020) <doi:10.1177/0962280220946080>. The algorithm estimates the parameters of a semi-parametric mixed-effects model:

Y_i(t)=f(X_i(t))+Z_i(t)\beta_i+\epsilon_i

with Y_i(t) the output at time t for the ith individual; X_i(t) the input predictors (fixed effects) at time t for the ith individual; Z_i(t) the predictors of the random effects at time t for the ith individual; \beta_i the vector of random effects for the ith individual; \epsilon_i the residual error.
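The stacked data layout this model expects can be sketched in base R. The snippet below simulates from the model above with a random intercept and slope (q = 2) and a nonlinear f; all names, sample sizes, and noise levels are illustrative choices, not part of the package.

```r
set.seed(42)
n  <- 5                                    # number of individuals
ni <- 4                                    # measurements per individual
id   <- rep(seq_len(n), each = ni)         # identifier for each row
time <- rep(seq_len(ni), times = n)        # measurement times
X <- matrix(rnorm(n * ni * 3), ncol = 3)   # N x p fixed-effect predictors
Z <- cbind(1, time)                        # N x q: random intercept + slope
f <- function(x) sin(x[, 1]) + x[, 2]^2    # unknown mean function (illustrative)
beta <- cbind(rnorm(n, sd = 0.7),          # per-individual random effects
              rnorm(n, sd = 0.3))
# Y_i(t) = f(X_i(t)) + Z_i(t) beta_i + eps_i, stacked over all rows
Y <- f(X) + rowSums(Z * beta[id, ]) + rnorm(n * ni, sd = 0.3)
```

Each row of X, Z, Y, id, and time corresponds to one observation of one individual, which matches the N x p and N x q matrix arguments described below.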

Usage

MEGB(
  X,
  Y,
  id,
  Z,
  iter = 100,
  ntree = 500,
  time,
  shrinkage = 0.05,
  interaction.depth = 1,
  n.minobsinnode = 5,
  cv.folds = 0,
  delta = 0.001,
  verbose = TRUE
)

Arguments

X

[matrix]: A N x p matrix containing the p predictors of the fixed effects; each column codes for one predictor.

Y

[vector]: A vector containing the output trajectories.

id

[vector]: The vector of identifiers for the different trajectories.

Z

[matrix]: A N x q matrix containing the q predictors of the random effects.

iter

[numeric]: Maximal number of iterations of the algorithm. The default is set to iter=100.

ntree

[numeric]: Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. The default value is ntree=500.

time

[vector]: The vector of the measurement times associated with the trajectories in Y, Z, and X.

shrinkage

[numeric]: A shrinkage parameter applied to each tree in the expansion, also known as the learning rate or step-size reduction. The default value is set to 0.05.

interaction.depth

[numeric]: The maximum depth of variable interactions: 1 builds an additive model, 2 builds a model with up to two-way interactions, etc. The default value is set to 1.

n.minobsinnode

[numeric]: minimum number of observations (not total weights) in the terminal nodes of the trees. The default value is set to 5.

cv.folds

[numeric]: Number of cross-validation folds to perform. If cv.folds>1 then gbm, in addition to the usual fit, will perform a cross-validation and return an estimate of the generalization error in cv_error. The default value is set to 0.

delta

[numeric]: The algorithm stops when the difference in log likelihood between two iterations is smaller than delta. The default value is set to 0.001.

verbose

[boolean]: If TRUE, MEGB will print out the number of iterations needed to achieve convergence. Default is TRUE.

Value

A fitted MEGB model which is a list of the following elements:

  • forest: GBMFit obtained at the last iteration.

  • random_effects : Predictions of random effects for different trajectories.

  • id_btilde: Identifiers of the individuals associated with the predictions in random_effects.

  • var_random_effects: Estimation of the variance covariance matrix of random effects.

  • sigma: Estimation of the residual variance parameter.

  • time: The vector of the measurement times associated with the trajectories in Y, Z, and X.

  • LL: Log-likelihood of the different iterations.

  • id: Vector of the identifiers for the different trajectories.

  • OOB: OOB error of the fitted gradient boosting model at each iteration.

Examples

set.seed(1)
data <- simLong(n = 20, p = 6, rel_p = 6, time_points = 10, rho_W = 0.6, rho_Z = 0.6,
                random_sd_intercept = sqrt(0.5),
                random_sd_slope = sqrt(3),
                noise_sd = 0.5, linear = TRUE)  # generate data for n = 20 individuals
# Train a MEGB model on the generated data. Should take ~7 seconds.
megb <- MEGB(X = as.matrix(data[, -1:-5]), Y = as.matrix(data$Y),
             Z = as.matrix(data[, 4:5]), id = data$id, time = data$time,
             ntree = 500, cv.folds = 0, verbose = TRUE)
megb$forest          # fitted gradient boosting model (GBMFit) from the last iteration
megb$random_effects  # predicted random effects for each individual
plot(megb$LL, type = "o", col = 2)  # evolution of the log-likelihood
megb$OOB             # OOB error at each iteration
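A fitted model is typically used to predict new trajectories by combining the boosted fixed-effect estimate with each individual's predicted random effects. The sketch below continues the example above and assumes the package exposes a MERF-style predict method taking X, Z, id, and time; that signature is an assumption here, so check ?predict.MEGB for the exact interface.

```r
# Assumption: predict method with MERF-style arguments (X, Z, id, time);
# verify the exact signature with ?predict.MEGB before use.
pred <- predict(megb, X = as.matrix(data[, -1:-5]),
                Z = as.matrix(data[, 4:5]),
                id = data$id, time = data$time)
plot(data$Y, pred)        # predicted vs. observed responses
abline(0, 1, col = 2)     # identity line for reference
```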



MEGB documentation built on April 4, 2025, 2:59 a.m.
