MEGB (R Documentation)
MEGB is an adaptation of gradient boosting regression to longitudinal data, similar to the Mixed Effects Random Forest (MERF) developed by Hajjem et al. (2014) <doi:10.1080/00949655.2012.741599> and implemented by Capitaine et al. (2020) <doi:10.1177/0962280220946080>. The algorithm estimates the parameters of a semi-parametric mixed-effects model:
Y_i(t) = f(X_i(t)) + Z_i(t)\beta_i + \epsilon_i

with Y_i(t) the output at time t for the i-th individual; X_i(t) the input predictors (fixed effects) at time t for the i-th individual; Z_i(t) the random-effects covariates at time t for the i-th individual, with \beta_i the associated random effects; and \epsilon_i the residual error.
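As an illustration of this decomposition, the short sketch below simulates one individual's trajectory under the model above. The fixed-effect function f, the variance values, and all object names are hypothetical and chosen only for illustration; they are not part of the MEGB package.

# Illustrative simulation of Y_i(t) = f(X_i(t)) + Z_i(t) %*% beta_i + epsilon_i
# (hypothetical f and variance values, for one individual)
set.seed(123)
t      <- 1:10                                             # measurement times
X      <- cbind(x1 = rnorm(10), x2 = rnorm(10))            # fixed-effect predictors X_i(t)
f      <- function(X) 2 * X[, "x1"] + sin(pi * X[, "x2"])  # hypothetical fixed-effect function
Z      <- cbind(intercept = 1, slope = t)                  # random-effects covariates Z_i(t)
beta_i <- c(rnorm(1, sd = sqrt(0.5)), rnorm(1, sd = sqrt(3)))  # random intercept and slope
eps    <- rnorm(10, sd = 0.5)                              # residual error epsilon_i
Y      <- f(X) + as.vector(Z %*% beta_i) + eps             # observed trajectory Y_i(t)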
MEGB(
X,
Y,
id,
Z,
iter = 100,
ntree = 500,
time,
shrinkage = 0.05,
interaction.depth = 1,
n.minobsinnode = 5,
cv.folds = 0,
delta = 0.001,
verbose = TRUE
)
X [matrix]: A matrix containing the input predictors (fixed effects) associated with the trajectories.
Y [vector]: A vector containing the output trajectories.
id [vector]: The vector of identifiers for the different trajectories.
Z [matrix]: A matrix containing the random-effects covariates associated with the trajectories.
iter [numeric]: Maximal number of iterations of the algorithm. The default is set to 100.
ntree [numeric]: Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. The default value is 500.
time [vector]: The vector of the measurement times associated with the trajectories in Y, Z, and X.
shrinkage [numeric]: A shrinkage parameter applied to each tree in the expansion, also known as the learning rate or step-size reduction. The default value is set to 0.05.
interaction.depth [numeric]: The maximum depth of variable interactions: 1 builds an additive model, 2 builds a model with up to two-way interactions, and so on. The default value is set to 1.
n.minobsinnode [numeric]: Minimum number of observations (not total weights) in the terminal nodes of the trees. The default value is set to 5.
cv.folds [numeric]: Number of cross-validation folds to perform. If cv.folds > 1, gbm will, in addition to the usual fit, perform a cross-validation and return an estimate of the generalization error in cv_error. The default value is set to 0. An illustrative call combining these tuning arguments is sketched after this list.
delta [numeric]: The algorithm stops when the difference in log-likelihood between two consecutive iterations is smaller than delta. The default value is set to 0.001.
verbose [boolean]: If TRUE, MEGB prints out the number of iterations needed to achieve convergence. The default is TRUE.
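For reference, the call below sketches how the tuning arguments above might be combined, enabling cross-validation in the underlying gbm fit and allowing two-way interactions. It assumes that X, Y, Z, id, and time have already been prepared as described above, and the chosen values are illustrative rather than recommended settings.

# Illustrative call with non-default tuning values (assumes X, Y, Z, id, time exist)
fit <- MEGB(X = X, Y = Y, Z = Z, id = id, time = time,
            iter = 100, ntree = 500,
            shrinkage = 0.01,          # smaller learning rate
            interaction.depth = 2,     # allow up to two-way interactions
            n.minobsinnode = 10,       # larger terminal nodes
            cv.folds = 5,              # 5-fold cross-validation in gbm
            delta = 0.001, verbose = FALSE)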
A fitted MEGB model, which is a list of the following elements:

forest: GBMFit object obtained at the last iteration.
random_effects: Predictions of the random effects for the different trajectories.
id_btilde: Identifiers of the individuals associated with the predictions in random_effects.
var_random_effects: Estimation of the variance-covariance matrix of the random effects.
sigma: Estimation of the residual variance parameter.
time: The vector of the measurement times associated with the trajectories in Y, Z, and X.
LL: Log-likelihood across the different iterations.
id: Vector of the identifiers for the different trajectories.
OOB: OOB error of the fitted gradient boosting model at each iteration.
set.seed(1)
data <- simLong(n = 20, p = 6, rel_p = 6, time_points = 10, rho_W = 0.6, rho_Z = 0.6,
                random_sd_intercept = sqrt(0.5), random_sd_slope = sqrt(3),
                noise_sd = 0.5, linear = TRUE)  # Generate data for n = 20 individuals.
# Train a MEGB model on the generated data. Should take ~ 7 seconds.
megb <- MEGB(X = as.matrix(data[, -1:-5]), Y = as.matrix(data$Y),
             Z = as.matrix(data[, 4:5]), id = data$id, time = data$time,
             ntree = 500, cv.folds = 0, verbose = TRUE)
megb$forest # The fitted gradient boosting model (GBMFit) obtained at the last iteration.
megb$random_effects # are the predicted random effects for each individual.
plot(megb$LL,type="o",col=2) # evolution of the log-likelihood.
megb$OOB # OOB error at each iteration.
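Continuing the example, the components returned by MEGB can be inspected directly. The sketch below uses only the element names documented above; the comparison with the standard deviations passed to simLong is illustrative.

# Inspect the estimated variance components of the fitted model
megb$var_random_effects               # variance-covariance matrix of the random effects
sqrt(diag(megb$var_random_effects))   # estimated SDs; compare with sqrt(0.5) and sqrt(3) used in simLong
megb$sigma                            # estimated residual variance parameter
megb$id_btilde                        # identifiers matching the rows of megb$random_effects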