cv.glmertree: Cross Validation of (Generalized) Linear Mixed Model Trees



Cross Validation of (Generalized) Linear Mixed Model Trees

Description

Performs cross-validation of a model-based recursive partition based on (generalized) linear mixed models. Using the tree or subgroup structure estimated from a training dataset, the full mixed-effects model parameters are re-estimated using a new set of test observations, providing valid computation of standard errors and valid inference.

Usage

cv.lmertree(tree, newdata, reference = NULL, omit.intercept = FALSE, ...)

cv.glmertree(tree, newdata, reference = NULL, omit.intercept = FALSE, ...)

Arguments

tree

An object of class lmertree or glmertree that was fitted on a set of training data.

newdata

A data.frame containing a new set of observations on the same variables that were used to fit tree.

reference

Numeric or character scalar, indicating the terminal node whose intercept should be used as the reference for the intercepts of all other terminal nodes. If NULL (the default), the first terminal node's intercept is taken as the reference category. If interest lies in testing the significance of differences between specific nodes' intercepts, this default can be overruled by specifying the number of the terminal node that should serve as the reference category.

omit.intercept

Logical scalar, indicating whether the intercept should be omitted from the model. The default (FALSE) retains the intercept of the first terminal node (or of the node specified by reference) as the model intercept, allowing significance testing of the differences between this reference node's intercept and the other terminal nodes' intercepts. Specifying TRUE tests the value of each terminal node's intercept against zero (see the sketch following the argument descriptions).

...

Not currently used.
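
A minimal sketch of how reference and omit.intercept change the hypotheses tested by the summary method, assuming tree_train is an lmertree fitted on training data, test_data is a data.frame of held-out observations, and terminal node 5 is a hypothetical node number:

## Default: the first terminal node's intercept is the reference category;
## summary() then tests whether the other nodes' intercepts differ from it.
cv_default <- cv.lmertree(tree_train, newdata = test_data)

## Use (hypothetical) terminal node 5 as the reference category instead.
cv_ref5 <- cv.lmertree(tree_train, newdata = test_data, reference = 5)

## Omit the intercept: summary() tests each terminal node's intercept against zero.
cv_noint <- cv.lmertree(tree_train, newdata = test_data, omit.intercept = TRUE)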

Details

The approach is inspired by Athey & Imbens (2016), and "enables the construction of valid confidence intervals [...] whereby one sample is used to construct the partition and another to estimate [...] effects for each subpopulation."

Value

An object with classes lmertree and cv.lmertree, or glmertree and cv.glmertree. It is the original (g)lmertree specified by argument tree, but with the parametric model re-estimated based on the data specified by argument newdata. The default S3 methods for classes lmertree and glmertree can be used to inspect the results: plot, predict, coef, fixef, ranef and VarCorr. In addition, there is a dedicated summary method for classes cv.lmertree and cv.glmertree, which prints valid parameter estimates and standard errors, as obtained from summary.merMod. For objects of class cv.lmertree, hypothesis tests (i.e., p-values) can be obtained by loading package lmerTest PRIOR to loading package(s) glmertree (and lme4); see the examples.
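
A minimal sketch of the accessor methods listed above, assuming cv_tree is a (hypothetical) object returned by cv.lmertree or cv.glmertree:

fixef(cv_tree)    ## node-specific fixed-effects coefficients, re-estimated on the test data
ranef(cv_tree)    ## predicted random effects
VarCorr(cv_tree)  ## variance components of the random effects
summary(cv_tree)  ## valid estimates and standard errors (p-values if lmerTest was loaded first)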

References

Athey S & Imbens G (2016). Recursive partitioning for heterogeneous causal effects. “Proceedings of the National Academy of Sciences, 113(27), 7353-7360.” \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1073/pnas.1510489113")}

Fokkema M, Smits N, Zeileis A, Hothorn T, Kelderman H (2018). “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”. Behavior Research Methods, 50(5), 2016-2034. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.3758/s13428-017-0971-x")}

Fokkema M, Edbrooke-Childs J & Wolpert M (2021). “Generalized linear mixed-model (GLMM) trees: A flexible decision-tree method for multilevel and longitudinal data.” Psychotherapy Research, 31(3), 329-341. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/10503307.2020.1785037")}

Fokkema M & Zeileis A (2024). Subgroup detection in linear growth curve models with generalized linear mixed model (GLMM) trees. Behavior Research Methods. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.3758/s13428-024-02389-1")}

See Also

lmer, glmer, lmertree, glmertree, summary.merMod

Examples



require("lmerTest") ## load BEFORE lme4 and glmertree to obtain hypothesis tests / p-values

## Create artificial training and test datasets
set.seed(42)
train <- sample(1:nrow(DepressionDemo), size = 200, replace = TRUE)
test <- sample(1:nrow(DepressionDemo), size = 200, replace = TRUE)

## Fit tree on training data
tree1 <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
                 data = DepressionDemo[train, ])
                 
## Obtain honest estimates of parameters and standard errors using test data
tree2 <- cv.lmertree(tree1, newdata = DepressionDemo[test, ])
tree3 <- cv.lmertree(tree1, newdata = DepressionDemo[test, ], 
                     reference = 7, omit.intercept = TRUE)

summary(tree2)
summary(tree3)

coef(tree1)
coef(tree2)
coef(tree3)

plot(tree1, which = "tree")
plot(tree2, which = "tree")
plot(tree3, which = "tree")

predict(tree1, newdata = DepressionDemo[1:5, ])
predict(tree2, newdata = DepressionDemo[1:5, ])
