impu_boost		R Documentation
Applies component-wise gradient boosting to multiply imputed datasets. Depending on the settings, either a separate model is reported for each imputed dataset, or the M models are pooled to yield a single final model. For pooling, one can choose the novel MIBoost algorithm, which enforces a uniform variable-selection scheme across all imputed datasets, or the more conventional ad-hoc approaches of estimate-averaging and selection-frequency thresholding.
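The component-wise update described above can be sketched in base R. This is a generic illustration of L2-boosting on simulated data, not the package's internal implementation; all object names (`X`, `y`, `beta`, `resid`) and the simulated setup are assumptions for the sketch:

```r
## Generic component-wise L2-boosting sketch (illustration only).
## Simulated data: y depends only on the first covariate.
set.seed(1)
n <- 100; p <- 5; ny <- 0.1; mstop <- 250
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
y <- 2 * X[, 1] + rnorm(n)

beta  <- numeric(p)
resid <- y - mean(y)
for (iter in seq_len(mstop)) {
  slopes <- colSums(X * resid) / colSums(X^2)   # univariate LS slopes
  sse    <- sapply(seq_len(p),                  # fit of each base-learner
                   function(j) sum((resid - X[, j] * slopes[j])^2))
  j      <- which.min(sse)                      # best-fitting covariate
  beta[j] <- beta[j] + ny * slopes[j]           # damped coefficient update
  resid   <- resid - ny * X[, j] * slopes[j]    # update residuals
}
## beta[1] ends up close to the true slope 2; noise variables stay near 0
```

MIBoost applies this kind of update across all M imputed datasets simultaneously, so that the same covariate is selected in every dataset at each iteration.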
impu_boost(
  X_list,
  y_list,
  X_list_val = NULL,
  y_list_val = NULL,
  ny = 0.1,
  mstop = 250,
  type = c("gaussian", "logistic"),
  MIBoost = TRUE,
  pool = TRUE,
  pool_threshold = 0,
  center = c("auto", "force", "off")
)
X_list
List of length M; each element is an n × p numeric covariate matrix for one imputed dataset.
y_list
List of length M; each element is a length-n response vector matching the rows of the corresponding element of X_list.
X_list_val
Optional validation list (same structure as X_list); needed to compute CV_error.
y_list_val
Optional validation list (same structure as y_list).
ny
Learning rate. Defaults to 0.1.
mstop
Number of boosting iterations (default 250).
type
Type of loss function. One of "gaussian" (squared-error loss, continuous outcomes) or "logistic" (binomial loss, binary outcomes).
MIBoost
Logical. If TRUE, the MIBoost algorithm enforces a uniform variable-selection scheme across all imputed datasets; if FALSE, each imputed dataset is boosted independently.
pool
Logical. If TRUE, the M dataset-specific models are pooled into a single final model; if FALSE, one model is reported per imputed dataset.
pool_threshold
Only used when pool = TRUE and MIBoost = FALSE. Minimum proportion of imputed datasets in which a variable must be selected to enter the pooled model; the default 0 corresponds to plain estimate-averaging.
center
One of "auto", "force", "off", controlling whether covariates are centered before boosting.
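The two ad-hoc pooling rules mentioned in the Description can be illustrated with a toy coefficient matrix. All objects here (`BETA_mat`, `beta_avg`, `beta_thr`, `thr`) are hypothetical stand-ins, not package internals:

```r
## Toy illustration of the two ad-hoc pooling rules.
## BETA_mat plays the role of M per-dataset coefficient vectors.
M <- 3; p <- 4
BETA_mat <- rbind(c(0.8, 0.0, 0.3, 0),
                  c(0.7, 0.1, 0.0, 0),
                  c(0.9, 0.0, 0.4, 0))   # M x p; zeros = not selected

## Estimate-averaging: average the coefficients across the M datasets
beta_avg <- colMeans(BETA_mat)

## Selection-frequency thresholding: keep a variable only if it was
## selected (non-zero) in at least a fraction `thr` of the M datasets
thr <- 0.5
sel_freq <- colMeans(BETA_mat != 0)      # selection frequency per variable
beta_thr <- ifelse(sel_freq >= thr, beta_avg, 0)
```

Under MIBoost, by contrast, the M models select the same variables by construction, so pooling reduces to averaging without any thresholding step.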
This function supports MIBoost, which enforces uniform variable selection across multiply imputed datasets. For full methodology, see Kuchen (2025).
A list with elements:
INT: intercept(s). A scalar if pool = TRUE, otherwise a length-M vector.
BETA: coefficient estimates. A length-p vector if pool = TRUE, otherwise an M × p matrix.
CV_error: vector of validation errors (if validation data were provided), otherwise NULL.
Kuchen, R. (2025). MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation. arXiv:2507.21807. doi:10.48550/arXiv.2507.21807. https://arxiv.org/abs/2507.21807.
simulate_booami_data, cv_boost_raw, cv_boost_imputed
set.seed(123)
utils::data(booami_sim)

M <- 2                                          # number of imputed datasets
n <- nrow(booami_sim)
x_cols <- grepl("^X\\d+$", names(booami_sim))   # covariate columns X1, X2, ...

## 80/20 train/validation split
tr_idx <- sample(seq_len(n), floor(0.8 * n))
dat_tr <- booami_sim[tr_idx, , drop = FALSE]
dat_va <- booami_sim[-tr_idx, , drop = FALSE]

## Impute the training data, then complete the validation data with the
## same imputation models
pm_tr <- mice::quickpred(dat_tr, method = "spearman",
                         mincor = 0.30, minpuc = 0.60)
imp_tr <- mice::mice(dat_tr, m = M, predictorMatrix = pm_tr,
                     maxit = 1, printFlag = FALSE)
imp_va <- mice::mice.mids(imp_tr, newdata = dat_va, maxit = 1,
                          printFlag = FALSE)

## Assemble one covariate matrix / response vector per imputation
X_list <- vector("list", M)
y_list <- vector("list", M)
X_list_val <- vector("list", M)
y_list_val <- vector("list", M)
for (m in seq_len(M)) {
  tr_m <- mice::complete(imp_tr, m)
  va_m <- mice::complete(imp_va, m)
  X_list[[m]] <- data.matrix(tr_m[, x_cols, drop = FALSE])
  y_list[[m]] <- tr_m$y
  X_list_val[[m]] <- data.matrix(va_m[, x_cols, drop = FALSE])
  y_list_val[[m]] <- va_m$y
}
fit <- impu_boost(
  X_list, y_list,
  X_list_val = X_list_val, y_list_val = y_list_val,
  ny = 0.1, mstop = 50, type = "gaussian",
  MIBoost = TRUE, pool = TRUE, center = "auto"
)
which.min(fit$CV_error)  # iteration with the smallest validation error
head(fit$BETA)           # pooled coefficient estimates
fit$INT                  # pooled intercept(s)
## Not run:
# Heavier demo (more imputed datasets and iterations; for local runs)
set.seed(2025)
utils::data(booami_sim)
M <- 10
n <- nrow(booami_sim)
x_cols <- grepl("^X\\d+$", names(booami_sim))
tr_idx <- sample(seq_len(n), floor(0.8 * n))
dat_tr <- booami_sim[tr_idx, , drop = FALSE]
dat_va <- booami_sim[-tr_idx, , drop = FALSE]
pm_tr <- mice::quickpred(dat_tr, method = "spearman",
                         mincor = 0.20, minpuc = 0.40)
imp_tr <- mice::mice(dat_tr, m = M, predictorMatrix = pm_tr,
                     maxit = 5, printFlag = TRUE)
imp_va <- mice::mice.mids(imp_tr, newdata = dat_va, maxit = 1,
                          printFlag = FALSE)
X_list <- vector("list", M)
y_list <- vector("list", M)
X_list_val <- vector("list", M)
y_list_val <- vector("list", M)
for (m in seq_len(M)) {
  tr_m <- mice::complete(imp_tr, m)
  va_m <- mice::complete(imp_va, m)
  X_list[[m]] <- data.matrix(tr_m[, x_cols, drop = FALSE])
  y_list[[m]] <- tr_m$y
  X_list_val[[m]] <- data.matrix(va_m[, x_cols, drop = FALSE])
  y_list_val[[m]] <- va_m$y
}
fit_heavy <- impu_boost(
  X_list, y_list,
  X_list_val = X_list_val, y_list_val = y_list_val,
  ny = 0.1, mstop = 250, type = "gaussian",
  MIBoost = TRUE, pool = TRUE, center = "auto"
)
str(fit_heavy)
## End(Not run)