booami-package: Boosting with Multiple Imputation (booami)


Boosting with Multiple Imputation (booami)

Description

booami provides component-wise gradient boosting tailored for analysis with multiply imputed datasets. Its core contribution is MIBoost, an algorithm that couples base-learner selection across imputed datasets by minimizing an aggregated loss at each iteration, yielding a single, unified regularization path and improved model stability. For comparison, booami also includes per-dataset boosting with post-hoc pooling (estimate averaging or selection-frequency thresholding).

Details

What is MIBoost?

In each boosting iteration, candidate base-learners are fit separately within each imputed dataset, but selection is made jointly via the aggregated loss across datasets. The selected base-learner is then updated in every imputed dataset, and fitted contributions are averaged to form a single combined predictor. This enforces uniform variable selection while preserving dataset-specific gradients and updates.
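To make the coupling concrete, here is a minimal, self-contained R sketch of one such iteration for Gaussian loss with simple linear base-learners. It illustrates the scheme described above; it is not the package's internal code, and all names (miboost_step, X_list, f_list, nu) are ours.

  ## One MIBoost iteration: joint selection, per-dataset updates.
  ## X_list: list of M imputed n x p covariate matrices; y: complete outcome;
  ## f_list: list of current per-dataset fitted values; nu: learning rate.
  miboost_step <- function(X_list, y, f_list, nu = 0.1) {
    M <- length(X_list); p <- ncol(X_list[[1]])
    u_list <- lapply(seq_len(M), function(m) y - f_list[[m]])  # residuals
    ## Aggregated loss of candidate r: sum of its per-dataset RSS
    agg <- sapply(seq_len(p), function(r)
      sum(sapply(seq_len(M), function(m) {
        fit <- lm.fit(cbind(1, X_list[[m]][, r]), u_list[[m]])
        sum(fit$residuals^2)
      })))
    r_star <- which.min(agg)            # one learner, selected jointly
    for (m in seq_len(M)) {             # ... then updated in every dataset
      fit <- lm.fit(cbind(1, X_list[[m]][, r_star]), u_list[[m]])
      f_list[[m]] <- f_list[[m]] + nu * fit$fitted.values
    }
    list(f_list = f_list, selected = r_star)
  }

Iterating this step and finally averaging the entries of f_list gives the single combined predictor.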

Cross-validation without leakage

booami implements a leakage-avoiding CV protocol: data are first split into training and validation subsets; the training covariates are multiply imputed; validation covariates are imputed using the training imputation models; and (if enabled) centering uses a fold-specific grand mean \mu_\star computed from the training imputations and applied consistently to all imputed training and validation matrices. Errors are averaged across imputations and folds to select the optimal number of boosting iterations (mstop). Use cv_boost_raw for raw data with missing covariates (imputation inside CV), or cv_boost_imputed when imputed datasets are already prepared.

Note: In the recommended predictive workflow implemented by cv_boost_raw(), rows with missing outcomes y are removed before fold assignment, and the outcome is not used for imputation (covariates X are imputed without including y as a predictor).
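The train-only imputation step can be sketched with mice. This is a minimal sketch, assuming the ignore argument of mice() (available in recent versions of mice), which fits the imputation models on the training rows only while still filling in the held-out rows; fold construction and centering are simplified, and this is not what cv_boost_raw() does verbatim.

  library(mice)

  set.seed(1)
  dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))  # covariates only:
  dat$x1[sample(100, 15)] <- NA                        # y stays out of the imputation
  val <- 1:20                                          # validation rows of one fold

  ## Imputation models are estimated on training rows only
  imp <- mice(dat, m = 5, printFlag = FALSE,
              ignore = seq_len(nrow(dat)) %in% val)

  ## Fold-specific grand mean \mu_\star over the training imputations ...
  mu_star <- colMeans(do.call(rbind,
               lapply(1:5, function(m) complete(imp, m)[-val, ])))

  ## ... applied consistently to every completed dataset (training and validation rows)
  X1c <- sweep(complete(imp, 1), 2, mu_star)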

Key features

  • MIBoost (uniform selection): Joint base-learner selection via aggregated loss across imputed datasets; averaged fitted functions yield a single model.

  • Per-dataset boosting (with pooling): Independent boosting in each imputed dataset, with pooling by estimate averaging or by selection-frequency thresholding.

  • Flexible losses and learners: Supports Gaussian and logistic losses with component-wise base-learners; extensible to other learners.

  • Leakage-safe CV: Training/validation split → train-only imputation of covariates → fold-wise grand-mean centering (\mu_\star) → error aggregation across imputations and folds.

Main functions

  • impu_boost — Core routine implementing MIBoost as well as per-dataset boosting with pooling.

  • cv_boost_raw — Leakage-safe k-fold CV starting from a single dataset with missing covariates (imputation performed inside each fold).

  • cv_boost_imputed — CV when imputed datasets (and splits) are already available.

Typical workflow

  1. Raw data with missing covariates: use cv_boost_raw() to impute within folds, select mstop, and fit the final model.

  2. Already imputed datasets: use cv_boost_imputed() to select mstop and fit; a short preparation sketch follows this list.

  3. Direct control: call impu_boost() when you want to run MIBoost (or per-dataset boosting) directly, optionally followed by pooling.
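For route 2, the preparation step can be shown concretely: the mice code below is runnable, while the booami calls are left as commented placeholders because their exact argument names are a matter for the package help pages (?cv_boost_imputed, ?impu_boost), not this sketch.

  library(mice)

  imp    <- mice(nhanes, m = 5, printFlag = FALSE)       # 5 imputed datasets
  X_list <- lapply(1:5, function(m) complete(imp, m))    # one data frame per imputation

  ## library(booami)
  ## cv  <- cv_boost_imputed(...)   # select mstop across the imputed datasets
  ## fit <- impu_boost(...)         # final MIBoost fit at the chosen mstop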

Mathematical sketch

At boosting iteration t, for each candidate base-learner r and each imputed dataset m = 1,\dots,M, let RSS_r^{(m)[t]} denote the residual sum of squares obtained when base-learner r is fit to the current residuals (negative gradients) in dataset m. The aggregated loss is

L_r^{[t]} = \sum_{m=1}^M RSS_r^{(m)[t]}.

The base-learner r^* with minimal aggregated loss is selected jointly, updated in all imputed datasets, and the fitted contributions are averaged to form the combined predictor. After t_{\mathrm{stop}} iterations, this yields a single final model.
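A toy numeric check of the selection rule (all RSS values invented): the two imputed datasets below disagree on the best learner, and the aggregated loss resolves the disagreement.

  ## Rows: candidates r = 1..3; columns: imputed datasets m = 1, 2.
  rss <- rbind(c(10.2, 8.6),   # r = 1: best in dataset m = 2
               c( 8.9, 9.1),   # r = 2: best in dataset m = 1
               c( 9.5, 9.4))   # r = 3
  L <- rowSums(rss)            # L_r = 18.8, 18.0, 18.9
  which.min(L)                 # r* = 2 is selected jointly in all datasets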

References

  • Bühlmann, P. and Hothorn, T. (2007). "Boosting Algorithms: Regularization, Prediction and Model Fitting." Statistical Science, 22(4), 477–505. doi:10.1214/07-STS242

  • Friedman, J. H. (2001). "Greedy Function Approximation: A Gradient Boosting Machine." The Annals of Statistics, 29(5), 1189–1232. doi:10.1214/aos/1013203451

  • van Buuren, S. and Groothuis-Oudshoorn, K. (2011). "mice: Multivariate Imputation by Chained Equations in R." Journal of Statistical Software, 45(3), 1–67. doi:10.18637/jss.v045.i03

Citation

For details, see: Kuchen, R. (2025). "MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation." arXiv preprint, doi:10.48550/arXiv.2507.21807, https://arxiv.org/abs/2507.21807.

See also

  • mboost: General framework for component-wise gradient boosting in R.

  • miselect: Implements MI-extensions of LASSO and elastic nets for variable selection after multiple imputation.

  • mice: Standard tool for multiple imputation of missing data.

Author(s)

Maintainer: Robert Kuchen <rokuchen@uni-mainz.de>
