
Note: This package is currently experimental and under active development. The API may change. Feedback and bug reports are welcome via GitHub Issues.
misl implements Multiple Imputation by Super Learning (MISL), a flexible approach to handling missing data that uses a stacked ensemble of machine learning algorithms to impute missing values across continuous, binary, and categorical variables.
Rather than relying on a single parametric imputation model, MISL builds a super learner for each incomplete variable using the tidymodels framework, combining learners such as linear/logistic regression, random forests, gradient boosted trees, and MARS to produce well-calibrated imputations.
The method is described in:
Carpenito T, Manjourides J. (2022) MISL: Multiple imputation by super learning. Statistical Methods in Medical Research. 31(10):1904–1915. doi: 10.1177/09622802221104238
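To make the stacking idea above concrete, here is a minimal base-R sketch of a two-learner ensemble for a continuous outcome. It is an illustration under simplifying assumptions, not misl's implementation: the simulated data, the two `lm`-based learners, and the names `learners`, `cv_pred`, and `ensemble_predict` are all hypothetical, and the weights are found with a one-dimensional search rather than through the tidymodels machinery the package uses. It also covers only the prediction step, not how imputed values are drawn.

```r
set.seed(123)
n   <- 200
x   <- runif(n, -2, 2)
dat <- data.frame(x = x, y = sin(2 * x) + 0.5 * x + rnorm(n, sd = 0.3))

# Two candidate learners: a straight line and a cubic polynomial.
learners <- list(
  linear = function(train) lm(y ~ x, data = train),
  cubic  = function(train) lm(y ~ poly(x, 3), data = train)
)

# Step 1: collect out-of-fold predictions from every learner.
k       <- 5
fold    <- sample(rep(seq_len(k), length.out = n))
cv_pred <- matrix(NA_real_, n, length(learners),
                  dimnames = list(NULL, names(learners)))
for (f in seq_len(k)) {
  train <- dat[fold != f, ]
  test  <- dat[fold == f, ]
  for (l in names(learners)) {
    cv_pred[fold == f, l] <- predict(learners[[l]](train), newdata = test)
  }
}

# Step 2: pick the convex weight that minimises cross-validated squared error.
cv_mse <- function(w) {
  blend <- w * cv_pred[, "linear"] + (1 - w) * cv_pred[, "cubic"]
  mean((dat$y - blend)^2)
}
w_linear <- optimize(cv_mse, interval = c(0, 1))$minimum
weights  <- c(linear = w_linear, cubic = 1 - w_linear)

# Step 3: refit each learner on all rows and blend their predictions.
full_fits <- lapply(learners, function(fit) fit(dat))
ensemble_predict <- function(newdata) {
  preds <- vapply(full_fits, predict, numeric(nrow(newdata)), newdata = newdata)
  drop(preds %*% weights)
}

head(ensemble_predict(dat))
```

The weights are learned from out-of-fold predictions so that a learner that overfits the training rows is not rewarded for it.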
misl is not yet on CRAN. Install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("JustinManjourides/misl")
The following backend packages are optional but recommended:
install.packages(c("ranger", "xgboost", "earth"))
library(misl)
# Simulate a small dataset, then introduce missingness into each column
set.seed(42)
n <- 200
demo_data <- data.frame(
  age    = rnorm(n, mean = 50, sd = 10),
  weight = rnorm(n, mean = 70, sd = 15),
  smoker = rbinom(n, 1, 0.3),
  group  = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
demo_data[sample(n, 20), "age"] <- NA
demo_data[sample(n, 15), "weight"] <- NA
demo_data[sample(n, 10), "smoker"] <- NA
demo_data[sample(n, 10), "group"] <- NA
# Run MISL with 5 imputations and 5 iterations, naming the learners
# to stack for continuous, binary, and categorical variables
misl_imp <- misl(
  demo_data,
  m = 5,
  maxit = 5,
  con_method = c("glm", "rand_forest"),
  bin_method = c("glm", "rand_forest"),
  cat_method = c("rand_forest", "multinom_reg")
)
# Each element of misl_imp holds one of the m imputed datasets, e.g. the first:
completed_data <- misl_imp[[1]]$datasets
# The matching trace information can be used to inspect convergence:
trace <- misl_imp[[1]]$trace
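Once the m completed datasets are available, the usual next step in multiple imputation is to fit the same model to each one and combine the results with Rubin's rules. The sketch below does this in base R for a single regression coefficient. The helpers `pool_rubin()` and `pool_lm_term()` are hypothetical and not part of misl, and the `toy` list is a stand-in for real imputed datasets.

```r
# Rubin's rules for one scalar estimate across m imputations.
pool_rubin <- function(estimates, variances) {
  m <- length(estimates)
  stopifnot(m >= 2, length(variances) == m)
  q_bar <- mean(estimates)           # pooled point estimate
  u_bar <- mean(variances)           # within-imputation variance
  b     <- var(estimates)            # between-imputation variance
  total <- u_bar + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, se = sqrt(total))
}

# Fit the same linear model to each completed dataset and pool one coefficient.
pool_lm_term <- function(datasets, formula, term) {
  coefs <- lapply(datasets, function(d) summary(lm(formula, data = d))$coefficients)
  est   <- vapply(coefs, function(cf) cf[term, "Estimate"], numeric(1))
  se    <- vapply(coefs, function(cf) cf[term, "Std. Error"], numeric(1))
  pool_rubin(est, se^2)
}

# Toy stand-in for m = 3 completed datasets (not produced by misl).
set.seed(1)
toy <- lapply(1:3, function(i) {
  x <- rnorm(100)
  data.frame(x = x, y = 2 * x + rnorm(100))
})

pooled <- pool_lm_term(toy, y ~ x, "x")
pooled
```

Assuming every element of `misl_imp` stores its data frame in `$datasets`, as in the example above, a call such as `pool_lm_term(lapply(misl_imp, function(imp) imp$datasets), weight ~ age, "age")` should pool the `age` coefficient across all imputations.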
Imputation across the m datasets is parallelised via the future framework. To enable parallel execution, set a plan before calling misl():
library(future)
plan(multisession, workers = 4)
misl_imp <- misl(demo_data, m = 5, maxit = 5)
plan(sequential) # reset when done
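A hard-coded `workers = 4` may not suit every machine. The variant below is a sketch that assumes only that the future package is installed: it derives the worker count from the cores available and restores whatever plan was active beforehand instead of always resetting to sequential. The name `n_workers` is illustrative.

```r
library(future)

# Leave one core free for the rest of the system; never drop below one worker.
n_workers <- max(1L, as.integer(availableCores()) - 1L)

# plan() invisibly returns the previous plan, so it can be restored afterwards.
old_plan <- plan(multisession, workers = n_workers)

# ... call misl() here, e.g. misl(demo_data, m = 5, maxit = 5) ...

plan(old_plan)  # restore the plan that was active before
```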
# View all available learners
list_learners()
# Filter by outcome type
list_learners("continuous")
list_learners("categorical")
# Show only installed learners
list_learners(installed_only = TRUE)
If you use misl in your research, please cite the original paper:
Carpenito T, Manjourides J. (2022) MISL: Multiple imputation by super learning. Statistical Methods in Medical Research. 31(10):1904–1915. doi: 10.1177/09622802221104238
BibTeX:
@article{carpenito2022misl,
author = {Carpenito, T and Manjourides, J},
title = {{MISL}: Multiple imputation by super learning},
journal = {Statistical Methods in Medical Research},
year = {2022},
volume = {31},
number = {10},
pages = {1904--1915},
doi = {10.1177/09622802221104238}
}
MIT © see LICENSE