| misl | R Documentation |
Imputes missing values using multiple imputation by super learning.
```r
misl(
  dataset,
  m = 5,
  maxit = 5,
  seed = NA,
  con_method = c("glm", "rand_forest", "boost_tree"),
  bin_method = c("glm", "rand_forest", "boost_tree"),
  cat_method = c("rand_forest", "boost_tree"),
  ord_method = c("polr", "rand_forest", "boost_tree"),
  cv_folds = 5,
  ignore_predictors = NA,
  quiet = TRUE
)
```
dataset |
A data frame or matrix containing the incomplete data.
Missing values are represented with NA.
m |
The number of multiply imputed datasets to create. Default 5.
maxit |
The number of iterations per imputed dataset. Default 5.
seed |
Integer seed for reproducibility, or NA (the default) for no fixed seed.
con_method |
Character vector of learner IDs, a list of parsnip model
specs, or a mixed list of both, for continuous columns.
Default c("glm", "rand_forest", "boost_tree").
bin_method |
Character vector of learner IDs, a list of parsnip model
specs, or a mixed list of both, for binary columns
(values must be |
cat_method |
Character vector of learner IDs, a list of parsnip model
specs, or a mixed list of both, for unordered categorical columns.
Default c("rand_forest", "boost_tree").
ord_method |
Character vector of learner IDs, a list of parsnip model
specs, or a mixed list of both, for ordered categorical columns.
Default c("polr", "rand_forest", "boost_tree").
cv_folds |
Integer number of cross-validation folds used when stacking
multiple learners. Reducing this (e.g. to |
ignore_predictors |
Character vector of column names to exclude as
predictors. Default NA.
quiet |
Suppress console progress messages. Default TRUE.
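The cv_folds argument only matters when several learners are stacked. To illustrate what K-fold stacking involves, the base-R sketch below builds out-of-fold predictions from two toy learners and picks a convex combination weight by minimising cross-validated error. The simulated data, the choice of lm() and an intercept-only mean as learners, and the grid search over a single weight are all illustrative assumptions; this is not misl's internal implementation.

```r
# Illustrative K-fold stacking of two toy learners in base R.
# Assumptions: simulated data, lm() and an intercept-only mean as the
# candidate learners, grid search over one convex weight. Not misl's code.
set.seed(1)
n <- 200
df <- data.frame(x = rnorm(n))
df$y <- 2 * df$x + rnorm(n)

cv_folds <- 5
# Randomly assign each row to one of cv_folds folds of (nearly) equal size
fold_id <- sample(rep(seq_len(cv_folds), length.out = n))

# Out-of-fold predictions: each row is predicted by learners fit without it
oof <- matrix(NA_real_, nrow = n, ncol = 2,
              dimnames = list(NULL, c("lm", "mean")))
for (k in seq_len(cv_folds)) {
  train <- df[fold_id != k, ]
  held_out <- df[fold_id == k, ]
  oof[fold_id == k, "lm"] <- predict(lm(y ~ x, data = train), newdata = held_out)
  oof[fold_id == k, "mean"] <- mean(train$y)
}

# Pick w in [0, 1] for w * lm + (1 - w) * mean by minimising CV mean squared error
w_grid <- seq(0, 1, by = 0.01)
cv_mse <- vapply(w_grid, function(w) {
  mean((df$y - (w * oof[, "lm"] + (1 - w) * oof[, "mean"]))^2)
}, numeric(1))
w_best <- w_grid[which.min(cv_mse)]

# Final ensemble: refit both learners on all rows and combine them with w_best
fit_lm <- lm(y ~ x, data = df)
stacked_predict <- function(newdata) {
  w_best * predict(fit_lm, newdata = newdata) + (1 - w_best) * mean(df$y)
}
stacked_predict(data.frame(x = c(-1, 0, 1)))
```

With more than two learners, super learner implementations commonly find the weights with non-negative least squares instead of a grid. More folds leave more training rows per fit but require more model fits, which is the trade-off cv_folds controls.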
Built-in named learners (see list_learners()):
"glm" - base R (logistic for binary, linear for continuous)
"rand_forest" - ranger
"boost_tree" - xgboost
"mars" - earth
"multinom_reg" - nnet (unordered categorical only)
"polr" - MASS (ordered categorical only)
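Which learners apply depends on each column's type, and each type is governed by one of the four *_method arguments. The sketch below sorts a data frame's columns into those groups. The helper method_for_column and its rules (ordered factor to ord_method, other factor or character to cat_method, numeric 0/1 to bin_method, other numeric to con_method) are hypothetical assumptions and may not match how misl actually classifies columns.

```r
# Hypothetical helper: report which *_method argument would govern a column.
# The classification rules are assumptions for illustration only.
method_for_column <- function(x) {
  if (is.ordered(x)) return("ord_method")                    # ordered factor
  if (is.factor(x) || is.character(x)) return("cat_method")  # unordered categorical
  if (is.numeric(x) && all(x[!is.na(x)] %in% c(0, 1))) return("bin_method")
  if (is.numeric(x)) return("con_method")                    # any other numeric
  NA_character_                                              # unhandled type
}

demo <- data.frame(
  age    = c(34, 51, NA, 29),
  smoker = c(0, 1, 1, NA),
  region = factor(c("north", "south", NA, "east")),
  stage  = factor(c("I", "II", "III", NA),
                  levels = c("I", "II", "III"), ordered = TRUE)
)
vapply(demo, method_for_column, character(1))
```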
Any parsnip-compatible model spec can also be passed directly via the
*_method arguments. Named strings and parsnip specs can be mixed
in the same list:
```r
library(parsnip)
misl(data,
  con_method = list(
    "glm",
    rand_forest(trees = 500) |> set_engine("ranger")
  )
)
```
The mode (regression vs classification) is always enforced by misl
regardless of what is set on the spec.
A list of m named lists, each with:
datasets | A fully imputed tibble.
trace | A long-format tibble of mean/sd trace statistics per iteration, for convergence inspection.
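A common next step in multiple imputation is to analyse each completed dataset separately and pool the results with Rubin's rules. The sketch below uses a hypothetical mock object, mock_result, shaped like the return value described above (a list of m lists with a datasets element, using a plain data.frame instead of a tibble and omitting trace), and pools the estimated mean of y. The mock data and the choice of estimand are assumptions for illustration.

```r
# Hypothetical stand-in for a misl() result: m lists, each with a completed
# dataset in `datasets` (a data.frame here rather than a tibble).
set.seed(42)
m <- 5
mock_result <- lapply(seq_len(m), function(i) {
  list(datasets = data.frame(x = rnorm(50), y = rnorm(50, mean = 1)))
})

# Analyse each completed dataset: mean of y and its sampling variance
estimates <- vapply(mock_result, function(imp) mean(imp$datasets$y), numeric(1))
variances <- vapply(mock_result,
                    function(imp) var(imp$datasets$y) / nrow(imp$datasets),
                    numeric(1))

# Pool with Rubin's rules
q_bar <- mean(estimates)            # pooled point estimate
w_bar <- mean(variances)            # within-imputation variance
b     <- var(estimates)             # between-imputation variance
t_var <- w_bar + (1 + 1 / m) * b    # total variance
c(estimate = q_bar, se = sqrt(t_var))
```

With a real result, mock_result would be replaced by the object returned by misl(), and the per-dataset analysis by whatever model is of interest.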
Imputation across the m datasets is parallelised via
future.apply. To enable parallel execution, set a future plan
before calling misl():
```r
library(future)
plan(multisession, workers = 4)
result <- misl(data, m = 5)
plan(sequential)
```
```r
# Using named learners (same as v1.0)
set.seed(1)
n <- 100
demo_data <- data.frame(x1 = rnorm(n), x2 = rnorm(n), y = rnorm(n))
demo_data[sample(n, 10), "y"] <- NA
misl_imp <- misl(demo_data, m = 2, maxit = 2, con_method = "glm")

# Using a custom parsnip spec
## Not run:
library(parsnip)
misl_imp <- misl(
  demo_data, m = 2, maxit = 2,
  con_method = list(
    "glm",
    rand_forest(trees = 500) |> set_engine("ranger")
  )
)
## End(Not run)
```