impute_xgboost: Fast imputation of missing values by extreme gradient boosting

Description

Uses the "xgboost" package for fast missing value imputation by extreme gradient boosting. Between the iterative model fits, it offers optional predictive mean matching. First, this avoids imputing values that are not present in the original matrix mat (such as a value of 0.3334 in a 0-1 coded variable). Second, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level and thus allows multiple imputation by repeating the call to impute_xgboost(). The iterative chaining stops as soon as max_iterations is reached or the average out-of-bag estimate of performance stops improving. In the latter case, except for the first iteration, the second-to-last (i.e. best) imputed matrix is returned.
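
The stopping rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: chain_sketch, impute_once and oob_error are hypothetical names, where impute_once stands in for one chaining pass and oob_error for its out-of-bag error estimate.

```r
# Sketch of the stopping rule: iterate until the error stops improving
# or max_iterations is reached. `impute_once` and `oob_error` are
# hypothetical stand-ins, not functions from the package.
chain_sketch <- function(mat, impute_once, oob_error, max_iterations = 10L) {
  best <- mat
  best_err <- Inf
  for (i in seq_len(max_iterations)) {
    candidate <- impute_once(best)
    err <- oob_error(candidate)
    if (i > 1L && err >= best_err) {
      # No improvement: hand back the previous (best) matrix.
      return(list(mat = best, iterations = i, error = best_err))
    }
    best <- candidate
    best_err <- err
  }
  list(mat = best, iterations = max_iterations, error = best_err)
}

# Toy run: each "pass" adds 1 to the matrix, and the error of a matrix
# is looked up from a fixed sequence by that counter.
errs <- c(0.9, 0.6, 0.5, 0.55, 0.4)
toy_impute <- function(m) m + 1
toy_error <- function(m) errs[m[1, 1]]
res <- chain_sketch(matrix(0, 1, 1), toy_impute, toy_error)
```

In the toy run the error rises at the fourth pass, so the result of the third pass is returned even though a later pass would have scored better.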

Usage

impute_xgboost(mat, max_iterations = 10L, seed = NULL, verbose = 1,
  pmm_k = 0, nrounds = 40, eta = 0.4, max_depth = 6,
  objective = "reg:linear", eval_metric = "rmse", ...)

Arguments

mat

A matrix with missing values to impute.

max_iterations

Maximum number of chaining iterations.

seed

Integer seed to initialize the random number generator.

verbose

Controls how much information is printed to the screen: 0 prints nothing, 1 (default) prints a "." per iteration along with the standardized prediction error, and 2 prints model convergence details.

pmm_k

Number of candidate non-missing values to sample from in the predictive mean matching step. 0 to avoid this step.
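
The role of pmm_k can be illustrated with a small base R sketch of predictive mean matching. It is an assumption-laden illustration rather than the package's internals: pmm_sketch and its arguments are hypothetical names, and the predictions are made up for the example.

```r
# Predictive mean matching sketch (base R, not the package's internals).
# pred_obs: model predictions for rows where the value is observed
# obs:      the observed values for those rows
# pred_mis: model predictions for rows where the value is missing
pmm_sketch <- function(pred_obs, obs, pred_mis, k = 5L) {
  k <- min(k, length(obs))
  vapply(pred_mis, function(p) {
    # Indices of the k observed rows whose predictions are closest to p.
    nearest <- order(abs(pred_obs - p))[seq_len(k)]
    # Draw one donor at random and use its observed value.
    obs[nearest[sample.int(k, 1L)]]
  }, numeric(1))
}

set.seed(1)
obs      <- c(0, 0, 1, 1, 1)             # 0-1 coded variable
pred_obs <- c(0.1, 0.2, 0.7, 0.8, 0.9)   # hypothetical predictions
pred_mis <- c(0.3334, 0.85)              # raw predictions for missing rows
imputed  <- pmm_sketch(pred_obs, obs, pred_mis, k = 2L)
```

Because each imputed value is copied from an observed donor, a raw prediction such as 0.3334 never appears in the result. A larger k widens the donor pool and so adds more randomness to the draw.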

nrounds

Maximum number of boosting iterations.

eta

Controls the learning rate: the contribution of each tree is scaled by a factor 0 < eta < 1 when it is added to the current approximation. This helps prevent overfitting by making the boosting process more conservative. A lower eta calls for a larger nrounds: a low eta makes the model more robust to overfitting but slower to compute. Default: 0.4 (xgboost's own default is 0.3).
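
The trade-off between eta and nrounds can be seen in a toy boosting loop written in base R. This sketch uses one-split stumps and squared error purely for illustration; fit_stump and boost_rmse are hypothetical helpers and do not reflect how xgboost is implemented.

```r
# Toy gradient boosting for squared error with one-split stumps (base R).
# Illustrates shrinkage only; this is not how xgboost is implemented.
fit_stump <- function(x, r) {
  splits <- sort(unique(x))
  splits <- splits[-length(splits)]          # keep both sides non-empty
  sse <- vapply(splits, function(s) {
    left <- x <= s
    sum((r[left] - mean(r[left]))^2) + sum((r[!left] - mean(r[!left]))^2)
  }, numeric(1))
  s <- splits[which.min(sse)]
  list(split = s, left = mean(r[x <= s]), right = mean(r[x > s]))
}

boost_rmse <- function(x, y, nrounds, eta) {
  pred <- rep(mean(y), length(y))
  for (i in seq_len(nrounds)) {
    st <- fit_stump(x, y - pred)             # fit the current residuals
    # Add the stump's prediction, scaled down by the learning rate.
    pred <- pred + eta * ifelse(x <= st$split, st$left, st$right)
  }
  sqrt(mean((y - pred)^2))                   # training RMSE
}

x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x)
rmse_fast      <- boost_rmse(x, y, nrounds = 40,  eta = 0.4)
rmse_slow      <- boost_rmse(x, y, nrounds = 40,  eta = 0.05)
rmse_slow_more <- boost_rmse(x, y, nrounds = 400, eta = 0.05)
```

In this toy setup the lower learning rate should leave a larger training error after the same number of rounds, and adding rounds should bring it back down. Training error alone says nothing about overfitting, which is what a low eta is meant to guard against.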

max_depth

Maximum depth of a tree. Default: 6

objective

Specifies the learning task and the corresponding learning objective. Default: 'reg:linear'

eval_metric

Evaluation metric for validation data. Default: 'rmse'

...

Arguments passed to xgboost.

Value

An imputed matrix.

Examples

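library(xgbimpute)
mat <- as.matrix(iris[, 1:4])        # numeric columns of iris as a matrix
mis_mat <- generate_na(mat, 0.3)     # introduce missing values
imp_mat <- impute_xgboost(mis_mat)   # impute them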
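
When the complete data are known, as in the example above, the imputation can be scored at the masked cells. The following is a sketch under stated assumptions: masked_rmse and mean_impute are hypothetical helpers, the mask is built by hand rather than with generate_na(), and column-mean imputation stands in for impute_xgboost() so that the snippet runs without the package.

```r
# Score an imputation against the known truth at the masked cells only.
masked_rmse <- function(truth, imputed, mask) {
  sqrt(mean((truth[mask] - imputed[mask])^2))
}

# Stand-in imputer (column means) so the sketch runs without the package;
# swap in impute_xgboost(mis_mat) to score the real thing.
mean_impute <- function(m) {
  for (j in seq_len(ncol(m))) {
    m[is.na(m[, j]), j] <- mean(m[, j], na.rm = TRUE)
  }
  m
}

set.seed(42)
mat <- as.matrix(iris[, 1:4])
# Hand-made mask: each cell is hidden with probability 0.3.
mask <- matrix(runif(length(mat)) < 0.3, nrow = nrow(mat))
mis_mat <- mat
mis_mat[mask] <- NA
imp_mat <- mean_impute(mis_mat)
rmse <- masked_rmse(mat, imp_mat, mask)
```

Keeping the mask around is the design choice that matters here: it lets different imputers, or different settings of pmm_k, nrounds and eta, be compared on exactly the same hidden cells.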

yatzy/xgbimpute documentation built on June 7, 2019, 8:16 p.m.