View source: R/autotune_VIM_Irmi.R
autotune_VIM_Irmi | R Documentation |
Uses IRMI (iterative robust model-based imputation) to impute missing data.
autotune_VIM_Irmi( df, col_type = NULL, percent_of_missing = NULL, eps = 5, maxit = 100, step = FALSE, robust = FALSE, init.method = "kNN", force = FALSE, col_0_1 = FALSE, out_file = NULL )
df |
data.frame. Data frame to impute, with column names and without the target column. |
col_type |
character vector. Vector containing column type names. |
percent_of_missing |
numeric vector. Vector containing the percentage of missing data in each column, for example c(0, 1, 0, 0, 11.3, ...). |
eps |
threshold for convergence |
maxit |
maximum number of iterations |
step |
stepwise model selection is applied when the parameter is set to TRUE |
robust |
if TRUE, robust regression methods will be applied (it's impossible to set step=TRUE and robust=TRUE at the same time) |
init.method |
Method for initialization of missing values (kNN or median) |
force |
if TRUE, the algorithm tries to find a solution in any case, possibly by using different robust methods automatically (should be set to FALSE for simulation) |
col_0_1 |
Decides whether to add a bonus column indicating where imputation was done: 0 - value was in the dataset, 1 - value was imputed. Default FALSE. (Works only when returning one dataset.) |
out_file |
Output log file location. If the file already exists, log messages will be appended to it. If NULL, no log will be produced. |
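The col_type and percent_of_missing arguments can be derived from the data frame itself. The following is a minimal sketch using only base R; the toy data frame and its column names are illustrative assumptions, and the type labels are assumed to match the class names used in the example on this page ("factor", "integer", "numeric").

```r
# Hypothetical toy data; column names are illustrative only.
df <- data.frame(
  a = factor(c("red", NA, "blue", "red")),
  b = 1:4,
  d = c(1.5, NA, NA, 4.0)
)

# First class of every column, as an unnamed character vector.
col_type <- unname(vapply(df, function(x) class(x)[1], character(1)))

# Percentage of missing values per column, as an unnamed numeric vector.
percent_of_missing <- unname(100 * colMeans(is.na(df)))

print(col_type)
print(percent_of_missing)
```

Both vectors are in column order, so they can be passed positionally alongside df.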
Run time can vary considerably depending on data size and structure. In some cases, when the selected parameters do not work, the function tries to run with the default parameters instead. The most important parameter for both quality and reliability is eps.
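The fallback to default parameters can be pictured with a generic tryCatch pattern. This is only a sketch of the idea, not the package's actual implementation: run_with_fallback and toy_impute are hypothetical names introduced for illustration, and toy_impute is a stand-in that simply fills missing values with 0.

```r
# Hypothetical helper: try the user-supplied parameters first and,
# if the call fails, retry with the function's defaults.
run_with_fallback <- function(impute_fun, df, ...) {
  tryCatch(
    impute_fun(df, ...),
    error = function(e) {
      message("Custom parameters failed (", conditionMessage(e),
              "); retrying with defaults")
      impute_fun(df)
    }
  )
}

# Hypothetical stand-in for an imputation routine (NOT IRMI).
toy_impute <- function(df, eps = 5) {
  if (eps <= 0) stop("eps must be positive")
  df[is.na(df)] <- 0
  df
}

# The invalid eps triggers the fallback to the default eps = 5.
result <- run_with_fallback(toy_impute, data.frame(x = c(1, NA, 3)), eps = -1)
print(result)
```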
Returns one data.frame with imputed values.
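The 0/1 coding described for col_0_1 can also be reproduced independently from the original data, which is useful for checking which cells of the returned data frame were imputed. A minimal sketch in base R follows; the "_where" suffix and the layout of the mask are assumptions made for illustration, since the exact form of the bonus column produced by the function is not specified here.

```r
# Hypothetical raw data with missing values.
raw <- data.frame(
  x = c(1, NA, 3),
  y = factor(c("a", "b", NA))
)

# 1 where the value was missing (and would be imputed), 0 where it was present.
imputed_mask <- as.data.frame(is.na(raw) * 1L)
names(imputed_mask) <- paste0(names(raw), "_where")  # illustrative naming

print(imputed_mask)
```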
Alexander Kowarik, Matthias Templ (2016) doi: 10.18637/jss.v074.i07
Alexander Kowarik, Matthias Templ (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi:10.18637/jss.v074.i07
{
  raw_data <- data.frame(
    a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
    b = as.integer(1:1000),
    c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
    d = runif(1000, 1, 10),
    e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
    f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE))
  )

  # Preparing col_type
  col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")

  # Percentage of missing values in each column
  percent_of_missing <- 1:6
  for (i in percent_of_missing) {
    percent_of_missing[i] <- 100 * (sum(is.na(raw_data[, i])) / nrow(raw_data))
  }

  imp_data <- autotune_VIM_Irmi(raw_data, col_type, percent_of_missing)

  # Check that all missing values were imputed
  sum(is.na(imp_data)) == 0 # TRUE
}