impute: Impute data and return a reusable recipe


View source: R/impute.R

Description

impute fills in missing values in both numeric and nominal (categorical) data. Supported methods are mean (numeric only), new_category (nominal only), bagged trees, and k-nearest neighbors (knn).

Usage

impute(d = NULL, ..., recipe = NULL, numeric_method = "mean",
  nominal_method = "new_category", numeric_params = NULL,
  nominal_params = NULL, verbose = FALSE)

Arguments

d

A dataframe or tibble containing data to impute.

...

Optional. Unquoted names of variables that should not be imputed. These are returned unaltered.

recipe

Optional. A recipe object, or a previously imputed data frame that carries a recipe object as an attribute. If provided, the recipe is applied to d, so the new data are imputed with the values saved in the recipe. Use this parameter to impute production data with the same values that were used on a training dataset.
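To illustrate why reusing a recipe matters, here is a minimal base R sketch of the idea for mean imputation: the imputation values are learned once on training data and then applied unchanged to new data. This is not how impute is implemented internally; the helper names fit_mean_imputer and apply_mean_imputer and the toy data frames are hypothetical and exist only for this illustration.

```r
# Hypothetical helpers, not part of healthcareai.
# Learn column means on training data, skipping the named columns.
fit_mean_imputer <- function(d, skip = character()) {
  is_num <- vapply(d, is.numeric, logical(1))
  cols <- setdiff(names(d)[is_num], skip)
  # Named numeric vector of training means, ignoring missing values
  vapply(d[cols], mean, numeric(1), na.rm = TRUE)
}

# Fill missing values using previously learned means.
apply_mean_imputer <- function(d, means) {
  for (col in names(means)) {
    d[[col]][is.na(d[[col]])] <- means[[col]]
  }
  d
}

train <- data.frame(id = 1:4, x = c(1, NA, 3, 5))
test <- data.frame(id = 5:6, x = c(NA, 10))

means <- fit_mean_imputer(train, skip = "id")
train_imputed <- apply_mean_imputer(train, means)
# The test set is filled with the training mean, not its own mean
test_imputed <- apply_mean_imputer(test, means)
```

In this sketch the missing value in test is filled with the mean of x from train, which mirrors the role of the recipe argument: new data receive the values learned during training rather than values recomputed from the new data.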

numeric_method

Defaults to "mean". Other choices are "bagimpute" or "knnimpute".

nominal_method

Defaults to "new_category". Other choices are "bagimpute" or "knnimpute".
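As a rough picture of what a "new category" strategy does, the following base R sketch replaces missing values in a factor with an extra level. It is an assumption-laden illustration, not the package's implementation: the helper add_missing_level is hypothetical, and the level name "missing" is chosen here for the example and may differ from the name impute actually uses.

```r
# Hypothetical helper, not part of healthcareai.
# Replace NA in a nominal variable with an explicit extra level.
add_missing_level <- function(x, level = "missing") {
  x <- as.character(x)
  x[is.na(x)] <- level
  factor(x)
}

colour <- factor(c("red", NA, "blue"))
colour_filled <- add_missing_level(colour)
```

After this step no values are NA, and missingness itself becomes a category that a downstream model can use.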

numeric_params

A named list of parameters to use with the chosen imputation method on numeric data. Options are bag_model (bagimpute only), bag_options (bagimpute only), knn_K (knnimpute only), impute_with (bagimpute or knnimpute), and seed_val (bagimpute or knnimpute). See step_bagimpute or step_knnimpute for details.

nominal_params

A named list of parameters to use with the chosen imputation method on nominal data. Options are bag_model (bagimpute only), bag_options (bagimpute only), knn_K (knnimpute only), impute_with (bagimpute or knnimpute), and seed_val (bagimpute or knnimpute). See step_bagimpute or step_knnimpute for details.

verbose

If TRUE, prints which variables will be imputed and which method will be used for each. Defaults to FALSE.

Value

The imputed data frame. A reusable recipe object for future imputation is attached as the attribute "recipe".

Examples

library(healthcareai)

d <- pima_diabetes
d_train <- d[1:700, ]
d_test <- d[701:768, ]
# Train the imputer; patient_id and diabetes are left unaltered
train_imputed <- impute(d = d_train, patient_id, diabetes)
# Apply the saved recipe to new data
impute(d = d_test, patient_id, diabetes, recipe = train_imputed)
# Specify methods:
impute(d = d_train, patient_id, diabetes, numeric_method = "bagimpute",
       nominal_method = "new_category")
# Specify method and param:
impute(d = d_train, patient_id, diabetes, nominal_method = "knnimpute",
       nominal_params = list(knn_K = 4))

healthcareai documentation built on Sept. 2, 2018, 1:03 a.m.