step_missing: Clean NA values from categorical/nominal variables

View source: R/step_hcai_missing.R

Description

step_missing creates a specification of a recipe step that will replace NA values with a new factor level, missing.

Usage

step_missing(recipe, ..., role = NA, trained = FALSE,
  na_percentage = NULL, skip = FALSE)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose which variables are affected by the step. See ?recipes::selections for more details.

role

Not used by this step since no new variables are created.

trained

A logical to indicate whether the NA values have been counted in preprocessing.

na_percentage

A named numeric vector of NA percentages. This is NULL until computed by prep.recipe().

skip

A logical. Should the step be skipped when the recipe is baked?

Details

NA values are counted when the recipe is trained using prep.recipe(). bake.recipe() then replaces the NA values in the new data with the factor level missing.
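The two phases described above can be illustrated with a small base R sketch. This is not the healthcareai implementation: count_na_percentage() and fill_missing() are hypothetical helpers written for illustration, the 0 to 100 percentage scale is an assumption, and so is the choice to add the missing level even to columns that contain no NA values.

```r
# Hypothetical helpers for illustration only; not the healthcareai source.

# "Training" phase: percentage of NA values in each selected column
# (a 0-100 scale is assumed here).
count_na_percentage <- function(data, cols) {
  vapply(data[cols], function(x) 100 * mean(is.na(x)), numeric(1))
}

# "Baking" phase: replace NA with an explicit "missing" factor level.
# Assumption: the level is added even if a column has no NA values.
fill_missing <- function(data, cols) {
  for (col in cols) {
    x <- as.character(data[[col]])
    # Keep existing factor levels, or derive them from the observed values
    # (sort() drops NA by default).
    lev <- if (is.factor(data[[col]])) levels(data[[col]]) else sort(unique(x))
    x[is.na(x)] <- "missing"
    data[[col]] <- factor(x, levels = union(lev, "missing"))
  }
  data
}

d <- data.frame(
  category = c("Low", NA, "High", "Normal"),
  flag = c("Yes", "No", NA, NA),
  stringsAsFactors = FALSE
)

na_percentage <- count_na_percentage(d, c("category", "flag"))
filled <- fill_missing(d, c("category", "flag"))

print(na_percentage)
print(levels(filled$category))
```

Keeping the counting and the filling in separate functions mirrors the prep/bake split: the summary is computed once on the training data, while the replacement can be applied to any new data set.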

Value

An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected) and value (the NA counts).

Examples

library(recipes)
n <- 100
d <- tibble::tibble(encounter_id = 1:n,
                    patient_id = sample(1:20, size = n, replace = TRUE),
                    hemoglobin_count = rnorm(n, mean = 15, sd = 1),
                    hemoglobin_category = sample(c("Low", "Normal", "High", NA),
                                                 size = n, replace = TRUE),
                    disease = ifelse(hemoglobin_count < 15, "Yes", "No")
)

# Initialize
my_recipe <- recipe(disease ~ ., data = d)

# Create recipe
my_recipe <- my_recipe %>%
  step_missing(all_nominal())
my_recipe

# Train recipe
trained_recipe <- prep(my_recipe, training = d)

# Apply recipe
# Note: recipes 0.1.4 and later name this argument new_data instead of newdata
data_modified <- bake(trained_recipe, newdata = d)

healthcareai documentation built on Sept. 2, 2018, 1:03 a.m.