step_unknown | R Documentation |
step_unknown() creates a specification of a recipe step that will assign a missing value in a factor level to "unknown".
step_unknown(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  new_level = "unknown",
  objects = NULL,
  skip = FALSE,
  id = rand_id("unknown")
)
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose variables
for this step. See |
role |
Not used by this step since no new variables are created. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
new_level |
A single character value that will be assigned to new factor levels. |
objects |
A list of objects that contain the information
on factor levels that will be determined by |
skip |
A logical. Should the step be skipped when the
recipe is baked by |
id |
A character string that is unique to this step to identify it. |
The selected variables are adjusted to have a new level (given by new_level) that is placed in the last position.
Note that if the original columns are character, they will be converted to factors by this step.
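As a minimal sketch of the behaviour described above, the following uses a small made-up data frame (the names toy, colour, and shape are hypothetical, not part of the package) with one factor column and one character column, each containing a missing value. It assumes the recipes package is installed.

```r
library(recipes)

# Hypothetical toy data: a factor column and a character column, each with one NA
toy <- data.frame(
  colour = factor(c("red", "blue", NA, "red")),
  shape = c("circle", NA, "square", "circle"),
  stringsAsFactors = FALSE
)

rec <- recipe(~ colour + shape, data = toy) %>%
  step_unknown(colour, shape) %>%
  prep()

out <- bake(rec, new_data = NULL)

# The new level is appended after the existing levels
levels(out$colour)

# The character column comes back as a factor with no missing values
is.factor(out$shape)
anyNA(out$shape)
```

Using the default new_level here keeps the example short; in practice a column-specific label can make the baked data easier to read.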
If new_level is already in the data given to prep, an error is thrown.
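The error condition can be illustrated with a short sketch. The data frame toy below is hypothetical and deliberately contains the default level "unknown"; tryCatch() is used only so the example runs to completion and the error can be inspected.

```r
library(recipes)

# Hypothetical data in which "unknown" is already one of the factor levels
toy <- data.frame(colour = factor(c("red", "unknown", NA)))

res <- tryCatch(
  recipe(~ colour, data = toy) %>%
    step_unknown(colour) %>% # default new_level = "unknown" clashes with the data
    prep(),
  error = function(e) e
)

# TRUE when prep() signalled an error
inherits(res, "error")

# Choosing a level that is not in the data avoids the clash
ok <- recipe(~ colour, data = toy) %>%
  step_unknown(colour, new_level = "missing") %>%
  prep()
```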
An updated version of recipe with the new step added to the sequence of any existing operations.
When you tidy() this step, a tibble is returned with columns terms, value, and id:
terms: character, the selectors or variables selected
value: character, the factor levels for the new values
id: character, id of this step
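A small sketch of the tidy() output, assuming a made-up data frame toy and an illustrative step id "unk_colour" (both are hypothetical names chosen for this example):

```r
library(recipes)

# Hypothetical toy data with one missing factor value
toy <- data.frame(colour = factor(c("red", "blue", NA)))

rec <- recipe(~ colour, data = toy) %>%
  step_unknown(colour, new_level = "unknown colour", id = "unk_colour") %>%
  prep()

# Tidy the first (and only) step of the prepared recipe
td <- tidy(rec, number = 1)

names(td) # the three documented columns
td$id     # the id supplied to the step
```

Supplying an explicit id instead of relying on rand_id() makes the step easier to locate when a recipe contains several steps.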
The underlying operation does not allow for case weights.
dummy_names()
Other dummy variable and encoding steps: step_bin2factor(), step_count(), step_date(), step_dummy(), step_dummy_extract(), step_dummy_multi_choice(), step_factor2string(), step_holiday(), step_indicate_na(), step_integer(), step_novel(), step_num2factor(), step_ordinalscore(), step_other(), step_regex(), step_relevel(), step_string2factor(), step_time(), step_unorder()
library(recipes)

data(Sacramento, package = "modeldata")

# Add a step per column, each with its own label for missing values
rec <-
  recipe(~ city + zip, data = Sacramento) %>%
  step_unknown(city, new_level = "unknown city") %>%
  step_unknown(zip, new_level = "unknown zip") %>%
  prep()

# Cross-tabulate the processed and original city values
table(bake(rec, new_data = NULL) %>% pull(city),
  Sacramento %>% pull(city),
  useNA = "always"
) %>%
  as.data.frame() %>%
  dplyr::filter(Freq > 0)

# Inspect the first step
tidy(rec, number = 1)