knitr::opts_chunk$set( message = FALSE, digits = 3, collapse = TRUE, comment = "#>", eval = requireNamespace("modeldata", quietly = TRUE) ) options(digits = 3)
When recipe steps are used, there are different approaches that can be used to select which variables or features should be used.
The three main characteristics of variables that can be queried:
The manual pages for ?selections
and ?has_role
have details about the available selection methods.
To illustrate this, the credit data will be used:
library(recipes) library(modeldata) data("credit_data") str(credit_data) rec <- recipe(Status ~ Seniority + Time + Age + Records, data = credit_data) rec
Before any steps are used the information on the original variables is:
summary(rec, original = TRUE)
We can add a step to compute dummy variables on the non-numeric data after we impute any missing data:
dummied <- rec %>% step_dummy(all_nominal())
This will capture any variables that are either character strings or factors: Status
and Records
. However, since Status
is our outcome, we might want to keep it as a factor so we can subtract that variable out either by name or by role:
dummied <- rec %>% step_dummy(Records) # or dummied <- rec %>% step_dummy(all_nominal(), - Status) # or dummied <- rec %>% step_dummy(all_nominal_predictors())
Using the last definition:
dummied <- prep(dummied, training = credit_data) with_dummy <- bake(dummied, new_data = credit_data) with_dummy
Status
is unaffected.
One important aspect about selecting variables in steps is that the variable names and types may change as steps are being executed. In the above example, Records
is a factor variable before the step is executed. Afterwards, Records
is gone and the binary variable Records_yes
is in its place. One reason to have general selection routines like all_predictors()
or contains()
is to be able to select variables that have not be created yet.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.