prep | R Documentation |
For a recipe with at least one preprocessing operation, estimate the required parameters from a training set that can be later applied to other data sets.
prep(x, ...)
## S3 method for class 'recipe'
prep(
  x,
  training = NULL,
  fresh = FALSE,
  verbose = FALSE,
  retain = TRUE,
  log_changes = FALSE,
  strings_as_factors = TRUE,
  ...
)
x |
an |
... |
further arguments passed to or from other methods (not currently used). |
training |
A data frame, tibble, or sparse matrix from the |
fresh |
A logical indicating whether already trained operations should be
re-trained. If |
verbose |
A logical that controls whether progress is reported as operations are executed. |
retain |
A logical: should the preprocessed training set be saved into
the |
log_changes |
A logical for printing a summary for each step regarding which (if any) columns were added or removed during training. |
strings_as_factors |
A logical: should character columns that have role
|
Given a data set, this function estimates the required quantities and
statistics needed by any operations. prep() returns an updated recipe with
the estimates. If you are using a recipe as a preprocessor for modeling, we
highly recommend that you use a workflow() instead of manually
estimating a recipe (see the example in recipe()).
Note that missing data is handled in the steps; there is no global na.rm
option at the recipe level or in prep().
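As a minimal sketch of handling missing values inside the steps, the recipe below imputes before centering. The data frame `df` is made up for illustration, and the example assumes the installed version of recipes provides step_impute_mean().

```r
library(recipes)

# Hypothetical toy data with one missing predictor value
df <- data.frame(y = c(1, 2, 3, 4), x = c(10, NA, 30, 50))

# Missing values are dealt with by a dedicated step, not by prep() itself
rec <- recipe(y ~ x, data = df) %>%
  step_impute_mean(x) %>%
  step_center(x)

prepped <- prep(rec, training = df)

# new_data = NULL returns the retained, processed training set
baked <- bake(prepped, new_data = NULL)
anyNA(baked$x)
```

Because the imputation step comes first, the centering step is estimated on a column that no longer contains missing values.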
Also, if a recipe has been trained using prep() and then steps are added,
prep() will only update the new operations. If fresh = TRUE, all of the
operations will be (re)estimated.
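The following sketch illustrates the difference between the default behavior and fresh = TRUE. The split of mtcars into the first 16 rows and the full data is an arbitrary choice for illustration. The training data is passed explicitly in the fresh = TRUE call on the assumption that re-estimating every operation needs the raw data again.

```r
library(recipes)

# Train a one-step recipe on the first 16 rows only
rec <- recipe(mpg ~ disp + hp, data = mtcars) %>%
  step_center(disp, hp)
trained <- prep(rec, training = mtcars[1:16, ])

# Add a step to the trained recipe; by default only the new step is estimated
extended <- trained %>% step_scale(disp, hp)
updated <- prep(extended)

# The centering values still come from the original 16 rows
means_kept <- tidy(updated, number = 1)$value

# fresh = TRUE re-estimates every operation, here on the full data set
refreshed <- prep(extended, training = mtcars, fresh = TRUE)
means_fresh <- tidy(refreshed, number = 1)$value
```

Comparing `means_kept` with `means_fresh` shows whether the first step was left alone or re-estimated.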
As the steps are executed, the training set is updated. For example, if the
first step is to center the data and the second is to scale the data, the
step for scaling is given the centered data.
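One way to see this sequential behavior is with a log transformation followed by centering, since the stored mean then depends visibly on the earlier step. This is a small sketch using mtcars; the choice of steps and columns is illustrative only.

```r
library(recipes)

# Log-transform disp, then center it; the centering step should be
# estimated from the already logged column
rec <- recipe(mpg ~ disp, data = mtcars) %>%
  step_log(disp) %>%
  step_center(disp)

prepped <- prep(rec, training = mtcars)

# Mean stored by the second step versus the mean of the logged raw column
stored_mean <- tidy(prepped, number = 2)$value
manual_mean <- mean(log(mtcars$disp))
isTRUE(all.equal(stored_mean, manual_mean))
```

If the centering step had been given the raw data instead, the stored value would match mean(mtcars$disp) rather than the mean of the logged values.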
A recipe whose step objects have been updated with the required quantities
(e.g. parameter estimates, model objects, etc.). Also, the term_info object
is likely to be modified as the operations are executed.
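A short sketch of how the returned recipe might be used: the estimates come from a training set and are then applied to other data with bake(). The train/test split of mtcars below is arbitrary and only for illustration.

```r
library(recipes)

# Hypothetical split: first 24 rows for training, the rest held out
train <- mtcars[1:24, ]
test <- mtcars[25:32, ]

rec <- recipe(mpg ~ disp + hp, data = train) %>%
  step_normalize(disp, hp)

# prep() estimates the means and standard deviations from the training rows
prepped <- prep(rec, training = train)

# With retain = TRUE (the default), new_data = NULL returns the processed
# training set; other data sets are passed through new_data
train_processed <- bake(prepped, new_data = NULL)
test_processed <- bake(prepped, new_data = test)
```

The held-out rows are normalized with the training-set estimates, so their columns will generally not have mean zero and unit variance.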
recipe() and bake()
library(recipes)
library(dplyr)

data(ames, package = "modeldata")

# Model the sale price on the log10 scale
ames <- mutate(ames, Sale_Price = log10(Sale_Price))

ames_rec <-
  recipe(
    Sale_Price ~ Longitude + Latitude + Neighborhood + Year_Built + Central_Air,
    data = ames
  ) %>%
  step_other(Neighborhood, threshold = 0.05) %>%
  step_dummy(all_nominal()) %>%
  step_interact(~ starts_with("Central_Air"):Year_Built) %>%
  step_ns(Longitude, Latitude, deg_free = 5)

# Report progress as each operation is trained
prep(ames_rec, verbose = TRUE)

# Summarize which columns each step added or removed
prep(ames_rec, log_changes = TRUE)