step_best_normalize: Run bestNormalize transformation for 'recipes' implementation
In bestNormalize: Normalizing Transformation Functions

Description

'step_best_normalize' creates a specification of a recipe step (see 'recipes' package) that will transform data using the best of a suite of normalization transformations estimated (by default) using cross-validation.

Usage

step_best_normalize(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  transform_info = NULL,
  transform_options = list(),
  num_unique = 5,
  skip = FALSE,
  id = rand_id("best_normalize")
)

## S3 method for class 'step_best_normalize'
tidy(x, ...)

## S3 method for class 'step_best_normalize'
axe_env(x, ...)
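The call below is a sketch of the step with non-default arguments. It assumes the 'recipes' and 'bestNormalize' packages are installed, and that 'transform_options' is forwarded to 'bestNormalize()', where 'allow_orderNorm' and 'k' (number of cross-validation folds) are existing arguments. The particular values are illustrative only.

```r
library(recipes)
library(bestNormalize)

rec <- recipe(~ ., data = iris)

# Exclude the ordered quantile (orderNorm) candidate and use 5-fold
# instead of the default 10-fold cross-validation; these options are
# passed on to bestNormalize() for each selected column.
bn_step <- step_best_normalize(
  rec,
  all_numeric(),
  transform_options = list(allow_orderNorm = FALSE, k = 5),
  num_unique = 10
)

# Estimate the transformations, then apply them to the training data
bn_prepped <- prep(bn_step, training = iris)
bn_baked <- bake(bn_prepped, new_data = iris)
```

Each numeric column of 'iris' has more than 10 distinct values, so all four are still evaluated with 'num_unique = 10'.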

Arguments

`recipe`	A recipe object. The step will be added to the sequence of operations for this recipe.
`...`	One or more selector functions to choose which variables are affected by the step. See 'selections()' for more details. For the 'tidy' method, these are not currently used.
`role`	Not used by this step since no new variables are created.
`trained`	A logical indicating whether the quantities for preprocessing have been estimated. Used internally by 'recipes'.
`transform_info`	The estimated transformations for the selected variables. This is 'NULL' until computed by 'prep.recipe()'.
`transform_options`	A list of options to be passed to 'bestNormalize()'.
`num_unique`	An integer; variables with fewer unique values than this will not be evaluated for a transformation.
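`skip`	A logical. Should the step be skipped when the recipe is baked by 'bake()'? Care should be taken when using 'skip = TRUE', as it may affect the computations for subsequent operations.
`id`	A character string that is unique to this step to identify it.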
`x`	A 'step_best_normalize' object.
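The 'num_unique' rule can be illustrated in base R. The helper 'enough_unique()' below is hypothetical and not part of 'bestNormalize' or 'recipes'; it only mirrors the documented behaviour that variables with fewer than 'num_unique' distinct values are not evaluated.

```r
# Hypothetical helper: TRUE if a column has at least num_unique distinct values
enough_unique <- function(x, num_unique = 5) {
  length(unique(x)) >= num_unique
}

df <- data.frame(
  continuous = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.9),  # 6 distinct values
  binary     = c(0, 1, 0, 1, 1, 0)               # 2 distinct values
)

# Which columns would be considered for a transformation?
eligible <- vapply(df, enough_unique, logical(1), num_unique = 5)
eligible
```

Here 'continuous' is eligible and 'binary' is not, since it has only two distinct values.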

Details

The bestNormalize transformation can be used to rescale a variable to be more similar to a normal distribution. See '?bestNormalize' for more information; 'step_best_normalize' is the implementation of 'bestNormalize' in the 'recipes' context.
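The idea of picking "the best of a suite" of transformations can be sketched in base R. This is a simplified stand-in, not the 'bestNormalize' algorithm: '?bestNormalize' describes ranking candidates by an out-of-sample Pearson P statistic divided by its degrees of freedom, whereas this sketch scores a small, hypothetical candidate list with the in-sample Shapiro-Wilk W statistic from 'shapiro.test()'.

```r
# Simplified illustration only: candidates and scoring are assumptions,
# not the transformations or criterion used by bestNormalize().
candidates <- list(
  none = function(x) x,
  log  = function(x) log(x),
  sqrt = function(x) sqrt(x)
)

pick_most_normal <- function(x, candidates) {
  # Shapiro-Wilk W is closer to 1 for more normal-looking data; it is
  # location/scale invariant, so no standardization is needed here
  scores <- vapply(candidates, function(f) {
    unname(shapiro.test(f(x))$statistic)
  }, numeric(1))
  list(best = names(which.max(scores)), scores = scores)
}

set.seed(1)
x <- exp(rnorm(200))  # right-skewed, log-normal data
res <- pick_most_normal(x, candidates)
res$best
res$scores
```

For log-normal data the log transformation recovers normal data, so it should receive the highest score in this sketch.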

As of version 1.7, the 'butcher' package can be used with this step (through the 'axe_env' method) to reduce the size of trained recipes, which may improve scalability on larger data sets.
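A minimal sketch of reducing a trained recipe with 'butcher', assuming the 'butcher', 'recipes' and 'bestNormalize' packages are installed. 'butcher()' applies whichever axe methods are available for the object, which for this step includes the 'axe_env' method listed under Usage; how much memory is actually released will depend on the data.

```r
library(recipes)
library(bestNormalize)
library(butcher)

rec <- recipe(~ ., data = iris)
bn_prepped <- prep(step_best_normalize(rec, all_numeric()), training = iris)

# Strip components not needed for prediction (e.g. captured environments)
bn_small <- butcher(bn_prepped)

# The reduced recipe should still be usable for baking new data
bn_baked <- bake(bn_small, new_data = head(iris))
```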

Value

An updated version of 'recipe' with the new step added to the sequence of existing steps (if any). For the 'tidy' method, a tibble with columns 'terms' (the selectors or variables selected) and 'value' (the lambda estimate).

Examples


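library(recipes)
library(bestNormalize)  # provides step_best_normalize()

# Start a recipe that uses every column of iris as a predictor
rec <- recipe(~ ., data = as.data.frame(iris))

# Add the step for all numeric columns (the factor Species is not selected)
bn_trans <- step_best_normalize(rec, all_numeric())

# Estimate the best transformation for each selected column
bn_estimates <- prep(bn_trans, training = as.data.frame(iris))

# Apply the estimated transformations
bn_data <- bake(bn_estimates, as.data.frame(iris))

# Compare the distribution before and after the transformation
plot(density(iris[, "Petal.Length"]), main = "before")
plot(density(bn_data$Petal.Length), main = "after")

# Untrained step: transformations have not been estimated yet
tidy(bn_trans, number = 1)
# Trained step: reports the estimated transformations
tidy(bn_estimates, number = 1)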

bestNormalize documentation built on Aug. 18, 2023, 9:08 a.m.