h2o.target_encode_apply: Apply Target Encoding Map to Frame
In h2o: R Interface for the 'H2O' Scalable Machine Learning Platform

Description

Applies a target encoding map to an H2OFrame object. Target encoding of high-cardinality categorical columns can improve the performance of supervised learning models. A target encoding tutorial is available at https://github.com/h2oai/h2o-tutorials/blob/master/best-practices/categorical-predictors/target_encoding.md.
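
As a rough illustration of the idea, the sketch below replaces each level of a categorical column with the mean of the response for that level, using base R only. The data frame and its column names are made up for this example, and the code does not reproduce H2O's implementation.

```r
# Toy data (hypothetical values): a categorical predictor and a numeric response
df <- data.frame(
  job = c("admin", "admin", "tech", "tech", "tech", "retired"),
  age = c(30, 40, 20, 30, 40, 70)
)

# Replace each level of `job` with the mean of `age` over that level.
# ave() returns one value per row; its default summary function is mean().
df$job_te <- ave(df$age, df$job)

print(df)
```

A high-cardinality column encoded this way becomes a single numeric column instead of one indicator column per level.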

Usage

h2o.target_encode_apply(
  data,
  x,
  y,
  target_encode_map,
  holdout_type,
  fold_column = NULL,
  blended_avg = TRUE,
  noise_level = NULL,
  seed = -1
)

Arguments

`data`	The H2OFrame object to which the target encoding map is applied.
`x`	A list containing the names or indices of the variables to encode. A target encoding column is created for each element of the list. An element can name several columns, in which case they are encoded jointly. For example, if 'x = list(c("A"), c("B", "C"))', the resulting frame has one target encoding column for A and one for the combination of B and C (in this case, we group by two columns).
`y`	The name or column index of the response variable in the data. The response variable can be either numeric or binary.
`target_encode_map`	A list of H2OFrame objects, as returned by the `h2o.target_encode_create` function.
`holdout_type`	The holdout type used. Must be one of: "LeaveOneOut", "KFold", "None".
`fold_column`	(Optional) The name or column index of the fold column in the data. Defaults to NULL (no 'fold_column'). Only required if 'holdout_type' = "KFold".
`blended_avg`	`Logical`. (Optional) Whether to perform blended average. Defaults to TRUE.
`noise_level`	(Optional) The amount of random noise added to the target encoding, which helps prevent overfitting. Defaults to NULL, in which case 0.01 * range of y is used.
`seed`	(Optional) A random seed used to generate draws from the uniform distribution for random noise. Defaults to -1.
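
The `holdout_type` and `blended_avg` options can be pictured with base R on a toy data frame. What follows is only a sketch of the general techniques, not H2O's implementation: the data, the variable names, and the blending parameters `k` and `f` are assumptions made for illustration, and the formulas H2O uses internally may differ.

```r
# Toy data (hypothetical values) with a categorical column, a response, and a fold id
df <- data.frame(
  job  = c("admin", "admin", "tech", "tech", "tech", "retired"),
  age  = c(30, 40, 20, 30, 40, 70),
  fold = c(1, 2, 1, 2, 2, 1)
)

# Per-level sum and count of the response, repeated on every row
grp_sum <- ave(df$age, df$job, FUN = sum)
grp_n   <- ave(df$age, df$job, FUN = length)

# "None": every row gets the plain mean of its level
te_none <- grp_sum / grp_n

# "LeaveOneOut": the row's own response is excluded from its level mean.
# A level with a single row ("retired") gives 0 / 0 = NaN in this sketch.
te_loo <- (grp_sum - df$age) / (grp_n - 1)

# "KFold": the mean is taken over rows of the same level in the other folds
fold_sum <- ave(df$age, df$job, df$fold, FUN = sum)
fold_n   <- ave(df$age, df$job, df$fold, FUN = length)
te_kfold <- (grp_sum - fold_sum) / (grp_n - fold_n)

# Blended average (assumed form): shrink the level mean towards the global mean,
# trusting the level mean more as the level's row count grows.
k <- 2  # hypothetical inflection point
f <- 1  # hypothetical smoothing factor
lambda   <- 1 / (1 + exp(-(grp_n - k) / f))
te_blend <- lambda * te_none + (1 - lambda) * mean(df$age)

print(cbind(df, te_none, te_loo, te_kfold, te_blend))
```

Excluding a row's own response, or its whole fold, keeps the encoding of a training row from leaking that row's target value, which is why a holdout is typically used on training data and "None" on test data.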

Value

Returns an H2OFrame object containing the target encoding per record.

Examples

## Not run: 
library(h2o)
h2o.init()

# Get Target Encoding Frame on bank-additional-full data with numeric `y`
data <- h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv")
splits <- h2o.splitFrame(data, seed = 1234)
train <- splits[[1]]
test <- splits[[2]]
mapping <- h2o.target_encode_create(data = train, x = list(c("job"), c("job", "marital")), 
                                    y = "age")

# Apply mapping to the training dataset
train_encode <- h2o.target_encode_apply(data = train, x = list(c("job"), c("job", "marital")),
                                        y = "age", target_encode_map = mapping,
                                        holdout_type = "LeaveOneOut")

# Apply mapping to a test dataset
test_encode <- h2o.target_encode_apply(data = test, x = list(c("job"), c("job", "marital")),
                                       y = "age", target_encode_map = mapping,
                                       holdout_type = "None")

## End(Not run)
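
The `fold_column` argument is only needed with `holdout_type = "KFold"`. A minimal sketch of that case follows, assuming a running H2O cluster and access to the same public dataset. The column name "fold", the choice of 5 folds, and the seed values are arbitrary choices for illustration.

```r
library(h2o)
h2o.init()

data <- h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv")
train <- h2o.splitFrame(data, seed = 1234)[[1]]

# Assign every training row to one of 5 folds (column name "fold" is arbitrary)
train$fold <- h2o.kfold_column(train, nfolds = 5, seed = 1234)

x <- list(c("job"), c("job", "marital"))

# Build the mapping per fold, then encode each row from the other folds only
mapping <- h2o.target_encode_create(data = train, x = x, y = "age",
                                    fold_column = "fold")
train_encode <- h2o.target_encode_apply(data = train, x = x, y = "age",
                                        target_encode_map = mapping,
                                        holdout_type = "KFold",
                                        fold_column = "fold",
                                        seed = 1234)
```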

h2o documentation built on May 29, 2024, 4:26 a.m.