h2o.targetencoder: Transformation of a categorical variable with a mean value of...
In h2o: R Interface for the 'H2O' Scalable Machine Learning Platform

h2o.targetencoder

R Documentation

Transformation of a categorical variable with a mean value of the target variable

Description

Transformation of a categorical variable with a mean value of the target variable

Usage

h2o.targetencoder(
  x,
  y,
  training_frame,
  model_id = NULL,
  fold_column = NULL,
  columns_to_encode = NULL,
  keep_original_categorical_columns = TRUE,
  blending = FALSE,
  inflection_point = 10,
  smoothing = 20,
  data_leakage_handling = c("leave_one_out", "k_fold", "none", "LeaveOneOut", "KFold",
    "None"),
  noise = 0.01,
  seed = -1,
  ...
)

Arguments

`x`	(Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
`y`	The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained, otherwise it will train a classification model.
`training_frame`	Id of the training data frame.
`model_id`	Destination id for this model; auto-generated if not specified.
`fold_column`	Column with cross-validation fold index assignment per observation.
`columns_to_encode`	List of categorical columns or groups of categorical columns to encode. When groups of columns are specified, each group is encoded as a single column (interactions are created internally).
`keep_original_categorical_columns`	`Logical`. If true, the original non-encoded categorical features will remain in the result frame. Defaults to TRUE.
`blending`	`Logical`. If true, enables blending of posterior probabilities (computed for a given categorical value) with prior probabilities (computed on the entire set). This helps mitigate the effect of categorical values that occur only rarely in the data. The blending effect can be tuned using the 'inflection_point' and 'smoothing' parameters. Defaults to FALSE.
`inflection_point`	Inflection point of the sigmoid used to blend probabilities (see the 'blending' parameter). If a given categorical value appears fewer than 'inflection_point' times in a data sample, the influence of the posterior probability will be smaller than that of the prior. Defaults to 10.
`smoothing`	The smoothing factor corresponds to the inverse of the slope at the inflection point of the sigmoid used to blend probabilities (see the 'blending' parameter). As smoothing tends towards 0, the sigmoid used for blending turns into a Heaviside step function. Defaults to 20.
`data_leakage_handling`	Data leakage handling strategy used to generate the encoding. Supported options are: 1) "none" (default): no holdout, the entire training frame is used. 2) "leave_one_out": the current row's response value is subtracted from the per-level frequencies pre-calculated on the entire training frame. 3) "k_fold": encodings for a fold are generated from out-of-fold data. Must be one of: "leave_one_out", "k_fold", "none", "LeaveOneOut", "KFold", "None". Defaults to None.
`noise`	The amount of noise to add to the encoded column. Use 0 to disable noise, and -1 (=AUTO) to let the algorithm determine a reasonable amount of noise. Defaults to 0.01.
`seed`	Seed for the random number generator (affects the stochastic parts of the algorithm, which may or may not be enabled by default). Defaults to -1 (time-based random number).
`...`	Mainly used for backwards compatibility, to allow deprecated parameters.
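
The interaction of 'blending', 'inflection_point' and 'smoothing' can be illustrated with a few lines of base R. This is a minimal sketch that assumes the standard logistic form of the sigmoid; the helper names `blend_weight` and `blend_encoding` are hypothetical and not part of the h2o package, and H2O's internal implementation may differ in detail.

```r
# Weight given to the posterior (per-level mean) as a function of the
# number of rows n in which the level occurs. Assumed logistic form.
blend_weight <- function(n, inflection_point = 10, smoothing = 20) {
  1 / (1 + exp(-(n - inflection_point) / smoothing))
}

# Blend the per-level mean (posterior) with the global mean (prior).
blend_encoding <- function(posterior, prior, n,
                           inflection_point = 10, smoothing = 20) {
  w <- blend_weight(n, inflection_point, smoothing)
  w * posterior + (1 - w) * prior
}

# A level seen exactly inflection_point times weights both terms equally.
w_at_k <- blend_weight(10)

# Rare level (n = 2): the encoding is pulled towards the prior.
rare <- blend_encoding(posterior = 1.0, prior = 0.4, n = 2)

# Frequent level (n = 200): the encoding stays close to the level's own mean.
frequent <- blend_encoding(posterior = 1.0, prior = 0.4, n = 200)

# With a very small smoothing value the weight behaves like a step function.
step_below <- blend_weight(9,  inflection_point = 10, smoothing = 1e-3)
step_above <- blend_weight(11, inflection_point = 10, smoothing = 1e-3)
```

Under these assumptions, levels observed fewer than 'inflection_point' times receive a weight below 0.5, which matches the description of the 'inflection_point' argument above.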

Examples

## Not run: 
library(h2o)
h2o.init()
# Import the Titanic dataset
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv"
titanic <- h2o.importFile(f)

# Set response as a factor
response <- "survived"
titanic[response] <- as.factor(titanic[response])

# Split the dataset into train and test
splits <- h2o.splitFrame(data = titanic, ratios = .8, seed = 1234)
train <- splits[[1]]
test <- splits[[2]]

# Choose which columns to encode
encode_columns <- c("home.dest", "cabin", "embarked")

# Train a target encoding (TE) model, using pclass as the fold column
te_model <- h2o.targetencoder(x = encode_columns,
                              y = response, 
                              training_frame = train,
                              fold_column = "pclass", 
                              data_leakage_handling = "KFold")

# New target encoded train and test sets
train_te <- h2o.transform(te_model, train)
test_te <- h2o.transform(te_model, test)

## End(Not run)
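
To make the 'data_leakage_handling' strategies concrete, the following base R sketch computes the three kinds of encoding by hand on a toy data frame. It does not call h2o; the data and variable names are made up for illustration, and blending and noise are left out, so the numbers will not match H2O's output exactly.

```r
# Toy data: one categorical column, a binary target, and a fold assignment.
df <- data.frame(
  cat    = c("a", "a", "a", "b", "b", "b"),
  target = c(1, 0, 1, 0, 0, 1),
  fold   = c(1, 2, 1, 2, 1, 2)
)

# "none": plain per-level mean of the target over the whole frame.
enc_none <- ave(df$target, df$cat, FUN = mean)

# "leave_one_out": remove the current row's target from its level's
# sum and count before taking the mean.
level_sum <- ave(df$target, df$cat, FUN = sum)
level_n   <- ave(df$target, df$cat, FUN = length)
enc_loo   <- (level_sum - df$target) / (level_n - 1)

# "k_fold": each row is encoded from rows with the same level that
# belong to the other folds.
enc_kfold <- vapply(seq_len(nrow(df)), function(i) {
  out_of_fold <- df$cat == df$cat[i] & df$fold != df$fold[i]
  mean(df$target[out_of_fold])
}, numeric(1))

cbind(df, enc_none, enc_loo, enc_kfold)
```

In this sketch a level that occurs only once (or only within a single fold) yields NaN, so a practical encoder needs a fallback for such levels, for example the global mean of the target.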

h2o documentation built on May 29, 2024, 4:26 a.m.
