conjugate_encoder: conjugate_encoder


View source: R/conjugate_encoder.R

Description

This function encodes categorical variables by fitting, for each category, a posterior distribution to the target variable y using a known conjugate prior. The mean(s) of each category's posterior distribution are used as the encodings.
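To make the idea concrete, the sketch below works through the binary case in base R, assuming a Beta prior on each category's success rate and a 0/1 target. The helper `beta_posterior_means` is hypothetical and is not part of encodeR; it only illustrates the conjugate update, and the package's internals may differ.

```r
# Hypothetical helper, not part of encodeR: posterior mean of a Beta prior
# updated with the binary (0/1) outcomes observed in each category.
beta_posterior_means <- function(categories, y, alpha, beta) {
  successes <- tapply(y, categories, sum)    # number of 1s per category
  n <- tapply(y, categories, length)         # observations per category
  # Posterior is Beta(alpha + successes, beta + n - successes); return its mean.
  (alpha + successes) / (alpha + beta + n)
}

categories <- c("a", "a", "a", "b", "b")
y <- c(1, 1, 0, 0, 0)
encodings <- beta_posterior_means(categories, y, alpha = 1, beta = 1)
print(encodings)
```

With few observations the encoding stays close to the prior mean alpha / (alpha + beta), and it moves toward the category's observed rate as more data arrives.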

Usage

conjugate_encoder(
  X_train,
  X_test = NULL,
  y,
  cat_columns,
  prior_params,
  objective = "regression"
)

Arguments

X_train

A 'tibble' or 'data.frame' representing the training data set containing some categorical features/columns.

X_test

A 'tibble' or 'data.frame' representing the test set, containing the same categorical features/columns. Optional; defaults to NULL.

y

A numeric vector or character vector representing the target variable. If the objective is "binary", then the vector should only contain two unique values.

cat_columns

A character vector containing the names of the categorical columns in the tibble that should be encoded.

prior_params

A list of named parameters that specify the assumed prior. For regression, the list requires four named values: mu, vega, alpha, and beta. All must be real numbers; vega, alpha, and beta must be greater than 0, while mu can be negative. For binary classification, the list requires two named values: alpha and beta. Both must be real numbers greater than 0.
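As an illustration of these constraints, the sketch below builds one prior list per objective and checks it in base R. The function `check_prior_params` is a hypothetical helper written for this example, not something encodeR exports, and the prior values are arbitrary.

```r
# Hypothetical validator, not part of encodeR: checks a prior_params list
# against the constraints described for each objective.
check_prior_params <- function(prior_params, objective = "regression") {
  required <- if (objective == "regression") {
    c("mu", "vega", "alpha", "beta")
  } else {
    c("alpha", "beta")
  }
  missing_names <- setdiff(required, names(prior_params))
  if (length(missing_names) > 0) {
    stop("missing prior parameters: ", paste(missing_names, collapse = ", "))
  }
  if (!is.numeric(unlist(prior_params[required]))) {
    stop("all prior parameters must be numeric")
  }
  positive <- setdiff(required, "mu")  # mu is the only value allowed to be <= 0
  if (any(unlist(prior_params[positive]) <= 0)) {
    stop(paste(positive, collapse = ", "), " must be greater than 0")
  }
  invisible(TRUE)
}

regression_prior <- list(mu = 0, vega = 1, alpha = 3, beta = 3)
binary_prior <- list(alpha = 1, beta = 1)
ok_regression <- check_prior_params(regression_prior, "regression")
ok_binary <- check_prior_params(binary_prior, "binary")
```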

objective

A string, either "regression" or "binary", specifying the problem. The default is "regression".

Value

A list containing the processed training and test sets, in which the named categorical columns are replaced with their encodings. For regression, the encoder adds one additional dimension to the original training set because the assumed prior distribution is two-dimensional.
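The extra dimension in the regression case can be pictured with a Normal-Inverse-Gamma prior, whose posterior has two components: one for the mean and one for the variance. The base R sketch below applies the standard Normal-Inverse-Gamma update to each level of `cyl` in `mtcars`. It assumes that the two encodings are the posterior expected mean and expected variance, which this page does not state; `nig_posterior` and the output column names are hypothetical and may not match what encodeR returns.

```r
# Hypothetical helper, not the encodeR implementation: standard
# Normal-Inverse-Gamma posterior update for the target values of one category.
nig_posterior <- function(y, mu, vega, alpha, beta) {
  n <- length(y)
  y_bar <- mean(y)
  vega_n <- vega + n
  mu_n <- (vega * mu + n * y_bar) / vega_n
  alpha_n <- alpha + n / 2
  beta_n <- beta + 0.5 * sum((y - y_bar)^2) +
    (n * vega / vega_n) * (y_bar - mu)^2 / 2
  # The expected variance exists only when alpha_n > 1.
  c(mean = mu_n, variance = beta_n / (alpha_n - 1))
}

prior <- list(mu = 3, vega = 5, alpha = 3, beta = 3)

# One column per cyl level; rows are "mean" and "variance".
per_category <- sapply(split(mtcars$mpg, mtcars$cyl), function(y_k) {
  do.call(nig_posterior, c(list(y = y_k), prior))
})

# Map each row of the data to its category's two encodings.
key <- as.character(mtcars$cyl)
encoded <- data.frame(
  cyl_encoded_mean = unname(per_category["mean", key]),
  cyl_encoded_var = unname(per_category["variance", key])
)
head(encoded)
```

One categorical column becomes two numeric columns here, which is the sense in which the encoded data gains a dimension.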

Examples

conjugate_encoder(
  X_train = mtcars,
  y = mtcars$mpg,
  cat_columns = c("cyl", "vs"),
  prior_params = list(mu = 3, vega = 5, alpha = 3, beta = 3),
  objective = "regression"
)

UBC-MDS/encodeR documentation built on March 31, 2020, 12:53 a.m.