make_onehot: Categorical Data Matrix to One-Hot Binary Matrix


Description

Converts a categorical data matrix to a one-hot binary matrix.

Usage

make_onehot(data, minus_level = FALSE, clarify_levels = TRUE,
  scale = FALSE)

Arguments

data

a categorical data matrix (the example under Examples passes a data frame).

minus_level

logical (default FALSE); if TRUE then create binary encodings for m-1 levels of a variable with m original levels

clarify_levels

logical (default TRUE); if TRUE then disambiguate resulting column names

scale

logical (default FALSE); if FALSE then the binary matrix is returned as is. If TRUE then normalization (see Details) is applied to each binary-transformed variable.
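To make the difference between encoding m levels and m-1 levels concrete, here is a minimal sketch in base R. It uses model.matrix() rather than make_onehot(), so the column names and the choice of dropped level are assumptions and may differ from what the package produces; the toy data are hypothetical.

```r
# Hypothetical toy data; not taken from the package.
df <- data.frame(age = factor(c("young", "old", "unknown", "old")))

# Full encoding: one indicator column per level (m columns for m levels).
# "- 1" removes the intercept so that every level gets its own column.
full <- model.matrix(~ age - 1, data = df)

# Reduced encoding: the first level acts as the reference and is dropped,
# leaving m - 1 columns. Which level is dropped here is a base R convention,
# not necessarily the one make_onehot() uses.
reduced <- model.matrix(~ age, data = df)[, -1, drop = FALSE]

ncol(full)     # 3 columns for 3 levels
ncol(reduced)  # 2 columns for 3 levels
```

With m-1 columns, the dropped level is still recoverable: it is the row where all remaining indicators are 0.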

Details

The normalization technique is taken from Outlier Analysis (Aggarwal, 2017), section 8.3. For each column j in the binary transformed matrix, a normalization factor is defined as sqrt(ni * pj * (1 - pj)), where ni is the number of distinct categories in the reference variable from the raw data set and pj is the proportion of records taking the value of 1 for the jth variable.
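The factor above can be sketched by hand for a single binary column. This is an illustration only: the vector x and the names n_i, p_j, and factor_j are hypothetical, and dividing the column by the factor is an assumption based on the cited technique, not a confirmed description of what make_onehot() does internally.

```r
# Hypothetical binary column for one level of a 3-level variable.
x <- c(1, 0, 0, 1, 0, 0, 0, 1)

n_i <- 3        # distinct categories in the original (reference) variable
p_j <- mean(x)  # proportion of records equal to 1 in this column

# Normalization factor as defined in Details.
factor_j <- sqrt(n_i * p_j * (1 - p_j))

# Assumption: the column is divided by its factor.
x_scaled <- x / factor_j
```

Because pj * (1 - pj) is largest at pj = 0.5, columns for very rare or very common levels get a smaller factor, so (under the division assumption) a 1 in a rare-level column carries more weight after scaling.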

Value

A transformed one-hot encoded matrix is returned.

Examples

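# Build a small categorical data frame (values are sampled at random)
df <- data.frame(gender = sample(c("male", "female"), 25, replace = TRUE),
                 age = sample(c("young", "old", "unknown"), 25, replace = TRUE))
# One-hot encode every column using the default settings
make_onehot(data = df)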

dannymorris/smltools documentation built on May 15, 2019, 10:49 a.m.