one_hot: Categorical Data Matrix to One-Hot Binary Matrix

Description

Converts a matrix or data.frame of categorical variables into a one-hot binary (0/1 indicator) matrix, optionally normalizing each binary column.

Usage

one_hot(data, normalize = FALSE)

Arguments

data

a matrix or data.frame of categorical variables as character strings or factors.

normalize

Logical. If FALSE (the default), the binary matrix is returned. If TRUE, the normalization described in Details is applied to each binary column.
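
The binary transformation returned when normalize = FALSE can be sketched in base R as follows. This is an illustration, not the outsiders source: the helper name one_hot_binary and the variable_level column naming are hypothetical, and levels are ordered alphabetically because that is what factor() does by default.

```r
# Hypothetical helper (not part of outsiders): builds a 0/1 indicator matrix
# with one column per distinct level of each categorical variable.
one_hot_binary <- function(data) {
  data <- as.data.frame(data)
  cols <- lapply(names(data), function(v) {
    f <- factor(data[[v]])
    # outer() compares every value against every level -> logical matrix;
    # multiplying by 1 turns TRUE/FALSE into 1/0 and keeps the dimensions
    m <- outer(as.character(f), levels(f), "==") * 1
    colnames(m) <- paste(v, levels(f), sep = "_")
    m
  })
  do.call(cbind, cols)
}

x <- data.frame(color = c("red", "blue", "red"),
                size  = c("S", "M", "M"))
one_hot_binary(x)
```

Each row contains exactly one 1 per input variable, so with two variables every row of the result sums to 2.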

Details

The normalization technique is taken from Outlier Analysis (Aggarwal, 2017), section 8.3. For each column j of the binary matrix, a normalization factor is defined as sqrt(n_i * p_j * (1 - p_j)), where n_i is the number of distinct categories in the original variable i from which column j was derived, and p_j is the proportion of records taking the value 1 in column j.
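
A rough base R sketch of this normalization, assuming each binary column j is divided by its factor sqrt(n_i * p_j * (1 - p_j)). The text above defines the factor but does not state how it is applied; dividing is the usual reading of Aggarwal's scheme and is an assumption here. The function normalize_one_hot is hypothetical and is not the package's implementation.

```r
# Hypothetical sketch (not the outsiders source): one-hot encode each
# variable, then divide column j by sqrt(n_i * p_j * (1 - p_j)).
normalize_one_hot <- function(data) {
  data <- as.data.frame(data)
  cols <- lapply(names(data), function(v) {
    f <- factor(data[[v]])
    n_i <- nlevels(f)                          # distinct categories in variable i
    b <- outer(as.character(f), levels(f), "==") * 1
    p <- colMeans(b)                           # p_j: proportion of 1s in column j
    # assumption: each binary column is divided by its normalization factor
    m <- sweep(b, 2, sqrt(n_i * p * (1 - p)), "/")
    colnames(m) <- paste(v, levels(f), sep = "_")
    m
  })
  do.call(cbind, cols)
}

x <- data.frame(gender = c("male", "female", "male", "male", "female"))
normalize_one_hot(x)
```

Here n_i = 2 and p_female = 0.4, so the factor is sqrt(2 * 0.4 * 0.6) = sqrt(0.48) ≈ 0.693 and each 1 in gender_female becomes about 1.443; gender_male (p = 0.6) gets the same factor because p(1 - p) is symmetric. A variable with a single category gives p_j = 1 and a factor of 0, for which this sketch produces Inf; how one_hot itself handles that case is not documented here.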

Value

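A matrix with one column per distinct category of each input variable, containing 0/1 values when normalize = FALSE, or the normalized values described in Details when normalize = TRUE.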

Examples

library(outsiders)
set.seed(1)
x <- data.frame(gender = sample(c("male", "female"), 15, replace = TRUE),
                age_cat = sample(c("young", "old", "unknown"), 15, replace = TRUE))
one_hot(data = x, normalize = TRUE)

dannymorris/outsiders documentation built on May 13, 2019, 1:22 p.m.