one_hot: One Hot Encoding of data.table columns

View source: R/functions_utility.R

Description

One-hot-encode the unordered factor columns of a data.table. From ben519's "mltools" package.

Usage

one_hot(
  dt,
  cols = "auto",
  sparsifyNAs = FALSE,
  naCols = FALSE,
  dropCols = TRUE,
  dropUnusedLevels = FALSE
)

Arguments

dt

A data.table

cols

Which column(s) should be one-hot-encoded? DEFAULT = "auto" encodes all unordered factor columns.

sparsifyNAs

Should NAs be converted to 0s?

naCols

Should columns be generated to indicate the presence of NAs? Only applies to factor columns with at least one NA.

dropCols

Should the resulting data.table exclude the original columns which are one-hot-encoded?

dropUnusedLevels

Should unused factor levels be dropped? If FALSE (the default), a column of all 0s is generated for each unused factor level.
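The effect of the NA and level options can be illustrated with a minimal base-R sketch. It does not call one_hot() and is not its implementation; `indicator_matrix` is a hypothetical helper introduced only for this illustration, and the "analogue" labels describe the intent of each argument as documented above rather than exact output.

```r
# Base-R illustration only; this does not call or reproduce one_hot().
color <- factor(c("red", NA, "blue", "blue"), levels = c("blue", "green", "red"))

# Hypothetical helper: one 0/1 column per factor level; NA rows stay NA.
indicator_matrix <- function(f, prefix) {
  m <- vapply(levels(f), function(lvl) as.integer(f == lvl), integer(length(f)))
  colnames(m) <- paste(prefix, levels(f), sep = "_")
  m
}

base <- indicator_matrix(color, "color")

# Analogue of sparsifyNAs = TRUE: NA indicators become 0.
sparse <- base
sparse[is.na(sparse)] <- 0L

# Analogue of naCols = TRUE: an extra column flags rows where the factor was NA.
with_na <- cbind(base, color_NA = as.integer(is.na(color)))

# Analogue of dropUnusedLevels = TRUE: the unused "green" level gets no column.
dropped <- indicator_matrix(droplevels(color), "color")
```

Printing `base`, `sparse`, `with_na` and `dropped` side by side shows how each option changes the set of columns or the treatment of the NA row.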

Details

One-hot-encoding converts an unordered categorical vector (i.e. a factor) into multiple binary vectors, where each vector of 1s and 0s indicates the presence of one class (i.e. level) of the original vector.
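As a small sketch of the idea in base R (independent of one_hot(), with a made-up vector `x` chosen for illustration), each value can be compared against every level to build the binary vectors:

```r
# Illustrative vector; levels are sorted alphabetically: blue, green, red.
x <- factor(c("red", "blue", "blue", "green"))

# Compare each value's level index with every level index.
# The logical matrix is turned into 0/1 integers by multiplying by 1L.
onehot <- outer(as.integer(x), seq_along(levels(x)), "==") * 1L
colnames(onehot) <- levels(x)

# With no NAs present, every row contains exactly one 1.
row_totals <- rowSums(onehot)
```

Each column of `onehot` is one of the binary vectors described above, and `row_totals` confirms that every observation belongs to exactly one level.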

Examples

library(data.table)

dt <- data.table(
  ID = 1:4,
  color = factor(c("red", NA, "blue", "blue"), levels=c("blue", "green", "red"))
)

one_hot(dt)
one_hot(dt, sparsifyNAs=TRUE)
one_hot(dt, naCols=TRUE)
one_hot(dt, dropCols=FALSE)
one_hot(dt, dropUnusedLevels=TRUE)

Nth-iteration-labs/contextual documentation built on July 28, 2020, 1:13 p.m.