onehot: One-hot encoding



Description

Add indicators for all desired variables in a data set.

Usage

onehot(data, var = NULL, nas = "na.pass", sparse = FALSE,
  keep.original = FALSE)

Arguments

data

A data frame

var

A character vector naming the variables to encode. If NULL (the default), all character and factor variables are encoded.

nas

What to do with missing values. With "na.pass", the default, the result retains the missing values. With "na.omit" or "na.exclude", any observations with missing data are removed from the result. With "na.fail", an error is thrown.

sparse

Logical (default FALSE). If TRUE, returns only the encoded variables, as a sparse matrix.

keep.original

Logical (default FALSE). Whether to keep the original variables in the result. Not available when sparse = TRUE.

Details

This function is a simple one-hot encoder with a few commonly desired options. It takes the selected variables and creates a binary indicator column for each unique value. Variables that are neither factors nor characters are first coerced to character. Options control how missing values are handled, whether a sparse matrix is returned, and whether the original variables are kept.
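As an illustration of the per-variable encoding described above, here is a minimal base-R sketch. It is not lazerhawk's implementation; the function name one_hot_column and its prefix argument are hypothetical. It coerces the input to character, creates one integer indicator per observed non-missing value, and lets missing values propagate into every indicator, which mirrors the na.pass behaviour.

```r
# Hypothetical helper, NOT lazerhawk's code: one-hot encode a single vector.
one_hot_column <- function(x, prefix = "x") {
  x <- as.character(x)                      # coerce non-character input first
  levs <- sort(unique(x[!is.na(x)]))        # one indicator per observed value
  out <- matrix(0L, nrow = length(x), ncol = length(levs),
                dimnames = list(NULL, paste(prefix, levs, sep = "_")))
  for (j in seq_along(levs)) {
    # x == level is NA where x is NA, so missingness is passed through
    out[, j] <- as.integer(x == levs[j])
  }
  as.data.frame(out)
}

one_hot_column(c("a", "b", NA, "a"), prefix = "g")
```

Each non-missing row has exactly one 1 across the indicators; a missing input yields NA in every indicator column rather than being dropped.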

Value

A data.frame with the encoded variables, or a sparse matrix of only the encoded variables.

See Also

model.matrix
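For comparison, base R's model.matrix can produce similar indicators. Dropping the intercept with "- 1" gives a full set of indicators for a single factor. For several factors, passing contrasts = FALSE through contrasts.arg keeps every level of each one. The data frame named cars below is only for illustration. Note that under the default na.action (na.omit), model.matrix drops rows with missing values, unlike the na.pass default of onehot.

```r
# Full indicator set for a single factor: drop the intercept with "- 1"
mm1 <- model.matrix(~ Species - 1, data = iris)
head(mm1, 3)

# Several variables: convert numeric vs and cyl to factors, then keep every
# level of each by turning off contrasts (identity coding) for all of them
cars <- data.frame(vs = factor(mtcars$vs), cyl = factor(mtcars$cyl))
mm2 <- model.matrix(~ vs + cyl - 1, data = cars,
                    contrasts.arg = lapply(cars, contrasts, contrasts = FALSE))
colnames(mm2)
```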

Examples

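library(lazerhawk)
# Encode all factor/character variables (here Species), keeping the originals
str(onehot(iris, keep.original = TRUE))
# Return only the encoded variables, as a sparse matrix
str(onehot(iris, sparse = TRUE))
# Encode selected numeric variables; they are coerced to character first
str(onehot(mtcars, var = c('vs','cyl')))

# Set 25 randomly chosen rows to missing (seed set for reproducibility)
set.seed(123)
iris2 = iris
iris2[sample(1:150, 25),] = NA
# Default nas = 'na.pass': missing values are retained in the result
str(onehot(iris2))
# nas = 'na.omit': observations with missing data are removed
str(onehot(iris2, nas = 'na.omit'))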

mclark--/lazerhawk documentation built on July 17, 2018, 3:11 a.m.