onehot: Onehot Encode a data.frame

Description

Onehot Encode a data.frame

Usage

onehot(data, sentinel = -999, max_levels = 10, add_NA_factors = TRUE)

Arguments

data

A data.frame whose factor columns are converted into onehot encoded columns.

sentinel

Numeric value with which to replace NAs. Applies to numeric columns only.

max_levels

Maximum number of levels to onehot encode per factor variable. Factors with more levels than this number will be skipped.

add_NA_factors

If TRUE, adds an NA indicator column for factors.
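
As a rough illustration of the max_levels rule, the base R sketch below finds the factor columns whose level count exceeds a threshold. It does not call onehot and is not the package's own code; the threshold of 2 and the variable names are assumptions chosen so that the single factor in iris (Species, 3 levels) is flagged.

```r
data(iris)

# Hypothetical threshold, set low on purpose so that Species exceeds it.
max_levels <- 2

# Identify factor columns and count their levels.
is_fac <- vapply(iris, is.factor, logical(1))
n_lvls <- vapply(iris[is_fac], nlevels, integer(1))

# Factors with more levels than the threshold would be skipped.
skipped <- names(n_lvls)[n_lvls > max_levels]
print(skipped)
```

With the default max_levels = 10 the same check flags nothing in iris, since Species has only 3 levels.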

Details

By default, with addNA=FALSE, no NAs are returned for non-factor columns. Indicator columns are created for factor levels and NA factors are ignored. The exception is when NA is an explicit factor level.

stringsAsFactors=TRUE will convert character columns to factors first. Otherwise, characters are ignored. Only factor, numeric, integer, and logical vectors are valid for onehot. Other classes will be skipped entirely.

addNA=TRUE will create indicator columns for every field. This will add ncols columns to the output matrix. A sparse matrix may be better in such cases.
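
To make the ideas of indicator columns, NA indicators, and sentinel replacement concrete, here is a minimal base R sketch on a toy data.frame. It is an illustration of the concepts only, not the package's implementation; the toy data, the column naming scheme ("f=a", "f=NA"), and all variable names are assumptions.

```r
# Toy data frame with one missing value in each column (illustrative only).
df <- data.frame(
  x = c(1.5, NA, 3.0),
  f = factor(c("a", NA, "b"))
)

sentinel <- -999

# Numeric column: replace NA with the sentinel value.
x_filled <- ifelse(is.na(df$x), sentinel, df$x)

# Factor column: one 0/1 indicator column per level.
# A missing factor value gets 0 in every level column.
lvls <- levels(df$f)
indicators <- sapply(lvls, function(l) as.integer(!is.na(df$f) & df$f == l))
colnames(indicators) <- paste0("f=", lvls)

# Separate NA indicator column for the factor.
na_indicator <- as.integer(is.na(df$f))

# Combine everything into a single numeric matrix.
encoded <- cbind(x = x_filled, indicators, "f=NA" = na_indicator)
print(encoded)
```

In this sketch the second row has the sentinel in x, zeros in both level columns, and a 1 in the NA indicator column, so no NA remains in the output matrix.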

Value

A onehot object describing how to transform the data.

Examples

data(iris)
encoder <- onehot(iris)

## add NA indicator columns
encoder <- onehot(iris, add_NA_factors=TRUE)

## limit which factors are onehot encoded
encoder <- onehot(iris, max_levels=5)

## Impute numeric NA values with sentinel value
encoder <- onehot(iris, sentinel=-1)

Zelazny7/onehot documentation built on May 6, 2019, 1:30 a.m.