dummyEncode: dummyEncode


View source: R/encodings.R

Description

Collects data about levels and user preferences in order to transform a dataset into dummy variables.

Usage

## S3 method for class 'dummyEncode'
predict(object, df, inPlace = FALSE, ...)

dummyEncode(dt, vars, treatNA = c("newLevel", "ghost"), sep = ".",
  setNA = "na", fullRank = TRUE, levelCountThresh = 50,
  values = c(0, 1))

Arguments

object

A dummyEncode object

inPlace

Return the entire df with the transformed vars, or just the transformed vars

dt

The data.frame or data.table used to create the object

vars

vector of variables you want to dummify

treatNA

A string that specifies what you want to do with NA values. It is basically never a good idea to have NA dummy variables, so that is not an option. Options are:

  • "newLevel" Simply creates a new level, which will be set to 1 when the variable is NA

  • "ghost" Sets all levels to 0. The information that this variable was NA is encoded into the data by the fact that none of the other levels are equal to 1. Bad idea for linear models.
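
The difference between the two NA treatments can be sketched in base R. This is an illustration of the idea only, not the package's implementation; the toy vector and the variable names are assumptions made for the example.

```r
# Toy column with one missing value (illustrative data, not from the package).
x <- c("a", "b", NA, "a")
lvls <- c("a", "b")

# "newLevel": replace NA with a placeholder level, then encode every level.
x_new <- ifelse(is.na(x), "na", x)
new_level <- sapply(c(lvls, "na"), function(l) as.integer(x_new == l))

# "ghost": encode only the observed levels, so the NA row is all zeros.
ghost <- sapply(lvls, function(l) as.integer(!is.na(x) & x == l))

new_level  # third row has a 1 only in the "na" column
ghost      # third row is all zeros
```

With "newLevel" every row has exactly one indicator set, whereas with "ghost" a missing value is only recognizable as a row of zeros.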

sep

The separator between variable and level in the new column names. "." is usually safe.

setNA

What to replace NA with if treatNA = "newLevel".

fullRank

Boolean. Follows the caret package's syntax. If TRUE, the least common level is dropped so that linear models won't return inflated (yet valid) coefficients. A good discussion: https://stats.stackexchange.com/questions/231285/dropping-one-of-the-columns-when-using-one-hot-encoding
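
A minimal base R sketch of what dropping the least common level could look like. The data and variable names are made up for illustration, and how the package breaks ties between equally rare levels is not documented here; this sketch simply takes the first one.

```r
# Toy column (illustrative data, not from the package).
x <- c("red", "red", "blue", "green", "green", "green")

# Count each level and find the least common one.
counts <- table(x)
least_common <- names(counts)[which.min(counts)]

# Encode every level except the dropped one.
kept <- setdiff(names(counts), least_common)
full_rank <- sapply(kept, function(l) as.integer(x == l))

least_common  # the dropped level
full_rank     # rows belonging to the dropped level are all zeros
```

Without the drop, the indicator columns always sum to 1 in every row, which makes them perfectly collinear with an intercept term.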

levelCountThresh

Did you try to one-hot encode a floating point column? If a column's level count exceeds this value, the process stops and lets you know which column it was.
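
A guard of this kind might look like the following base R sketch. The helper `check_level_count` and its error message are hypothetical and are not part of the package.

```r
# Hypothetical helper: stop if any column has more distinct values than thresh.
check_level_count <- function(df, vars, thresh = 50) {
  for (v in vars) {
    n_levels <- length(unique(df[[v]]))
    if (n_levels > thresh) {
      stop(sprintf("Column '%s' has %d levels, which exceeds the threshold of %d.",
                   v, n_levels, as.integer(thresh)))
    }
  }
  invisible(TRUE)
}

# Toy data: a categorical column and a floating point column.
df <- data.frame(grp = c("a", "b", "a"), score = c(0.11, 0.52, 0.93))

check_level_count(df, "grp", thresh = 2)  # passes silently

# The floating point column has 3 distinct values, so the check fails.
res <- tryCatch(check_level_count(df, "score", thresh = 2),
                error = function(e) conditionMessage(e))
res
```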

values

What to encode the values as. Should be a numeric vector of the form c(negative value, positive value). The default one-hot encoding values are c(0, 1). Encoding as c(-1, 1) is sometimes useful for certain NN activation functions.
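
Recoding a 0/1 indicator into another pair of values is a simple lookup, sketched here in base R with made-up data.

```r
# A standard 0/1 indicator column (illustrative data).
onehot <- c(0, 1, 1, 0)

# Desired encoding: c(negative value, positive value).
values <- c(-1, 1)

# Index into `values`: 0 picks the first element, 1 picks the second.
recoded <- values[onehot + 1]
recoded
```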

Value

A dummyEncode object. It needs to be applied to data with predict(); on its own it does not return a dataset.
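
The general pattern of building an encoder object first and applying it later can be sketched with a toy S3 class. Everything below (`toyEncode`, its fields, and the data) is hypothetical and only mirrors the fit-then-predict shape of the usage above; it is not the package's code.

```r
# Hypothetical toy encoder: stores the levels seen in the training data.
toyEncode <- function(df, var) {
  structure(list(var = var, levels = sort(unique(df[[var]]))),
            class = "toyEncode")
}

# Applying the stored levels to new data through the predict() generic.
predict.toyEncode <- function(object, df, ...) {
  cols <- lapply(object$levels,
                 function(l) as.integer(df[[object$var]] == l))
  out <- do.call(cbind, cols)
  colnames(out) <- paste(object$var, object$levels, sep = ".")
  as.data.frame(out)
}

train <- data.frame(color = c("red", "blue", "red"), stringsAsFactors = FALSE)
enc <- toyEncode(train, "color")  # no data is transformed at this point

new_data <- data.frame(color = c("blue", "red"), stringsAsFactors = FALSE)
encoded <- predict(enc, new_data)  # the transformation happens here
encoded
```

Storing the levels in the object means new data is encoded with the same columns as the training data, regardless of which levels happen to appear in it.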


AnotherSamWilson/helperFuncs documentation built on Oct. 1, 2019, 8:51 p.m.