dummyEncode: dummyEncode
In AnotherSamWilson/helperFuncs: Helper Functions for common manipulations


View source: R/encodings.R

Collects data about levels and user preferences to transform a dataset into dummy variables.

## S3 method for class 'dummyEncode'
predict(object, df, inPlace = FALSE, ...)

dummyEncode(dt, vars, treatNA = c("newLevel", "ghost"), sep = ".",
  setNA = "na", fullRank = TRUE, levelCountThresh = 50,
  values = c(0, 1))
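
A minimal usage sketch based only on the signatures above. The toy data and column names are made up for illustration, and it is an assumption that a factor column in a data.table is accepted input. The page does not describe the structure of the object returned by `predict`, so the result is only printed.

```r
# Toy data (hypothetical): one categorical column with a missing value.
toy <- data.frame(
  color = factor(c("red", "blue", NA, "red")),
  size  = c(1.5, 2.0, 3.1, 0.7)
)

# Run only if both packages are available; helperFuncs is hosted on GitHub
# under AnotherSamWilson/helperFuncs.
if (requireNamespace("helperFuncs", quietly = TRUE) &&
    requireNamespace("data.table", quietly = TRUE)) {
  library(helperFuncs)
  dt <- data.table::as.data.table(toy)

  # Step 1: learn the levels and store the encoding preferences.
  enc <- dummyEncode(dt, vars = "color", treatNA = "newLevel",
                     sep = ".", setNA = "na", fullRank = FALSE)

  # Step 2: apply the stored encoding to a dataset.
  encoded <- predict(enc, dt, inPlace = FALSE)
  print(encoded)
}
```

Splitting the work into a fitting step and a `predict` step means the same level set can be reused on new data, for example a test set.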

`object`	A dummyEncode object
`inPlace`	Logical. Whether to return the entire df with the transformed vars, or just the transformed vars.
`dt`	data.frame(table) to create the object on
`vars`	vector of variables you want to dummify
`treatNA`	A string that specifies how NA values are handled. It is almost never a good idea to have NA dummy variables, so that is not an option. Options are: `"newLevel"`: creates a new level, which is set to 1 when the variable is NA. `"ghost"`: sets all levels to 0, so the fact that the variable was NA is encoded by none of the other levels being equal to 1. This is a bad idea for linear models.
`sep`	The separator between variable and level in the new column names. "." is usually safe.
`setNA`	What to replace NA with if `treatNA = "newLevel"`.
`fullRank`	Boolean. Copies caret syntax. If TRUE, the least common level is dropped so that linear models won't return blown-up (yet valid) coefficients. A good discussion: https://stats.stackexchange.com/questions/231285/dropping-one-of-the-columns-when-using-one-hot-encoding
`levelCountThresh`	A guard against accidentally one-hot encoding a floating point column. If a column's level count exceeds this value, the process stops and lets you know which column it was.
`values`	What to encode the values as. Should be a numeric vector of the form c(false value, true value). The default one-hot encoding values are c(0, 1). Encoding as c(-1, 1) is sometimes useful for certain NN activation functions.
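
The effect of `treatNA`, `fullRank`, `levelCountThresh`, and `values` can be illustrated with a small base R sketch. This is not the package's implementation: `encode_sketch` is a hypothetical helper written for this illustration. How the package breaks ties between equally rare levels is not documented; the sketch simply drops the first such level in alphabetical order.

```r
# Hypothetical helper that mimics the documented behaviour on one vector.
encode_sketch <- function(x, name, treatNA = "newLevel", sep = ".",
                          setNA = "na", fullRank = TRUE,
                          levelCountThresh = 50, values = c(0, 1)) {
  x <- as.character(x)
  # "newLevel": NA becomes an ordinary level named by setNA.
  # "ghost": NA is left alone and will match no level below.
  if (treatNA == "newLevel") x[is.na(x)] <- setNA

  counts <- table(x)          # table() ignores NA by default
  lvls <- names(counts)
  if (length(lvls) > levelCountThresh) {
    stop("Too many levels in column: ", name)
  }
  # fullRank: drop the least common level (ties: first alphabetically).
  if (fullRank) lvls <- lvls[-which.min(counts)]

  # One column per kept level: values[2] on a match, values[1] otherwise.
  out <- lapply(lvls, function(l) {
    ifelse(!is.na(x) & x == l, values[2], values[1])
  })
  names(out) <- paste(name, lvls, sep = sep)
  data.frame(out, check.names = FALSE)
}

color <- c("red", "blue", NA, "red", "green", "red")

# NA gets its own column, color.na.
new_level <- encode_sketch(color, "color", treatNA = "newLevel")

# NA rows are all zero; no color.na column exists.
ghost <- encode_sketch(color, "color", treatNA = "ghost", fullRank = FALSE)

# Same layout as ghost, but encoded as -1 / 1.
signed <- encode_sketch(color, "color", treatNA = "ghost",
                        fullRank = FALSE, values = c(-1, 1))
```

Note that in this sketch, combining `treatNA = "ghost"` with `fullRank = TRUE` makes NA rows indistinguishable from rows of the dropped level, since both are encoded as all zeros.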