View source: R/HelperFunctions.R
| create_onehot | R Documentation |
Converts a categorical vector into a one-hot encoded data frame in which each unique value becomes a binary (0/1) column.
create_onehot(x)
| x | A vector containing categorical values (e.g. a factor or character vector). |
The function creates one indicator (dummy) variable for each unique value of the input vector using
model.matrix(). Column names are cleaned by removing the 'x' prefix that model.matrix()
prepends to each level name.
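The steps described above can be illustrated with a minimal base-R sketch. The function name `onehot_sketch` is hypothetical, and the no-intercept formula (`~ x - 1`) and the coercion to factor are assumptions about how such an encoding can be built with `model.matrix()`; this is not a copy of the lgspline source.

```r
# Hypothetical re-implementation for illustration; not the lgspline source
onehot_sketch <- function(x) {
  # Assumption: coerce to factor so numeric codes are also treated as
  # categories; factor() also drops levels that never occur
  x <- factor(x)
  # "- 1" removes the intercept, so every level gets its own 0/1 column
  mm <- model.matrix(~ x - 1)
  out <- as.data.frame(mm)
  # model.matrix() names columns "x<level>"; strip the leading "x"
  colnames(out) <- sub("^x", "", colnames(out))
  out
}

onehot_sketch(c("b", "a", "b", "c"))
```

Because the intercept is removed, each row contains exactly one 1, and the columns follow the factor's level order (alphabetical for character input). Note that model.matrix() drops rows containing NA under the default na.action, so the output can be shorter than the input.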
A data frame with one binary (0/1) column per unique value of x, named by the cleaned level names.
## lgspline will not accept "catvar" in this format: character or factor
# columns can cause difficult-to-diagnose issues in formula parsing,
# so all variables must be numeric
df <- data.frame(numvar = rnorm(100),
                 catvar = rep(LETTERS[1:4], 25))
print(head(df))
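## Instead, replace catvar with dummy-intercept coding by
# 1) applying one-hot encoding
# 2) dropping the first column (the reference level); drop = FALSE keeps
#    a data frame even when only one column remains
# 3) appending the result to our data
dummy_intercept_coding <- create_onehot(df$catvar)[, -1, drop = FALSE]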
df$catvar <- NULL
df <- cbind(df, dummy_intercept_coding)
print(head(df))
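To see why dropping the first one-hot column yields dummy-intercept coding, one can compare against R's default treatment contrasts. The sketch below uses only base R (it does not call `create_onehot()` or lgspline), and it assumes the levels sort alphabetically, so the dropped column corresponds to the reference level `A`.

```r
# Base-R check; does not call create_onehot() or lgspline
set.seed(1)
df <- data.frame(numvar = rnorm(100),
                 catvar = rep(LETTERS[1:4], 25))

# Full one-hot encoding: one 0/1 column per level, no intercept
onehot <- model.matrix(~ catvar - 1, data = df)

# R's default treatment contrasts: intercept plus indicators for B, C, D
treatment <- model.matrix(~ catvar, data = df)

# Dropping the first one-hot column reproduces the non-intercept columns
stopifnot(identical(colnames(onehot)[-1], colnames(treatment)[-1]))
stopifnot(all(onehot[, -1] == treatment[, -1]))

# After replacing catvar with these columns, every variable is numeric
df2 <- cbind(df["numvar"], as.data.frame(onehot[, -1]))
stopifnot(all(vapply(df2, is.numeric, logical(1))))
```

Dropping one column avoids perfect collinearity with an intercept: the full set of one-hot columns always sums to 1 in every row.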