create_onehot: Create One-Hot Encoded Matrix

View source: R/HelperFunctions.R

create_onehotR Documentation

Create One-Hot Encoded Matrix

Description

Converts a categorical vector into a one-hot encoded matrix where each unique value becomes a binary column.

Usage

create_onehot(x, drop_first = FALSE)

Arguments

x

A vector containing categorical values (factors, character, etc.)

drop_first

Logical; if TRUE and more than one dummy column is created, drop the first column.

Details

The function creates dummy variables for each unique value in the input vector using model.matrix() with dummy-intercept coding. Column names are cleaned by removing the 'x' prefix added by model.matrix().

Value

A data frame containing the one-hot encoded binary columns with cleaned column names

Examples



## lgspline will not accept this format of "catvar", because inputting data
# this way can cause difficult-to-diagnose issues in formula parsing
# all variables must be numeric
df <- data.frame(numvar = rnorm(100),
                 catvar = rep(LETTERS[1:4],
                              25))
print(head(df))

## Instead, replace with dummy-intercept coding by
# 1) applying one-hot encoding
# 2) dropping the first column
# 3) appending to our data

dummy_intercept_coding <- create_onehot(df$catvar)[,-1]
df$catvar <- NULL
df <- cbind(df, dummy_intercept_coding)
print(head(df))




lgspline documentation built on May 8, 2026, 5:07 p.m.