View source: R/factor_encoder.R
factor.encoder | R Documentation |
factor.encoder() creates an encoder function for a qualitative (factor or character) variable. This encoder converts the variable into a one-hot encoded (dummy) design matrix.
factor.frame() is a helper function to create a "factor.frame" object that defines the encoding scheme.
factor.encoder(
x,
k,
use.catchall = TRUE,
catchall = "(others)",
tag = "x",
frame = NULL,
weights = NULL
)
factor.frame(levels, catchall = "(others)", tag = "x")
x | a vector to be encoded as a qualitative variable.
k | an integer specifying the maximum number of distinct levels to retain (including the catch-all level). If not positive, all unique values of x are used as levels.
use.catchall | logical. If TRUE, less frequent levels are grouped into the catch-all level.
catchall | a character string used as the label of the catch-all level.
tag | the name of the variable.
frame | a "factor.frame" object or a character vector that explicitly defines the levels of the variable.
weights | an optional numeric vector of sample weights for x.
levels | a vector to be used as the levels of the variable.
This function is designed to handle qualitative data for use in the MID model's linear system formulation.
The primary mechanism is one-hot encoding.
Each unique level of the input variable becomes a column in the output matrix.
For a given observation, the column corresponding to its level is assigned a 1, and all other columns are assigned 0.
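The one-hot mechanism described above can be sketched in base R. This is only an illustration: `one_hot()` is a hypothetical helper, not a function of this package, and treating NA or unseen values as all-zero rows is an assumption of the sketch rather than documented behavior of factor.encoder().

```r
# Hypothetical helper (not part of the package): one-hot encode `x`
# against a fixed set of levels using base R only.
one_hot <- function(x, levels) {
  x <- as.character(x)
  # outer() compares every observation with every level;
  # unary plus turns the logical matrix into 0/1 values.
  mat <- +outer(x, levels, FUN = "==")
  # Comparisons involving NA yield NA; assumption: treat them as "no match".
  mat[is.na(mat)] <- 0
  dimnames(mat) <- list(NULL, levels)
  mat
}

# Each row has a single 1 in the column of the observed level.
m <- one_hot(c("a", "c", "b", "a"), levels = c("a", "b", "c"))
m
```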
When a variable has many unique levels (high cardinality), you can use the use.catchall = TRUE and k arguments.
This keeps the k - 1 most frequent levels as their own columns, while all remaining, less frequent levels are consolidated into a single catch-all level ("(others)" by default).
This caps the design matrix at k columns, which prevents MID models from becoming overly complex.
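The catch-all grouping can be illustrated with a small base R sketch. `lump_levels()` is a hypothetical name introduced here, and its handling of edge cases (no lumping when k is not positive or when there are at most k distinct values, NA left untouched) is an assumption, not a description of the package internals.

```r
# Hypothetical helper: keep the (k - 1) most frequent values of `x`
# and map every other non-missing value to a single catch-all label.
lump_levels <- function(x, k, catchall = "(others)") {
  x <- as.character(x)
  freq <- sort(table(x), decreasing = TRUE)  # NA is excluded by table()
  if (k <= 0 || length(freq) <= k) {
    # Assumption: nothing needs to be consolidated in these cases.
    return(list(levels = names(freq), x = x))
  }
  kept <- names(freq)[seq_len(k - 1)]
  x[!is.na(x) & !(x %in% kept)] <- catchall
  list(levels = c(kept, catchall), x = x)
}

# Four distinct values, but only k = 3 levels are retained.
x <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d")
res <- lump_levels(x, k = 3)
res$levels
table(res$x)
```

With k = 3, the two most frequent values keep their own level and the remaining values share the catch-all level, so a design matrix built from `res$levels` would have three columns.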
factor.encoder() returns an object of class "encoder". This is a list containing the following components:
frame | a "factor.frame" object containing the encoding information (levels).
encode | a function to convert a vector x into a one-hot encoded matrix.
n | the number of encoding levels (i.e., columns in the design matrix).
type | a character string describing the encoding type: "factor" or "null".
factor.frame() returns a "factor.frame" object containing the encoding information.
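To make the shape of the returned list concrete, the following base R sketch builds a toy object with the same four components. `make_encoder()` and the class name "toy_encoder" are hypothetical, the `frame` element is a plain list standing in for a "factor.frame", and the all-zero rows for NA or unseen values are an assumption of this sketch.

```r
# Hypothetical illustration of the list structure; NOT the package's code.
make_encoder <- function(levels, tag = "x") {
  frame <- list(levels = levels, tag = tag)  # stand-in for a "factor.frame"
  encode <- function(x) {
    # match() returns NA for unseen values and for NA input
    idx <- match(as.character(x), levels)
    mat <- matrix(0, nrow = length(x), ncol = length(levels),
                  dimnames = list(NULL, levels))
    ok <- !is.na(idx)
    # Set one cell per matched observation via (row, column) index pairs
    mat[cbind(which(ok), idx[ok])] <- 1
    mat
  }
  structure(
    list(frame = frame, encode = encode, n = length(levels), type = "factor"),
    class = "toy_encoder"
  )
}

toy <- make_encoder(c("setosa", "versicolor", "virginica"), tag = "Species")
toy$n
out <- toy$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))
out
```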
numeric.encoder
# Create an encoder for a qualitative variable
data(iris, package = "datasets")
enc <- factor.encoder(x = iris$Species, use.catchall = FALSE, tag = "Species")
enc
# Encode a vector with NA
enc$encode(x = c("setosa", "virginica", "ensata", NA, "versicolor"))
# Create an encoder with a pre-defined encoding frame
frm <- factor.frame(c("setosa", "virginica"), "other iris")
enc <- factor.encoder(x = iris$Species, frame = frm)
enc
enc$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))
# Create an encoder with a character vector specifying the levels
enc <- factor.encoder(x = iris$Species, frame = c("setosa", "versicolor"))
enc$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))