This function takes one or more categorical variables and transforms them using frequency encoding. The first level (1) is reserved for unknown values, the second level (2) is the most frequent level, the third (3) is the second most frequent level, and so on. Small levels can be grouped together using the n_levels argument. These grouped levels are placed at the end.
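The ordering described above can be illustrated with a minimal base R sketch. `freq_encode_sketch` is a hypothetical helper written for this illustration, not the package's implementation; it assumes a character vector as input and leaves out the grouping of small levels.

```r
# Hypothetical helper for illustration only; not the code behind encode_freq().
freq_encode_sketch <- function(x, unknown_levels = character(0)) {
  x <- as.character(x)
  # NA, "" and any user-supplied values are treated as unknown
  is_unknown <- is.na(x) | x == "" | x %in% unknown_levels
  # Count the known values and order them from most to least frequent
  counts <- sort(table(x[!is_unknown]), decreasing = TRUE)
  # "Unknown" always takes the first position
  levels <- c("Unknown", names(counts))
  codes <- match(x, levels)
  codes[is_unknown] <- 1L
  list(data = codes, levels = levels)
}

x <- c("b", "a", "b", NA, "c", "b", "a", "")
res <- freq_encode_sketch(x)
res$levels  # "Unknown" "b" "a" "c"
res$data    # 2 3 2 1 4 2 3 1
```

Here "b" is the most frequent value and so receives code 2, while the NA and the empty string both map to code 1.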
encode_freq(
  data,
  n_levels = NULL,
  min_level_count = NULL,
  unknown_levels = NULL,
  unknown_treatment_method = 1
)
data | vector or dataframe - the data to frequency encode. For a dataframe, each column is encoded.
n_levels | numeric or named list of numerics - the maximum number of categorical levels to include (note that "Unknown" and "Other" are always added). If a named list is given, the names must match the dataframe colnames.
min_level_count | numeric or named list of numerics - the minimum number of instances for a level to be counted (this is stricter than a given value for n_levels). If a named list is given, the names must match the dataframe colnames.
unknown_levels | vector[String] or named list of vector[String] - these values will be treated as unknown. NA and "" are always treated as unknown. If a named list is given, the names must match the dataframe colnames.
unknown_treatment_method | numeric or named list of numerics - must be 1. Gives the option to treat unknowns differently. Not implemented. If a named list is given, the names must match the dataframe colnames.
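One way n_levels and min_level_count could interact is sketched below. `group_small_levels_sketch` is a hypothetical helper, and the rule it applies (filter by count first, then keep the top n_levels, and send everything else to "Other") is an assumption based on the argument descriptions rather than the package's actual logic.

```r
# Hypothetical helper for illustration only; the real grouping rule may differ.
group_small_levels_sketch <- function(x, n_levels = NULL, min_level_count = NULL) {
  # Levels ordered from most to least frequent
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)
  # Drop levels that occur fewer than min_level_count times
  if (!is.null(min_level_count)) {
    keep <- keep[as.vector(counts[keep]) >= min_level_count]
  }
  # Keep at most n_levels of the remaining levels
  if (!is.null(n_levels)) {
    keep <- head(keep, n_levels)
  }
  # Everything that was not kept is grouped into "Other"
  ifelse(x %in% keep, x, "Other")
}

x <- c("a", "a", "a", "b", "b", "c", "d")
grouped <- group_small_levels_sketch(x, n_levels = 3, min_level_count = 2)
grouped  # "a" "a" "a" "b" "b" "Other" "Other"
```

Although n_levels = 3 would allow a third level, min_level_count = 2 removes "c" and "d" first, which matches the note that min_level_count is the stricter of the two settings.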
list(data, levels)
data is the transformed data, in the same shape as the input data.
levels is a vector (if data is a vector) or a named list of vectors (if data is a dataframe) containing the order of the categorical levels.
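The shape of the return value for a dataframe input can be sketched in base R as follows. `encode_column_sketch` and the example dataframe are hypothetical and only mimic the documented structure; they do not call the package.

```r
# Hypothetical per-column encoder for illustration; not the package's code.
encode_column_sketch <- function(x) {
  x <- as.character(x)
  is_unknown <- is.na(x) | x == ""
  counts <- sort(table(x[!is_unknown]), decreasing = TRUE)
  levels <- c("Unknown", names(counts))
  codes <- match(x, levels)
  codes[is_unknown] <- 1L
  list(data = codes, levels = levels)
}

df <- data.frame(
  colour = c("red", "blue", "red", NA),
  size = c("S", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# Encode each column, then assemble the list(data, levels) structure
per_column <- lapply(df, encode_column_sketch)
result <- list(
  data = as.data.frame(lapply(per_column, `[[`, "data")),
  levels = lapply(per_column, `[[`, "levels")
)

dim(result$data)      # same shape as df
names(result$levels)  # one entry per column: "colour" "size"
```

Keeping the levels alongside the encoded data makes it possible to map the integer codes back to the original category names later.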