encode_freq: encode_freq
In gloverd2/admr: Contains standard functions for Admiral Advance Analytics


This function takes one or more categorical variables and transforms them using frequency encoding. The first level (1) holds unknown values, the second level (2) is the most frequent level, the third (3) is the second most frequent level, and so on. Small levels can be grouped together using the n_levels argument. These grouped levels are placed at the end.
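To make the ordering concrete, the following is a minimal base-R sketch of the encoding described above. It is not the package's implementation: `freq_encode_sketch` is a hypothetical helper written for illustration, it ignores grouping of small levels, and the real `encode_freq` may return additional levels (such as "Other") or differ in other details.

```r
# Hypothetical helper, not part of admr: a minimal frequency encoder.
# Assumption: level 1 is "Unknown", then levels follow in decreasing frequency.
freq_encode_sketch <- function(x) {
  x <- as.character(x)
  # NA and "" are treated as unknown
  unknown <- is.na(x) | x == ""
  # Count the known values and sort them from most to least frequent
  counts <- sort(table(x[!unknown]), decreasing = TRUE)
  levels <- c("Unknown", names(counts))
  # Each value is replaced by the position of its level
  codes <- match(x, levels)
  codes[unknown] <- 1L
  list(data = codes, levels = levels)
}

pets <- c(rep("cat", 2), rep("dog", 3), rep("fish", 4), "llama", NA)
res <- freq_encode_sketch(pets)
res$levels  # "Unknown" "fish" "dog" "cat" "llama"
res$data    # 4 4 3 3 3 2 2 2 2 5 1
```

Here "fish" is the most frequent value, so it receives code 2, while the single NA falls into the unknown level with code 1.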

encode_freq(
  data,
  n_levels = NULL,
  min_level_count = NULL,
  unknown_levels = NULL,
  unknown_treatment_method = 1
)

`data`	vector or dataframe - data to frequency encode. For a dataframe, the columns are encoded.
`n_levels`	numeric or named list of numerics - the maximum number of categorical levels to include (note that "Unknown" and "Other" are always added). If a named list is given, the names must be the same as the dataframe colnames.
`min_level_count`	numeric or named list of numerics - the minimum number of instances for a level to be counted (this is stricter than a given value for n_levels). If a named list is given, the names must be the same as the dataframe colnames.
`unknown_levels`	vector[String] or named list of vector[String] - these values will be treated as unknown. NA and "" are always treated as unknown. If a named list is given, the names must be the same as the dataframe colnames.
`unknown_treatment_method`	numeric or named list of numerics - must be 1. Intended to give the option to treat unknowns differently; not implemented. If a named list is given, the names must be the same as the dataframe colnames.
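The interaction between `n_levels` and `min_level_count` can be illustrated with a short base-R sketch. This is an assumption about the behaviour, not the package's code: `group_small_levels_sketch` is a hypothetical helper that applies both limits, keeps whichever result is smaller, and sends every remaining known value to a final "Other" level.

```r
# Hypothetical helper, not part of admr: group small levels into "Other".
# Assumption: both limits are applied and the stricter one wins.
group_small_levels_sketch <- function(x, n_levels = NULL, min_level_count = NULL) {
  x <- as.character(x)
  unknown <- is.na(x) | x == ""
  tab <- table(x[!unknown])
  # Plain named integer vector of counts, most frequent first
  freq <- setNames(as.vector(tab), names(tab))
  freq <- freq[order(freq, decreasing = TRUE)]
  # Drop levels that occur fewer than min_level_count times
  if (!is.null(min_level_count)) freq <- freq[freq >= min_level_count]
  # Keep at most n_levels of the remaining levels
  if (!is.null(n_levels)) freq <- head(freq, n_levels)
  levels <- c("Unknown", names(freq), "Other")
  codes <- match(x, levels)
  codes[is.na(codes)] <- length(levels)  # grouped levels go last
  codes[unknown] <- 1L                   # unknowns always come first
  list(data = codes, levels = levels)
}

pets <- c(rep("cat", 2), rep("dog", 3), rep("fish", 4), "llama", NA)
group_small_levels_sketch(pets, n_levels = 3, min_level_count = 3)$levels
# "Unknown" "fish" "dog" "Other"
```

In this example `n_levels = 3` alone would keep "cat", but `min_level_count = 3` removes it because it occurs only twice, so "cat" and "llama" both end up in "Other".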

list(data, levels) - `data` is the transformed data in the same shape as the input data. `levels` is a vector (if data is a vector) or a named list of vectors (if data is a dataframe) containing the order of the categorical levels.

# Encode a single vector; the NA entry is treated as unknown
data_in <- c(rep("cat", 2), rep("dog", 3), rep("fish", 4), "llama", NA)
encode_freq(data = data_in)

# Encode a dataframe with two categorical columns
data_in_df <- data.frame(
  pet = c(rep("cat", 2), rep("dog", 3), rep("fish", 4), "llama", NA),
  letter = c(rep("a", 5), rep("b", 5), "c")
)

encode_freq(data = data_in_df)