View source: R/group_category.r
group_category    R Documentation
Discrete features sometimes have sparse categories. This function groups the sparse categories of a discrete feature into a single new category, based on a given threshold.
group_category( data, feature, threshold, measure, update = FALSE, category_name = "OTHER", exclude = NULL )
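The grouping is driven by each category's cumulative frequency. The base R sketch below builds that kind of table: categories ordered from most to least frequent, with each category's share and running cumulative share. The helper `cumulative_frequency()` and its column names are hypothetical illustrations, not part of DataExplorer, and the package's own output layout may differ.

```r
# Hypothetical helper (not part of DataExplorer): tabulate a discrete
# feature and report each category's share and cumulative share,
# ordered from most to least frequent.
cumulative_frequency <- function(data, feature) {
  # Treat the feature as category labels
  x <- as.character(data[[feature]])
  # Count each category, most frequent first
  counts <- sort(table(x), decreasing = TRUE)
  out <- data.frame(category = names(counts),
                    cnt = as.integer(counts),
                    stringsAsFactors = FALSE)
  # Share of all observations, then the running total of those shares
  out$pct <- out$cnt / sum(out$cnt)
  out$cum_pct <- cumsum(out$pct)
  out
}

# Usage on data shaped like the `exclude` example
df <- data.frame(a = c(rep("c1", 25), rep("c2", 10), "c3", "c4"))
cumulative_frequency(df, "a")
```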
data: input data.
feature: name of the discrete feature to be collapsed.
threshold: the bottom share of categories to be grouped, given as a proportion; e.g., if set to 0.2, the categories that together make up the bottom 20% of cumulative frequency will be grouped.
measure: name of a feature to be used as an alternative measure in place of the category frequency.
update: logical, indicating whether the data should be modified. The default is FALSE.
category_name: name of the new category if update is set to TRUE. The default is "OTHER".
exclude: categories to be excluded from grouping when update is set to TRUE.
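How `threshold`, `measure`, `exclude` and `category_name` interact can be sketched in base R on a single vector. This is a minimal sketch under stated assumptions, not DataExplorer's implementation: the helper `group_sparse_categories()` and its `weight` argument (standing in for the values of the `measure` feature) are hypothetical, and "bottom share" is taken here to mean the categories whose cumulative share, accumulated from the smallest category upward, does not exceed `threshold`. The package may treat the boundary category differently.

```r
# Hypothetical helper, a sketch only: not DataExplorer's implementation.
group_sparse_categories <- function(x, threshold, weight = NULL,
                                    category_name = "OTHER", exclude = NULL) {
  # Treat the feature as labels, matching the documented coercion
  x <- as.character(x)
  # Without a weight, every observation counts once (plain frequency)
  if (is.null(weight)) weight <- rep(1, length(x))
  # Total weight per category, as a named numeric vector
  totals <- vapply(split(weight, x), sum, numeric(1))
  # Order categories from smallest to largest total
  totals <- sort(totals)
  # Running share of the grand total, starting from the smallest category
  cum_share <- cumsum(totals) / sum(totals)
  # Categories inside the bottom `threshold` share, minus any exclusions
  sparse <- setdiff(names(totals)[cum_share <= threshold], exclude)
  # Relabel the sparse categories with the new category name
  x[x %in% sparse] <- category_name
  x
}

# Usage mirroring the `exclude` example: c2, c3 and c4 fall within the
# bottom 80%, but c3 and c4 are excluded, so only c2 becomes "OTHER"
a <- c(rep("c1", 25), rep("c2", 10), "c3", "c4")
table(group_sparse_categories(a, 0.8, exclude = c("c3", "c4")))
```

Accumulating from the smallest category upward keeps the rule easy to state: a category is grouped only if it and every smaller category together stay within the threshold.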
If a continuous feature is passed to the argument feature, it will be coerced to character class.
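In base R terms, this coercion behaves like `as.character()`: each distinct numeric value becomes its own category label, so a continuous feature with many distinct values yields many sparse categories. A minimal illustration with made-up data, independent of DataExplorer:

```r
# A numeric feature with a few distinct values (illustrative data)
x <- c(10, 12, 10, 7.5, 10)
# Coerce to character: each distinct value becomes a category label
labels <- as.character(x)
# Frequency of each resulting category
counts <- table(labels)
counts
```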
If update is set to FALSE, returns the categories with cumulative frequency less than the input threshold. The output class will match the class of input data.
If update is set to TRUE, the updated data will be returned, and the output class will match the class of input data.
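Returning output in the same class as the input is a common pattern: remember `class(data)`, work on a plain data.frame, then reassign the class before returning. The helper `collapse_keep_class()` below is hypothetical; it takes the categories to collapse as an explicit argument instead of deriving them from a threshold, and it ignores data.table's by-reference semantics, so treat it as an illustration of the pattern only.

```r
# Hypothetical sketch (not DataExplorer's code): collapse the given
# categories of `feature` and return the result with the input's class.
collapse_keep_class <- function(data, feature, categories,
                                category_name = "OTHER") {
  input_class <- class(data)          # remember the caller's class
  df <- as.data.frame(data)           # work on a plain data.frame
  x <- as.character(df[[feature]])    # treat the feature as labels
  x[x %in% categories] <- category_name
  df[[feature]] <- x
  class(df) <- input_class            # hand back the same class
  df
}

# Usage: a data.frame in, a data.frame out
df <- data.frame(a = c("x", "x", "y", "z"), b = 1:4)
collapse_keep_class(df, "a", c("y", "z"))
```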
# Load packages
library(DataExplorer)
library(data.table)

# Generate data
data <- data.table("a" = as.factor(round(rnorm(500, 10, 5))), "b" = rexp(500, 500))

# View cumulative frequency without collapsing categories
group_category(data, "a", 0.2)

# View cumulative frequency based on another measure
group_category(data, "a", 0.2, measure = "b")

# Group bottom 20% categories based on cumulative frequency
group_category(data, "a", 0.2, update = TRUE)
plot_bar(data)

# Exclude categories from being grouped
dt <- data.table("a" = c(rep("c1", 25), rep("c2", 10), "c3", "c4"))
group_category(dt, "a", 0.8, update = TRUE, exclude = c("c3", "c4"))
plot_bar(dt)

# Return from non-data.table input
df <- data.frame("a" = as.factor(round(rnorm(50, 10, 5))), "b" = rexp(50, 10))
group_category(df, "a", 0.2)
group_category(df, "a", 0.2, measure = "b", update = TRUE)
group_category(df, "a", 0.2, update = TRUE)