group_by_threshold: Group categories by (cumulative) frequencies


Source: R/grouping_and_binning.R

Description

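Group the infrequent categories of a categorical feature into a single category, using a relative-frequency threshold, a cumulative-frequency threshold, or a fixed number of categories to keep. Unless return_data = TRUE, the function returns the (cumulative) frequencies of the feature instead of the data.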
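To make "relative" and "cumulative" frequency concrete, the base R snippet below computes both for a small made-up vector `x` (hypothetical data, not taken from mdmisc). It only illustrates the terms; it does not reproduce the exact table that group_by_threshold prints.

```r
# Hypothetical example vector (not from mdmisc): 10 observations, 4 categories
x <- c("a", "b", "a", "c", "a", "b", "d", "a", "b", "c")

# Count each category and sort from most to least frequent
counts <- sort(table(x), decreasing = TRUE)

freq <- data.frame(
  category = names(counts),
  n = as.integer(counts),
  share = as.numeric(counts) / sum(counts),  # relative frequency
  stringsAsFactors = FALSE
)
# Running total of the relative frequencies, in descending order of frequency
freq$cum_share <- cumsum(freq$share)
print(freq)
```

Here category "d" accounts for the last 10% of rows, which is the kind of category a threshold of 0.10 or a cumulative threshold of 0.90 is meant to group.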

Usage

group_by_threshold(dt, feature, threshold = NULL, cum_threshold = NULL,
  no_of_categories = NULL, return_data = FALSE, modify = FALSE,
  other_cat_name = "OTHER")

Arguments

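dt: a data.table containing the feature to group (as in the examples).
feature: name of the categorical column, as a string (e.g. 'cat').
threshold: relative frequency below which categories are grouped (e.g. 0.10 groups categories with less than 10% of the rows).
cum_threshold: cumulative-frequency cut-off; categories beyond it are grouped (e.g. 0.90 groups the bottom 10% of the cumulative frequency).
no_of_categories: number of most frequent categories to leave ungrouped.
return_data: if TRUE, return the data with the categories grouped; otherwise the (cumulative) frequencies are returned.
modify: if TRUE, modify dt in place.
other_cat_name: label given to the grouped category (default "OTHER").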

Examples

library(mdmisc)
set.seed(2016)
dt <- data.table::data.table('cat' = sample(letters[1:10], 20, replace = TRUE), 'cont' = rnorm(20))

## View cumulative frequencies
group_by_threshold(dt, 'cat')
group_by_threshold(dt, 'cat', threshold = 0.10)
group_by_threshold(dt, 'cat', cum_threshold = 0.90)
group_by_threshold(dt, 'cat', no_of_categories = 3)

## Group categories below 10% of frequency
dt_mod <- group_by_threshold(dt, 'cat', threshold = 0.10, return_data = TRUE)
group_by_threshold(dt_mod, 'cat')

## Group bottom 10% categories based on cumulative frequency
dt_mod <- group_by_threshold(dt, 'cat', cum_threshold = 0.90, return_data = TRUE)
group_by_threshold(dt_mod, 'cat')

## Keep the 3 most frequent categories, group the rest
dt_mod <- group_by_threshold(dt, 'cat', no_of_categories = 3, return_data = TRUE)
group_by_threshold(dt_mod, 'cat')

## Group and modify in place
group_by_threshold(dt, 'cat', threshold = 0.1, return_data = TRUE, modify = TRUE)
group_by_threshold(dt, 'cat')
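The grouping rules in the examples can be approximated in base R without mdmisc. The function group_rare_sketch below is hypothetical and is not the package's implementation. It works on a plain character vector rather than a data.table, applies only one criterion per call (threshold, then cum_threshold, then no_of_categories), and assumes that with cum_threshold the category whose cumulative share first reaches the cut-off is still kept.

```r
# Hypothetical base-R sketch (NOT mdmisc's implementation): replace rare
# categories of a character vector with a single label.
group_rare_sketch <- function(x, threshold = NULL, cum_threshold = NULL,
                              no_of_categories = NULL, other_cat_name = "OTHER") {
  counts <- sort(table(x), decreasing = TRUE)  # most frequent first
  share <- as.numeric(counts) / sum(counts)    # relative frequencies
  cats <- names(counts)

  if (!is.null(threshold)) {
    # Keep categories whose relative frequency is at least `threshold`
    keep <- cats[share >= threshold]
  } else if (!is.null(cum_threshold)) {
    # Assumption: keep categories until the cumulative share reaches
    # `cum_threshold` (the crossing category is kept), group the rest
    cum_share <- cumsum(share)
    keep <- cats[c(TRUE, head(cum_share, -1) < cum_threshold)]
  } else if (!is.null(no_of_categories)) {
    # Keep the `no_of_categories` most frequent categories
    keep <- head(cats, no_of_categories)
  } else {
    return(x)  # no criterion given: leave the data unchanged
  }
  ifelse(x %in% keep, x, other_cat_name)
}

x <- c("a", "b", "a", "c", "a", "b", "d", "a", "b", "c")
print(group_rare_sketch(x, threshold = 0.25))
```

With threshold = 0.25, categories "c" (20%) and "d" (10%) fall below the cut-off and become "OTHER".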

m-dz/mdmisc documentation built on May 22, 2019, 12:23 p.m.