other_label: Automatically assign "other" to low observation count...

Description

Datasets often have a grouping column in which almost all rows fall into a few values, while many smaller values share the remaining observations. For example, you may have a dataset with employee-level observations and want to use "US State" as a group, but 90% of the observations fall into New York, California, Texas, and perhaps 6 other states. The remaining observations are distributed among the other 41 states, and you might prefer to lump all of them into a single bucket. This function provides a way to reassign all those observations to "other".

Usage

other_label(df, column, percentile = 0.9, custom = NULL)

Arguments

df

The dataframe to be manipulated

column

Which column to relabel

percentile

The cumulative proportion of observations at which to cut off the data (default 0.9)

custom

A custom vector of values to reassign to "other" in the dataset

Value

The dataframe with the specified column relabeled

Examples

summary(as.factor(permits$type_desc))
permits_cleaned <- other_label(permits, "type_desc")
summary(as.factor(permits_cleaned$type_desc))
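
The relabeling described above can be sketched in base R without the permits data. This is a minimal illustration under stated assumptions, not groupR's actual source: other_label_sketch and the toy data frame are hypothetical, and the cutoff rule (keep the most frequent values until their cumulative share reaches percentile) is a guess at what the percentile argument means.

```r
# Hypothetical re-implementation for illustration only; the real
# other_label() in groupR may apply a different cutoff rule.
other_label_sketch <- function(df, column, percentile = 0.9, custom = NULL) {
  values <- as.character(df[[column]])
  if (is.null(custom)) {
    # Count each value, most frequent first
    counts <- sort(table(values), decreasing = TRUE)
    n <- as.vector(counts)
    labels <- names(counts)
    # Share of rows already covered before each value is added
    prior_share <- (cumsum(n) - n) / sum(n)
    # Assumed rule: keep values while the covered share is below the cutoff
    custom <- labels[prior_share >= percentile]
  }
  # Everything listed in custom is lumped into a single bucket
  values[values %in% custom] <- "other"
  df[[column]] <- values
  df
}

# Toy data: 20 employees, 18 of them (90%) in three states
toy <- data.frame(
  state = rep(c("NY", "CA", "TX", "VT", "WY"), times = c(9, 6, 3, 1, 1)),
  stringsAsFactors = FALSE
)

cleaned <- other_label_sketch(toy, "state")
summary(as.factor(cleaned$state))

# A custom vector bypasses the frequency cutoff entirely
custom_cleaned <- other_label_sketch(toy, "state", custom = c("TX"))
summary(as.factor(custom_cleaned$state))
```

With the default cutoff the two single-row states are merged into "other", while the custom call relabels only the values named explicitly.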

athompson1991/groupR documentation built on May 10, 2019, 2:09 p.m.