replace_rare_levels: Replaces rare levels of a categorical variable

View source: R/replace_rare_levels.R


Description

This function takes a categorical variable and combines all levels whose frequencies are less than or equal to a user-specified threshold into a single new level named Other (the name can be changed with newname).

Usage

replace_rare_levels(x, threshold = 20, newname = "Other")

Arguments

x

a vector of categorical values

threshold

Levels that appear threshold times or fewer in total are combined into a single new level, named Other by default (see newname).
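Because "threshold times or fewer" is an inclusive comparison, a level that appears exactly threshold times also counts as rare. The small base-R sketch below illustrates this with made-up counts; the vector x and the value of threshold are illustrative only and do not come from the package.

```r
# Illustrative toy data: level counts a = 5, b = 3, c = 2
x <- c(rep("a", 5), rep("b", 3), rep("c", 2))
threshold <- 3
counts <- table(x)
# "threshold times or fewer" means <=, so a level seen exactly 3 times is rare
rare <- names(counts)[counts <= threshold]
print(rare)  # "b" "c"
```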

newname

The name given to the new combined level. Defaults to Other.

Details

Returns the recoded values of the categorical variable. All levels that appeared threshold times or fewer are replaced by the newname level (Other by default).

If, after being combined, the newname level still has threshold or fewer instances, the remaining level that appears least often is merged into it as well.
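The two-step rule described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the regclass source code: the function name replace_rare_levels_sketch is hypothetical, the second merge is assumed to happen at most once and only if the first step combined something, ties for the least frequent level are assumed to go to the first level in table() order, and the result is assumed to be returned as a factor.

```r
# Hypothetical sketch of the documented behavior; NOT the regclass source code.
replace_rare_levels_sketch <- function(x, threshold = 20, newname = "Other") {
  x <- as.character(x)
  counts <- table(x)
  # Step 1: every level seen `threshold` times or fewer becomes `newname`
  rare <- names(counts)[counts <= threshold]
  x[x %in% rare] <- newname
  # Step 2 (assumed to run once, and only if step 1 combined something):
  # if the combined level is still rare, fold in the least frequent remaining level
  if (length(rare) > 0 && sum(x %in% newname) <= threshold) {
    remaining <- table(x[!(x %in% newname)])
    if (length(remaining) > 0) {
      # which.min() returns the first minimum, so ties go to the earlier level (assumption)
      least <- names(remaining)[which.min(remaining)]
      x[x %in% least] <- newname
    }
  }
  factor(x)  # assumed return type
}

# Toy data (made up): counts a = 6, b = 4, c = 2, d = 1
x <- c(rep("a", 6), rep("b", 4), rep("c", 2), "d")
y <- replace_rare_levels_sketch(x, threshold = 3)
table(y)
```

With threshold = 3, step 1 folds c and d into Other, giving it only 3 values, which is not above the threshold. Step 2 therefore also folds in b, the least frequent remaining level, leaving a with 6 values and Other with 7.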

Author(s)

Adam Petrie

References

Introduction to Regression and Modeling

Examples

    library(regclass)
    data(EX6.CLICK)
    x <- EX6.CLICK[, 15]
    table(x)

    # Replace all levels that appear 700 or fewer times (AA, CC, DD)
    y <- replace_rare_levels(x, 700)
    table(y)

    # Replace all levels that appear 1350 or fewer times. This also forces BB
    # (which occurs 2422 times) into the Other level, because the three levels
    # that appear 1350 or fewer times have 1350 or fewer instances combined
    y <- replace_rare_levels(x, 1350)
    table(y)


regclass documentation built on June 8, 2025, 12:40 p.m.