replace_rare_levels | R Documentation |
This function takes a categorical variable and replaces all levels with frequencies less than or equal to a user-specified threshold with a single new level, named Other by default.
replace_rare_levels(x,threshold=20,newname="Other")
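x | a vector of categorical values |
threshold | levels that appear a total of threshold times or fewer are combined into the newname level; defaults to 20 |
newname | name given to the combined level; defaults to "Other" |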
Returns the recoded values of the categorical variable. All levels which appeared threshold times or fewer are now known as newname ("Other" by default).
If, after being combined, the newname level has threshold or fewer instances, the remaining level that appears least often is combined as well.
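The recoding rule described above can be sketched in base R. This is a hypothetical re-implementation for illustration only, not the package source: the name combine_rare_sketch is invented here, the sketch absorbs at most one additional level (as the description suggests), and it breaks ties for "least often" by taking the first level in table order.

```r
# Illustrative sketch only; combine_rare_sketch is a hypothetical name,
# not a function from the package.
combine_rare_sketch <- function(x, threshold = 20, newname = "Other") {
  x <- as.character(x)
  counts <- table(x)

  # Levels that appear `threshold` times or fewer
  rare <- names(counts)[counts <= threshold]
  if (length(rare) == 0) return(factor(x))

  # Collapse every rare level into the new level
  x[x %in% rare] <- newname

  # If the combined level is itself still rare, also absorb the least
  # frequent remaining level (assumption: done once; ties go to the
  # first level in table order)
  counts <- table(x)
  others <- counts[names(counts) != newname]
  if (sum(x == newname) <= threshold && length(others) > 0) {
    smallest <- names(others)[which.min(others)]
    x[x == smallest] <- newname
  }

  factor(x)
}

# Toy data: C and D are rare, and together they still total only 2,
# so B (the least frequent remaining level) is absorbed as well.
x <- c(rep("A", 10), rep("B", 4), "C", "D")
y <- combine_rare_sketch(x, threshold = 3)
table(y)
```

With threshold = 3 the toy vector ends up with two levels, A and Other, which mirrors the way BB is forced into Other in the second example below.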
Adam Petrie
Introduction to Regression and Modeling
library(regclass)  #package providing replace_rare_levels and EX6.CLICK
data(EX6.CLICK)
x <- EX6.CLICK[,15]
table(x)
#Replace all levels which appear 700 or fewer times (AA, CC, DD)
y <- replace_rare_levels(x,700)
table( y )
#Replace all levels which appear 1350 or fewer times. This forces BB (which
#occurs 2422 times) into the Other level since the three levels that appear
#fewer than 1350 times do not appear more than 1350 times combined
y <- replace_rare_levels(x,1350)
table( y )