Source: R/combine.levels.r

combine.levels    R Documentation

combine.levels

Description

Combine Infrequent Levels of a Categorical Variable

Usage

combine.levels(
  x,
  minlev = 0.05,
  m,
  ord = is.ordered(x),
  plevels = FALSE,
  sep = ","
)

Arguments

x

a factor, an 'ordered' factor, or a numeric or character variable, which will be converted to a 'factor'

minlev

the minimum proportion of observations a cell must contain to avoid being combined with one or more other cells. If more than one cell has fewer than 'minlev*n' observations, where n is the number of observations, all such cells are combined into a new cell labeled '"OTHER"'. If exactly one cell falls below this threshold, it (the lowest-frequency cell) is combined with the next-lowest-frequency cell, and the new level name is the combination of the two old level names. When 'ord=TRUE', combinations happen only between consecutive levels.

m

an alternative to 'minlev': the minimum number of observations a cell must contain to avoid being combined with others

ord

set to 'TRUE' to treat 'x' as if it were an ordered factor, which allows only consecutive levels to be combined

plevels

by default, when 'x' is not ordered and 'ord=FALSE', 'combine.levels' pools low-frequency levels into a category named '"OTHER"'. Set 'plevels=TRUE' to instead name this category by concatenating all of the pooled level names, separated by 'sep' (a comma by default).

sep

the separator for concatenating levels when 'plevels=TRUE'
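The threshold shared by 'minlev' and 'm' can be illustrated with a small base-R sketch. 'small_levels()' is a hypothetical helper, not part of Hmisc, and it assumes that n counts only non-missing observations; it merely lists which levels fall below the cutoff and does not combine anything.

```r
# Hypothetical helper, not part of Hmisc: list the levels of x whose counts
# fall below the cutoff implied by minlev or m.
small_levels <- function(x, minlev = 0.05, m) {
  x <- as.factor(x)
  counts <- as.vector(table(x))   # table() drops NA values by default
  n <- sum(counts)                # assumption: n = non-missing observations
  # An absolute count m, when supplied, overrides the proportion minlev
  cutoff <- if (missing(m)) minlev * n else m
  levels(x)[counts < cutoff]
}

x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1))
small_levels(x, m = 3)         # "A" "D" "E"
small_levels(x, minlev = 0.2)  # cutoff 0.2 * 10 = 2, again "A" "D" "E"
small_levels(x)                # default cutoff 0.05 * 10 = 0.5: none
```

With the 10 observations above, 'm=3' and 'minlev=0.2' flag the same three levels, while the default 'minlev=0.05' flags none.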

Details

After converting 'x' to a 'factor' if it is not one already, 'combine.levels' combines the levels of 'x' whose frequency falls below a specified relative frequency 'minlev' or absolute count 'm'. When 'x' is not treated as ordered, all of the low-frequency levels are combined into '"OTHER"', unless 'plevels=TRUE'. When 'ord=TRUE' or 'x' is an ordered factor, only consecutive levels are combined. New level names are constructed by concatenating the combined level names, with 'sep' as the separator.

This is useful when comparing ordinal regression with polytomous (multinomial) regression and there are too many categories for the polytomous model. 'combine.levels' is also useful when the assumptions of ordinal models are checked empirically by computing exceedance probabilities for various cutoffs of the dependent variable.
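The unordered pooling rules described above can be sketched in base R. 'pool_levels()' is a hypothetical function, not Hmisc's source. Its assumptions go beyond the documentation: pooled names are concatenated in level order, ties for the next-lowest cell go to the first such level, and the cutoff uses the non-missing count. The real 'combine.levels' may order or name levels differently.

```r
# Hypothetical sketch, not Hmisc's source: pool infrequent levels of an
# unordered variable using the rules described for combine.levels.
pool_levels <- function(x, minlev = 0.05, m, plevels = FALSE, sep = ",") {
  x <- as.factor(x)
  lev <- levels(x)
  counts <- as.vector(table(x))
  # Assumption: cutoff is m if supplied, else minlev times the non-missing count
  cutoff <- if (missing(m)) minlev * sum(counts) else m
  small <- lev[counts < cutoff]
  if (length(small) == 0) return(x)   # every level is frequent enough

  if (length(small) > 1) {
    # Several small cells: pool them all into one new level
    to_merge <- small
    new_name <- if (plevels) paste(small, collapse = sep) else "OTHER"
  } else {
    # Exactly one small cell: merge it with the next-lowest-frequency cell
    # (ties go to the first such level in level order)
    rest <- lev != small
    to_merge <- c(small, lev[rest][which.min(counts[rest])])
    new_name <- paste(to_merge, collapse = sep)
  }
  # Assigning duplicated level names merges the corresponding levels
  levels(x) <- ifelse(lev %in% to_merge, new_name, lev)
  x
}

x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1))
table(pool_levels(x, m = 3))                  # OTHER 3, B 3, C 4
table(pool_levels(x, m = 3, plevels = TRUE))  # A,D,E 3, B 3, C 4
```

The merge relies on base R's 'levels<-', which combines levels that are given the same new name. This avoids rebuilding the factor by hand.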

Value

a factor variable, or if 'ord=TRUE' an ordered factor variable

Author(s)

Frank Harrell

Examples

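# Counts: A 1, B 3, C 4, D 1, E 1 (10 observations)
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1))
# Unordered: the levels with fewer than m=3 observations are pooled into "OTHER"
combine.levels(x, m=3)
# Same, but name the pooled level by concatenating the pooled level names
combine.levels(x, m=3, plevels=TRUE)
# Treat x as ordered: only consecutive levels may be combined
combine.levels(x, ord=TRUE, m=3)
# Add a sixth level, F, with a single observation
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1),
       rep('F', 1))
combine.levels(x, ord=TRUE, m=3)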
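For 'ord=TRUE', the documentation states only that consecutive levels are combined, not the order in which merges happen. The sketch below shows one way to enforce adjacency. It uses a swapped-in greedy rule: repeatedly merge the smallest level with its smaller neighbour until every level reaches the cutoff. 'merge_adjacent()' is hypothetical, and its output for the ordered examples above may differ from what 'combine.levels' returns.

```r
# Hypothetical sketch, not Hmisc's algorithm: greedily merge adjacent
# levels of an ordered variable until every level reaches the cutoff.
merge_adjacent <- function(x, minlev = 0.05, m, sep = ",") {
  x <- as.ordered(x)
  counts <- as.vector(table(x))
  cutoff <- if (missing(m)) minlev * sum(counts) else m
  groups <- as.list(levels(x))    # original levels forming each current level
  while (length(counts) > 1 && min(counts) < cutoff) {
    i <- which.min(counts)        # smallest current level (first if tied)
    # Only the levels directly before and after i are candidates
    nb <- c(i - 1, i + 1)
    nb <- nb[nb >= 1 & nb <= length(counts)]
    j <- nb[which.min(counts[nb])]  # merge with the smaller neighbour
    left <- min(i, j)
    right <- max(i, j)
    counts[left] <- counts[left] + counts[right]
    groups[[left]] <- c(groups[[left]], groups[[right]])
    counts <- counts[-right]
    groups <- groups[-right]
  }
  # Map each original level to the concatenated name of its group
  new_names <- vapply(groups, paste, character(1), collapse = sep)
  levels(x) <- rep(new_names, lengths(groups))
  x
}

x5 <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1))
levels(merge_adjacent(x5, m = 3))   # "A,B" "C,D,E"
x6 <- c(x5, 'F')
levels(merge_adjacent(x6, m = 3))   # "A,B" "C" "D,E,F"
```

Because every merge joins two neighbouring groups, each new level always spans a contiguous run of the original ordered levels, and the result stays an ordered factor.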

harrelfe/Hmisc documentation built on April 18, 2024, 11:06 p.m.