binCat: categorical data binning by collapsing
In brooksandrew/Rsenal: Rsenal (arsenal) of assorted R functions developed over the years

Description Usage Arguments Details Value Examples

Bins categorical variables into a smaller number of bins. Useful when modeling with variables that have many small categories. The largest categories are kept as is and the smaller categories are collapsed into a new category named 'other'. There are two options for determining the number of bins:
1. Specify the exact number of bins desired (ncat)
2. Specify the share of your variable that will be represented with actual categories before everything else is named 'other' (maxp)
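
The idea behind the first option can be sketched in base R. This is a minimal illustration of collapsing by count, not the package's implementation: the helper `collapse_top_n` and its arguments `n_keep` and `other` are hypothetical names, and here `n_keep` counts only the categories kept as is, with 'other' added on top. Whether binCat counts 'other' toward ncat is not stated here.

```r
# Hypothetical helper, not part of Rsenal: keep the n_keep most frequent
# categories and relabel every other non-missing value as "other".
collapse_top_n <- function(x, n_keep, other = "other") {
  x <- as.character(x)
  counts <- sort(table(x), decreasing = TRUE)  # table() ignores NA by default
  keep <- names(counts)[seq_len(min(n_keep, length(counts)))]
  x[!is.na(x) & !(x %in% keep)] <- other       # NA values are left untouched
  x
}

x <- c(rep("a", 5), rep("b", 3), "c", "d")
table(collapse_top_n(x, n_keep = 2))
```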

binCat(x, ncat = NULL, maxp = NULL, results = F, setNA = NA, keepNA = F)

`x`	Vector to bin. It is converted to character, so any type is acceptable.
`ncat`	Number of bins to collapse the data to. Typically between 0 and 100, though larger values should also work.
`maxp`	Number from 0 to 1. Proportion of the data that will be represented "as is" before the remaining categories are collapsed to 'other'.
`results`	Logical, `TRUE` or `FALSE`. If `TRUE`, prints a frequency table of the new categories.
`setNA`	Value to set NAs to. The default keeps them as NA. Set it to a character string to make NAs a category.
`keepNA`	Logical. `TRUE` keeps NAs as their own category. `FALSE` bundles NAs into the 'other' category.
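
The `maxp` criterion can be illustrated with a cumulative-share calculation in base R. This is a sketch under stated assumptions, not the package's code: `collapse_by_share` and `max_share` are hypothetical names, and the boundary rule used here (keep the smallest set of top categories whose cumulative share reaches `max_share`) is an assumption, since the exact rule binCat applies is not documented above.

```r
# Hypothetical helper, not part of Rsenal: keep the most frequent categories
# until their cumulative share of the data reaches max_share.
collapse_by_share <- function(x, max_share, other = "other") {
  stopifnot(max_share >= 0, max_share <= 1)
  x <- as.character(x)
  counts <- sort(table(x), decreasing = TRUE)
  cum_share <- cumsum(as.vector(counts)) / sum(counts)
  n_keep <- which(cum_share >= max_share)[1]   # assumed boundary rule
  keep <- names(counts)[seq_len(n_keep)]
  x[!is.na(x) & !(x %in% keep)] <- other
  x
}

# Shares are a = 0.5, b = 0.3, c = 0.1, d = 0.1
x <- c(rep("a", 5), rep("b", 3), "c", "d")
table(collapse_by_share(x, max_share = 0.6))
```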

It is advisable to use only one of the ncat or maxp parameters. When both are used together, the result follows whichever criterion yields the smaller number of bins.
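
To make the "smaller number of bins" rule concrete, the sketch below computes how many categories each criterion would keep and takes the minimum. The helper `bins_kept` and its arguments are hypothetical, and the cumulative-share boundary rule is an assumption, so treat this as an illustration of the stated behavior rather than binCat's actual logic.

```r
# Hypothetical helper, not part of Rsenal: number of categories kept as is
# when a count limit and a share limit are applied together.
bins_kept <- function(x, n_keep = NULL, max_share = NULL) {
  counts <- sort(table(as.character(x)), decreasing = TRUE)
  by_count <- if (is.null(n_keep)) {
    length(counts)
  } else {
    min(n_keep, length(counts))
  }
  by_share <- if (is.null(max_share)) {
    length(counts)
  } else {
    which(cumsum(as.vector(counts)) / sum(counts) >= max_share)[1]
  }
  min(by_count, by_share)  # the stricter criterion wins
}

x <- c(rep("a", 5), rep("b", 3), "c", "d")
bins_kept(x, n_keep = 3, max_share = 0.6)  # share limit is stricter here
bins_kept(x, n_keep = 1, max_share = 0.9)  # count limit is stricter here
```
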
Behavior may be unexpected when setNA=NA and keepNA=T. To keep NAs as a standalone category, set setNA to something that is not NA.
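
Why a non-NA label helps can be seen with base R alone: `table()` drops missing values by default, so they never show up as a category unless they are relabeled or counted explicitly. The snippet below only demonstrates that general behavior. It does not describe binCat's internals, and the label "missing" is an arbitrary choice.

```r
x <- c("a", "a", "b", NA, NA, NA)

table(x)                   # NA values are dropped by default
table(x, useNA = "ifany")  # base R option that counts them

# Relabeling NA with a string turns the missing values into a real category
x_labeled <- x
x_labeled[is.na(x_labeled)] <- "missing"
table(x_labeled)
```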

A vector of binned data.

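library(Rsenal)

# Simulate a categorical variable: Poisson counts mapped to letters
d <- rpois(1000, 20)
d[d > 26] <- sample(1:26, sum(d > 26), replace = TRUE)
dl <- letters[d]
barplot(table(dl))

# Collapse by number of bins, then by share of data kept as is
table(binCat(dl, results = FALSE, ncat = 5))
table(binCat(dl, results = FALSE, maxp = 0.5))
table(binCat(dl, results = FALSE, maxp = 0.9))

## With missings
ff <- sample(letters[1:15], 100, replace = TRUE)
ff[sample(100, 10)] <- NA
binCat(ff, ncat = 7, setNA = 'missing')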

brooksandrew/Rsenal documentation built on May 13, 2019, 7:50 a.m.
