Bins categorical variables into a smaller number of bins. Useful when modeling with variables that have many small categories.
The largest categories are kept as is, and the smaller categories are collapsed into a new category named 'other'.
There are two options for determining the number of bins:
1. Specify the exact number of bins desired (ncat).
2. Specify the share of your variable that will be represented with actual categories before everything else is named 'other' (maxp).
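The two options can be illustrated with a minimal base R sketch. The helper `bin_sketch` below is hypothetical and is not the package's implementation of binCat; it assumes that ncat counts only the kept categories (excluding 'other') and that maxp keeps the smallest set of top categories whose cumulative share reaches maxp.

```r
# Hypothetical helper for illustration only; NOT the package's binCat().
bin_sketch <- function(x, ncat = NULL, maxp = NULL, other = "other") {
  x <- as.character(x)
  # Category counts, largest first (table() ignores NA by default)
  counts <- sort(table(x), decreasing = TRUE)
  keep_n <- length(counts)
  if (!is.null(ncat)) {
    # Assumption: ncat is the number of categories kept "as is"
    keep_n <- min(keep_n, ncat)
  }
  if (!is.null(maxp)) {
    # Smallest number of top categories whose cumulative share reaches maxp
    share <- cumsum(counts) / sum(counts)
    keep_n <- min(keep_n, which(share >= maxp)[1])
  }
  keep <- names(counts)[seq_len(keep_n)]
  # Everything outside the kept categories becomes "other"; NA is left alone
  x[!is.na(x) & !(x %in% keep)] <- other
  x
}

x <- rep(c("a", "b", "c", "d", "e"), times = c(50, 25, 15, 7, 3))
table(bin_sketch(x, ncat = 2))    # keeps a and b; c, d, e become "other"
table(bin_sketch(x, maxp = 0.8))  # keeps a, b, c (a and b cover only 75%)
```

In this sketch, supplying both arguments keeps whichever rule retains fewer categories, which is one plausible reading of how the two criteria combine.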
x | vector to bin. It is coerced to character, so any type is acceptable
ncat | number, 0 to 100 (or higher). Number of bins to collapse the data to
maxp | number, 0 to 1. Proportion of the data that will be represented "as is" before the remaining categories are collapsed to "other"
results | logical
setNA | value to set NAs to. The default is to keep NA. Can be set to a character string to make NAs a category
keepNA | logical
It is advisable to use only one of the ncat or maxp parameters. When both are used together, the function returns whichever criterion yields the smaller number of bins.
Behavior may be unexpected when setNA=NA and keepNA=T. To keep NAs as a standalone category, set setNA to something that is not NA.
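The reason a non-NA label is needed can be seen in base R alone. The snippet below is a small sketch of general R behavior, not of binCat's internals: values stored as NA are not counted as a category by default, while recoding them to a string turns them into an ordinary level. The label "missing" is an arbitrary choice.

```r
ff <- c("a", "a", "b", NA, "c", NA)

table(ff)                   # NAs are dropped from the counts by default
table(ff, useNA = "ifany")  # NAs are counted, but are still stored as NA

# Recoding NA to a label makes it an ordinary category
ff_labelled <- ifelse(is.na(ff), "missing", ff)
table(ff_labelled)
```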
vector of binned data
# Simulate a categorical variable with many levels of uneven size
d <- rpois(1000, 20)
d[d > 26] <- sample(1:26, sum(d > 26), replace = TRUE)
dl <- letters[d]
barplot(table(dl))
# Collapse by a fixed number of bins, then by share of the data
table(binCat(dl, results = FALSE, ncat = 5))
table(binCat(dl, results = FALSE, maxp = 0.5))
table(binCat(dl, results = FALSE, maxp = 0.9))
## With missings
ff <- sample(letters[1:15], 100, replace = TRUE)
ff[sample(100, 10)] <- NA
# Recode NAs to their own category, 'missing'
binCat(ff, ncat = 7, setNA = 'missing')