binning: Binning the Numeric Data
In dlookr: Tools for Data Diagnosis, Exploration, Transformation

Description

binning() converts a numeric variable to a categorical variable.

Usage

binning(
  x,
  nbins,
  type = c("quantile", "equal", "pretty", "kmeans", "bclust"),
  ordered = TRUE,
  labels = NULL,
  approxy.lab = TRUE
)

Arguments

`x`	numeric. Numeric vector to bin.
`nbins`	integer. Number of intervals (bins). If missing, nclass.Sturges is used.
`type`	character. Binning method. Choose from "quantile", "equal", "pretty", "kmeans" and "bclust". "quantile" sets breaks at quantiles, so that each bin holds roughly the same number of observations. "equal" sets breaks at equal-width intervals. "pretty" chooses a number of breaks, not necessarily equal to nbins, using the base::pretty function. "kmeans" uses the stats::kmeans function to generate the breaks. "bclust" uses the e1071::bclust function to generate the breaks by bagged clustering. "kmeans" and "bclust" are implemented through the classInt::classIntervals() function.
`ordered`	logical. Whether to build an ordered factor or not.
`labels`	character. The label names to use for each of the bins.
`approxy.lab`	logical. If TRUE, breaks with large values are approximated to pretty numbers. If FALSE, the original breaks obtained by type are used.
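
As a rough illustration of the two simplest methods, the sketch below computes "quantile"-style and "equal"-style breaks in base R and shows what nclass.Sturges returns for a short vector. It is not dlookr's internal code and may differ from it in detail; the helper names quantile_breaks and equal_breaks are hypothetical and exist only for this example.

```r
# Hypothetical helpers illustrating the idea behind two binning types.
# They are NOT part of dlookr and may differ from its internals.
quantile_breaks <- function(x, nbins) {
  # Breaks at equally spaced probabilities (0, 1/nbins, ..., 1)
  stats::quantile(x, probs = seq(0, 1, length.out = nbins + 1),
                  na.rm = TRUE, names = FALSE)
}

equal_breaks <- function(x, nbins) {
  # Breaks that split the observed range into equal-width intervals
  rng <- range(x, na.rm = TRUE)
  seq(rng[1], rng[2], length.out = nbins + 1)
}

# A small skewed vector: nine small values and one large one
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

qb <- quantile_breaks(x, nbins = 2)
eb <- equal_breaks(x, nbins = 2)

# Count the observations that fall into each bin
table(cut(x, breaks = qb, include.lowest = TRUE))
table(cut(x, breaks = eb, include.lowest = TRUE))

# Number of bins suggested by Sturges' rule, used when nbins is missing
grDevices::nclass.Sturges(x)
```

On this skewed vector the quantile breaks place five values in each bin, while the equal-width breaks place nine values in the first bin and one in the second, which is the usual reason to prefer "quantile" for skewed data.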

Details

This function is useful in combination with the mutate() and transmute() functions of the dplyr package.

See vignette("transformation") for an introduction to these concepts.

Value

An object of class "bins". The attributes of the bins class are as follows.

class : "bins"
type : binning type, "quantile", "equal", "pretty", "kmeans", "bclust".
breaks : breaks for binning. the cut points that define the intervals into which x is cut.
levels : levels of binned value.
raw : raw data, numeric vector corresponding to x argument.
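
A minimal sketch of reading these attributes back with base R's attr(), assuming they are stored under the names listed above. The input vector is made up for illustration, and the code only runs if dlookr is installed.

```r
# Sketch only: assumes the attribute names match the list above.
if (requireNamespace("dlookr", quietly = TRUE)) {
  # Made-up input vector, used purely for illustration
  bin <- dlookr::binning(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100), nbins = 2)

  class(bin)            # includes "bins"
  attr(bin, "type")     # binning method that was used
  attr(bin, "breaks")   # cut points of the intervals
  attr(bin, "raw")      # the original numeric vector
  levels(bin)           # labels of the resulting bins
}
```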

Examples


# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 20), "platelets"] <- NA

# Binning the platelets variable. default type argument is "quantile"
bin <- binning(heartfailure2$platelets)
# Print bins class object
bin

# Using labels argument
bin <- binning(heartfailure2$platelets, nbins = 4,
              labels = c("LQ1", "UQ1", "LQ3", "UQ3"))
bin

# Using another type argument
bin <- binning(heartfailure2$platelets, nbins = 5, type = "equal")
bin
bin <- binning(heartfailure2$platelets, nbins = 5, type = "pretty")
bin
# "kmeans" and "bclust" are implemented through the classInt::classIntervals() function,
# so the classInt package must be installed.
if (requireNamespace("classInt", quietly = TRUE)) {
  bin <- binning(heartfailure2$platelets, nbins = 5, type = "kmeans")
  bin
  bin <- binning(heartfailure2$platelets, nbins = 5, type = "bclust")
  bin
} else {
  cat("If you want to use this feature, you need to install the 'classInt' package.\n")
}

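# Breaks with large values: approximated to pretty numbers by default
x <- sample(1:1000, size = 50) * 12345679
bin <- binning(x)
bin
# Keep the original breaks instead of the approximated ones
bin <- binning(x, approxy.lab = FALSE)
bin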

# extract binned results
extract(bin)

# -------------------------
# Using pipes & dplyr
# -------------------------
library(dplyr)

# Compare binned frequency by death_event
heartfailure2 %>%
  mutate(platelets_bin = binning(platelets) %>%
           extract()) %>%
  group_by(death_event, platelets_bin) %>%
  summarise(freq = n(), .groups = "drop") %>%
  arrange(desc(freq)) %>%
  head(10)

dlookr documentation built on May 29, 2024, 2 a.m.