discretize  R Documentation 
Description

discretize() converts a numeric vector into a factor with bins having approximately the same number of data points (based on a training set).
Usage

discretize(x, ...)
## Default S3 method:
discretize(x, ...)
## S3 method for class 'numeric'
discretize(
x,
cuts = 4,
labels = NULL,
prefix = "bin",
keep_na = TRUE,
infs = TRUE,
min_unique = 10,
...
)
## S3 method for class 'discretize'
predict(object, new_data, ...)
Arguments

x
A numeric vector.

...
Options to pass to

cuts
An integer defining how many cuts to make of the data.

labels
A character vector defining the factor levels that will be in the new factor (from smallest to largest). This should have length

prefix
A single parameter value to be used as a prefix for the factor levels (e.g.

keep_na
A logical for whether a factor level should be created to identify missing values in

infs
A logical indicating whether the smallest and largest cut points should be infinite.

min_unique
An integer defining a sample size line of dignity for the binning. If (the number of unique values)

object
An object of class

new_data
A new numeric object to be binned.
Details

discretize estimates the cut points from x using percentiles. For example, if cuts = 4 (the default), the function estimates the quartiles of x and uses these as the cut points. If cuts = 2, the bins are defined as being above or below the median of x.
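To make the percentile idea concrete, here is a minimal base-R sketch. It assumes that the cut points are the sample quantiles at cuts + 1 evenly spaced probabilities, which is consistent with the median and quartile cases described above. The helper estimate_cut_points() is hypothetical and is not the package's own code.

```r
# Hypothetical helper (not part of the package): estimate equal-frequency cut points.
# Assumption: cut points are sample quantiles at cuts + 1 evenly spaced probabilities,
# so cuts = 2 splits at the median and cuts = 4 at the quartiles.
estimate_cut_points <- function(x, cuts = 4) {
  stopifnot(is.numeric(x), cuts >= 2)
  probs <- seq(0, 1, length.out = cuts + 1)
  quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
}

x_train <- c(3, 7, 1, 9, 5, 11, 2, 8, 6, 10, 4)

# cuts = 2: minimum, median, maximum
estimate_cut_points(x_train, cuts = 2)   # 1 6 11

# cuts = 4: minimum, the three quartiles, maximum
estimate_cut_points(x_train, cuts = 4)   # 1.0 3.5 6.0 8.5 11.0
```

The outer two values are the training minimum and maximum; whether they are kept or replaced by infinite values is what the infs argument controls.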
The predict method can then be used to turn numeric vectors into factor vectors.
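Applying stored cut points to new data can be sketched with base R's cut(). This assumes right-closed intervals (the cut() default) with the lowest interval also closed on the left, and labels built as prefix plus bin number. The helper bin_with_cut_points() is hypothetical and only illustrates the idea.

```r
# Hypothetical helper: bin new numeric values using cut points estimated earlier.
# Assumption: intervals are right-closed and the lowest one includes its left edge.
bin_with_cut_points <- function(new_data, breaks, prefix = "bin") {
  labels <- paste0(prefix, seq_len(length(breaks) - 1))  # bin1, bin2, ...
  cut(new_data, breaks = breaks, labels = labels, include.lowest = TRUE)
}

# Cut points estimated once from a training vector (median split here)
train <- c(2, 4, 6, 8, 10, 12, 14, 16)
breaks <- quantile(train, probs = c(0, 0.5, 1), names = FALSE)  # 2, 9, 16
breaks[c(1, length(breaks))] <- c(-Inf, Inf)  # infinite outer cut points

bin_with_cut_points(c(1, 9, 20), breaks)  # bin1 bin1 bin2
```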
If keep_na = TRUE, a factor level with the suffix "_missing" is created for missing values (see the examples below).
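A minimal sketch of giving missing values their own level, assuming the level is named by pasting the prefix and "_missing" together and that only inputs that are already NA are relabeled. The helper bin_keep_na() is hypothetical; inspect the levels returned by the real predict method to confirm its naming.

```r
# Hypothetical helper: bin values and give NA inputs their own factor level.
# Assumption: the extra level is named paste0(prefix, "_missing").
bin_keep_na <- function(new_data, breaks, prefix = "bin") {
  labels <- paste0(prefix, seq_len(length(breaks) - 1))
  binned <- cut(new_data, breaks = breaks, labels = labels, include.lowest = TRUE)
  missing_level <- paste0(prefix, "_missing")
  # Add the extra level after the ordinary bins, then relabel the NA inputs
  binned <- factor(binned, levels = c(labels, missing_level))
  binned[is.na(new_data)] <- missing_level
  binned
}

bin_keep_na(c(1, NA, 20), breaks = c(-Inf, 9, Inf))  # bin1 bin_missing bin2
```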
If infs = FALSE and a new value is greater than the largest value of x (or smaller than the smallest value), a missing value will result.
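The effect of finite versus infinite outer cut points can be seen directly with base R's cut(). This sketch assumes that infs = FALSE keeps the training minimum and maximum as the outer cut points; it does not call the package itself.

```r
# Finite outer cut points (like infs = FALSE) versus infinite ones (like infs = TRUE)
train <- c(2, 4, 6, 8, 10, 12, 14, 16)
finite_breaks <- quantile(train, probs = c(0, 0.5, 1), names = FALSE)  # 2, 9, 16
infinite_breaks <- c(-Inf, finite_breaks[2], Inf)

new_values <- c(1, 9, 50)
cut(new_values, breaks = finite_breaks, include.lowest = TRUE)    # NA for 1 and 50
cut(new_values, breaks = infinite_breaks, include.lowest = TRUE)  # no NAs
```

Values below the training minimum and above the training maximum both fall outside the finite breaks, which is why either direction produces NA.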
Value

discretize returns an object of class discretize and predict.discretize returns a factor vector.
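The split between a fitting function that returns a classed object and a predict method that returns a factor follows R's usual S3 pattern. The toy class below (toy_discretize and predict.toy_discretize are made-up names) sketches that pattern under the same quantile assumption as before. It is not the package implementation and it does not handle tied quantiles, which would make cut() fail on non-unique breaks.

```r
# Hypothetical toy class illustrating "fit returns an object, predict returns a factor".
toy_discretize <- function(x, cuts = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = cuts + 1),
                     na.rm = TRUE, names = FALSE)
  breaks[c(1, length(breaks))] <- c(-Inf, Inf)  # infinite outer cut points
  structure(list(breaks = breaks, labels = paste0("bin", seq_len(cuts))),
            class = "toy_discretize")
}

# S3 method for the stats::predict() generic
predict.toy_discretize <- function(object, new_data, ...) {
  cut(new_data, breaks = object$breaks, labels = object$labels,
      include.lowest = TRUE)
}

fit <- toy_discretize(1:100, cuts = 4)
class(fit)                          # "toy_discretize"
predict(fit, c(0, 30, 60, 200))     # bin1 bin2 bin3 bin4
```

Storing the cut points in the object means they are estimated once on training data and reused unchanged for any new data.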
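Examples

library(recipes)  # provides discretize() and its predict() method
data(biomass, package = "modeldata")

# Split into training and test sets using the dataset column
biomass_tr <- biomass[biomass$dataset == "Training", ]
biomass_te <- biomass[biomass$dataset == "Testing", ]

# With cuts = 2 the interior cut point is the training median
median(biomass_tr$carbon)
discretize(biomass_tr$carbon, cuts = 2)
discretize(biomass_tr$carbon, cuts = 2, infs = FALSE)
discretize(biomass_tr$carbon, cuts = 2, infs = FALSE, keep_na = FALSE)
discretize(biomass_tr$carbon, cuts = 2, prefix = "maybe a bad idea to bin")

# Default cuts = 4: bin the training data and tabulate the bin counts
carbon_binned <- discretize(biomass_tr$carbon)
table(predict(carbon_binned, biomass_tr$carbon))

# Without infinite end points, values outside the training range become NA
carbon_no_infs <- discretize(biomass_tr$carbon, infs = FALSE)
predict(carbon_no_infs, c(50, 100))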