discretization: Discretization of a Possibly Continuous Data Frame of Random...

In varrank: Heuristics Tools Based on Mutual Information for Variable Ranking

View source: R/infotheo-toolbox.R

discretization

R Documentation

Discretization of a Possibly Continuous Data Frame of Random Variables

Description

This function discretizes a data frame of possibly continuous random variables using discretization rules. The discretization algorithms are unsupervised and univariate. See Details for the complete list of rules; the desired number of bins can also be provided directly.

Usage

discretization(data.df = NULL, discretization.method = "cencov", frequency = FALSE)

Arguments

`data.df`	a data frame containing the data to discretize. Binary variables must be declared as factors and all other variables as numeric vectors. The data frame must be named.
`discretization.method`	a character vector giving the discretization method to use; see Details. If a number is provided instead, the variable is discretized by equal binning into that number of bins.
`frequency`	logical value selecting the output. If set to TRUE, a list containing the table of counts for each bin and the discretized data frame is returned. If set to FALSE, only the discretized data frame is returned.
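
As an illustration of the numeric form of `discretization.method`, the following base R sketch bins a numeric vector into a fixed number of bins and tabulates the counts. It assumes that "equal binning" means equal-width bins over the observed range; `equal_bins()` is a hypothetical helper written for this illustration, not a varrank function, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of varrank): equal-width binning into k bins.
# Assumption: bins span the observed range of x and all have the same width.
equal_bins <- function(x, k) {
  breaks <- seq(from = min(x), to = max(x), length.out = k + 1)
  # include.lowest = TRUE keeps the minimum value inside the first bin
  cut(x, breaks = breaks, include.lowest = TRUE)
}

x <- c(0, 1, 2.5, 3, 4.5, 6, 7.5, 8, 9, 10)
x_binned <- equal_bins(x, k = 5)

# Counts per bin, comparable in spirit to the table returned when frequency = TRUE
table(x_binned)
```

With these ten values and five bins of width 2, each bin receives two observations.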

Details

discretization() supports multiple rules for discretization. Below is the list of supported rules. IQR() stands for interquartile range.

fd stands for the Freedman–Diaconis rule. The number of bins is given by:

range(x) * n^{1/3} / (2 * IQR(x))

The Freedman–Diaconis rule is known to be less sensitive to outliers than Scott's rule.

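doane stands for Doane's rule. The number of bins is given by:

1 + \log_{2}(n) + \log_{2}\left(1 + \frac{|g|}{σ_{g}}\right)

where g is the estimated skewness of x and σ_{g} = \sqrt{6(n-2) / ((n+1)(n+3))}. Doane's rule is a modification of Sturges' formula that attempts to improve its performance with non-normal data.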

cencov stands for Cencov's rule. The number of bins is given by:

n^{1/3}

rice stands for Rice's rule. The number of bins is given by:

2 n^{1/3}

terrell-scott stands for Terrell-Scott's rule. The number of bins is given by:

(2 n)^{1/3}

The Cencov, Rice and Terrell-Scott rules are known to overestimate the number of bins k compared to the other rules because of their simplicity.

sturges stands for Sturges's rule. The number of bins is given by:

1 + \log_2(n)

scott stands for Scott's rule. The number of bins is given by:

range(x) / (σ(x) * n^{-1/3})

kmeans applies classical k-means clustering to one-dimensional continuous data.
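
The rules above can be sketched in base R to see how the suggested number of bins varies. This is only an illustration of the formulas as written on this page: `n_bins()` is a hypothetical helper, not a varrank function, rounding up with `ceiling()` is an assumption, the skewness estimator used for Doane's rule is the simple moment estimator, and the choice of three clusters for k-means is arbitrary.

```r
# Hypothetical helper (not part of varrank): number of bins suggested by each rule.
# Assumption: the result is rounded up; the package may round differently.
n_bins <- function(x, rule) {
  n <- length(x)
  r <- diff(range(x))  # width of the observed range
  k <- switch(rule,
    "fd" = r * n^(1/3) / (2 * IQR(x)),
    "doane" = {
      # moment estimator of skewness and its standard error under normality
      g  <- mean((x - mean(x))^3) / mean((x - mean(x))^2)^(3/2)
      sg <- sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
      1 + log2(n) + log2(1 + abs(g) / sg)
    },
    "cencov"        = n^(1/3),
    "rice"          = 2 * n^(1/3),
    "terrell-scott" = (2 * n)^(1/3),
    "sturges"       = 1 + log2(n),
    "scott"         = r / (sd(x) * n^(-1/3)),  # as written in the formula above
    stop("unknown rule: ", rule)
  )
  ceiling(k)
}

set.seed(1)
x <- rnorm(n = 100, mean = 0, sd = 2)

rules <- c("fd", "doane", "cencov", "rice", "terrell-scott", "sturges", "scott")
sapply(rules, function(rule) n_bins(x, rule))

# One-dimensional k-means: each value is labelled by the cluster it falls in
km <- kmeans(x, centers = 3)
table(factor(km$cluster))
```

The cencov, rice, terrell-scott and sturges rules depend only on n; for n = 100 and with rounding up they give 5, 10, 6 and 8 bins respectively, while the remaining rules also depend on the spread or skewness of the data.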

Value

The discretized data frame or, if frequency = TRUE, a list containing the table of counts for each bin and the discretized data frame.

Author(s)

Gilles Kratzer

References

Garcia, S., et al. (2013) A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning. IEEE Transactions on Knowledge and Data Engineering, 25.4, 734-750.

Cebeci, Z. and Yıldız, F. (2017) Unsupervised Discretization of Continuous Variables in a Chicken Egg Quality Traits Dataset. Turkish Journal of Agriculture-Food Science and Technology, 5.4, 315-320.

Examples

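library(varrank)

# Simulate 100 draws from a normal distribution
rv <- rnorm(n = 100, mean = 0, sd = 2)

# Entropy of the bin counts obtained with the Freedman-Diaconis rule
entropy.data(freqs.table = discretization(data.df = rv,
                                          discretization.method = "fd",
                                          frequency = TRUE)[[1]])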

varrank documentation built on Oct. 12, 2022, 5:06 p.m.
