discretize: Unsupervised Data Discretization


Description

discretize discretizes data using the equal-frequency or equal-width binning algorithm. "equalwidth" and "equalfreq" discretize each random variable (each column) of the data separately into nbins bins. "globalequalwidth" discretizes the range of the whole random vector (all columns of the data taken together) into nbins bins.
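The three strategies can be sketched in base R as follows. This is only an illustration of the binning ideas under simple assumptions, not the package's implementation: the helper names (equal_width_bins, equal_freq_bins, global_equal_width_bins) are hypothetical, and the handling of ties, bin boundaries and non-integer nbins in discretize itself may differ.

```r
# Hypothetical helpers illustrating the binning ideas; not part of the package.

# Equal width: split [min(x), max(x)] into nbins intervals of the same length.
equal_width_bins <- function(x, nbins) {
  nbins <- max(1L, as.integer(round(nbins)))      # assumption: round to an integer
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(1L, length(x)))  # constant input: one bin
  breaks <- seq(rng[1], rng[2], length.out = nbins + 1)
  # all.inside = TRUE keeps the maximum value in the last bin
  findInterval(x, breaks, all.inside = TRUE)
}

# Equal frequency: assign (roughly) the same number of samples to each bin.
# Ties are broken by order of appearance, so equal values can land in
# different bins; this is a simplification.
equal_freq_bins <- function(x, nbins) {
  nbins <- max(1L, as.integer(round(nbins)))
  r <- rank(x, ties.method = "first", na.last = "keep")
  n <- sum(!is.na(x))
  as.integer(ceiling(r * nbins / n))
}

# Global equal width: one common set of breaks computed over all columns.
global_equal_width_bins <- function(X, nbins) {
  m <- as.matrix(X)
  out <- matrix(equal_width_bins(as.vector(m), nbins),
                nrow = nrow(m), dimnames = dimnames(m))
  as.data.frame(out)
}

x <- c(0, 1, 2, 3, 4, 10)
ew <- equal_width_bins(x, 2)   # breaks at 0, 5, 10
ef <- equal_freq_bins(x, 2)    # three samples per bin

X <- data.frame(a = c(0, 1), b = c(9, 10))
g <- global_equal_width_bins(X, 2)  # breaks at 0, 5, 10 shared by a and b
```

With the outlier 10 in x, equal width leaves all other values in the first bin, while equal frequency balances the bins. In the global variant, column a falls entirely into the first bin and column b into the second, because both columns share the same breaks.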

Usage

discretize( X, disc="equalfreq", nbins=NROW(X)^(1/3) )

Arguments

X

A data.frame containing the data to be discretized. The columns contain the variables and the rows contain the samples.

disc

The name of the discretization method to be used: "equalfreq", "equalwidth" or "globalequalwidth" (default: "equalfreq"). See References.

nbins

Integer specifying the number of bins to be used for the discretization. By default, the number of bins is set to N^(1/3), where N is the number of samples (the number of rows of X).
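As a small worked example, the default N^(1/3) and the sqrt(N) alternative used in the Examples section generally do not evaluate to whole numbers. The sketch below computes both for USArrests; the explicit rounding step is an assumption for illustration, since this page does not state how a non-integer nbins is converted internally.

```r
data(USArrests)

n <- NROW(USArrests)                  # number of samples (rows)
default_nbins <- n^(1/3)              # default rule from the Usage section
sqrt_nbins <- sqrt(n)                 # rule used in the Examples section

# Assumption: round to the nearest integer before passing nbins explicitly.
nbins_int <- as.integer(round(default_nbins))
```

Passing an already rounded value makes the number of bins explicit rather than relying on an implicit conversion.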

Value

discretize returns the discretized dataset.

Author(s)

Patrick E. Meyer, Frederic Lafitte, Gianluca Bontempi, Korbinian Strimmer

References

Meyer, P. E. (2008). Information-Theoretic Variable Selection and Network Inference from Microarray Data. PhD thesis, Université Libre de Bruxelles.

Dougherty, J., Kohavi, R., and Sahami, M. (1995). Supervised and unsupervised discretization of continuous features. In International Conference on Machine Learning.

Yang, Y. and Webb, G. I. (2003). Discretization for naive-Bayes learning: managing discretization bias and variance. Technical Report 2003/131, School of Computer Science and Software Engineering, Monash University.

Examples

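# Load the example dataset (50 samples, 4 numeric variables)
data(USArrests)
# Use sqrt(N) bins instead of the default N^(1/3)
nbins <- sqrt(NROW(USArrests))
# Per-column equal-width binning
ew.data <- discretize(USArrests, "equalwidth", nbins)
# Per-column equal-frequency binning
ef.data <- discretize(USArrests, "equalfreq", nbins)
# Equal-width binning over the range of the whole dataset
gew.data <- discretize(USArrests, "globalequalwidth", nbins)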

Gibbsdavidl/perminfotheo documentation built on May 6, 2019, 6:29 p.m.