chi2 | R Documentation |
This function performs the Chi2 discretization algorithm. The Chi2 algorithm automatically determines a proper chi-square (χ^2) threshold that keeps the fidelity of the original numeric dataset.
chi2(data, alp = 0.5, del = 0.05)
data |
the dataset to be discretized |
alp |
significance level; α |
del |
inconsistency rate threshold δ; the discretized data must satisfy Inconsistency(data) < δ (Liu and Setiono, 1995) |
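To make the role of δ concrete, the following is a minimal sketch of an inconsistency rate in the sense of Liu and Setiono (1995): rows that share the same attribute values but differ in class are inconsistent, and each pattern contributes its number of rows minus its majority-class count. The helper name `inconsistency_rate` and the toy data are hypothetical, and the sketch assumes the class label is the last column; it is not the package's own incon() implementation.

```r
# Hypothetical helper (not the package's incon()): inconsistency rate of a
# discretized data frame whose LAST column is assumed to be the class label.
inconsistency_rate <- function(d) {
  p <- ncol(d)
  # One key per row, built from the attribute values only
  pattern <- do.call(paste, c(d[, -p, drop = FALSE], sep = "\r"))
  # Rows = distinct attribute patterns, columns = classes
  counts <- table(pattern, d[[p]])
  # Per pattern: number of matching rows minus the majority-class count
  sum(rowSums(counts) - apply(counts, 1, max)) / nrow(d)
}

# Toy data: the pattern (a = 1, b = 1) occurs three times with two classes
toy <- data.frame(
  a = c(1, 1, 1, 2, 2),
  b = c(1, 1, 1, 3, 3),
  class = c("x", "x", "y", "y", "y")
)
inconsistency_rate(toy)  # 0.2
```

In this toy example one of the five rows disagrees with the majority class of its pattern, so the rate is 1/5; with del = 0.05 such a discretization would be rejected.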
The Chi2 algorithm is based on the χ^2 statistic, and consists of two phases.
In the first phase, it begins with a high significance level (sigLevel) for all numeric attributes to be discretized. Each attribute is sorted according to its values. Then the following is performed:
step 1. calculate the χ^2 value for every pair of adjacent intervals (at the beginning, each pattern is put into its own interval that contains only one value of an attribute);
step 2. merge the pair of adjacent intervals with the lowest χ^2 value. Merging continues until all pairs of intervals have χ^2 values exceeding the parameter determined by sigLevel.
The above process is repeated with a decreased sigLevel until the inconsistency rate of the discretized data, computed by incon(), exceeds δ (Liu and Setiono (1995)).
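The merging steps described above can be sketched in base R for a single attribute at one fixed significance level. This is an illustration only, not the package's code: the function names `chi_sq_pair` and `merge_intervals` are hypothetical, initial intervals are taken to be the distinct attribute values, the threshold is assumed to be the χ^2 quantile with (number of classes − 1) degrees of freedom, and zero expected counts are replaced by 0.1, a convention used in ChiMerge-style implementations. The outer loop that lowers sigLevel and checks the inconsistency rate is not shown.

```r
# Chi-square statistic for two adjacent intervals.
# `tab` is a 2 x k matrix of class counts (rows = intervals, cols = classes).
chi_sq_pair <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  # Assumption: zero expected counts are replaced by 0.1 to avoid 0/0
  expected[expected == 0] <- 0.1
  sum((tab - expected)^2 / expected)
}

# Merge adjacent intervals of one attribute `x` (class labels `y`) until
# every adjacent pair has a chi-square value above the threshold.
merge_intervals <- function(x, y, sig_level) {
  y <- factor(y)
  counts <- unclass(table(x, y))   # one row per distinct value, sorted by x
  lower <- sort(unique(x))         # lower bound of each current interval
  threshold <- qchisq(1 - sig_level, df = nlevels(y) - 1)
  while (nrow(counts) > 1) {
    # Step 1: chi-square value for every pair of adjacent intervals
    stats <- vapply(
      seq_len(nrow(counts) - 1),
      function(i) chi_sq_pair(counts[i:(i + 1), , drop = FALSE]),
      numeric(1)
    )
    # Step 2: merge the pair with the lowest value, unless all exceed it
    i <- which.min(stats)
    if (stats[i] > threshold) break
    counts[i, ] <- counts[i, ] + counts[i + 1, ]
    counts <- counts[-(i + 1), , drop = FALSE]
    lower <- lower[-(i + 1)]
  }
  lower[-1]  # cut-points: lower bounds of all intervals except the first
}

x <- c(1, 2, 3, 4, 5, 6)
y <- c("a", "a", "a", "b", "b", "b")
merge_intervals(x, y, sig_level = 0.05)  # 4
```

Here the two classes separate cleanly between 3 and 4, so merging stops with two intervals and a single cut-point at the lower bound of the second interval.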
cutp |
list of cut-points for each variable |
Disc.data |
discretized data matrix |
HyunJi Kim polaris7867@gmail.com
Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Tools with Artificial Intelligence, 388–391.
Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE transactions on knowledge and data engineering, Vol.9, no.4, 642–645.
value, incon and chiM.
library(discretization)
data(iris)
#---cut-points
chi2(iris,0.5,0.05)$cutp
#--discretized dataset using Chi2 algorithm
chi2(iris,0.5,0.05)$Disc.data