This auxiliary function discretizes the numerical features and is called by several of the feature selection functions. The discretization options are minimal description length (MDL), equal frequency, and equal interval width. The result is a list with two fields: the processed dataset and the column numbers of the features. When the input parameter “flag”=TRUE, the second field contains only the column numbers of the features that have more than one interval after discretization.
ProcessData(matrix, disc.method, attrs.nominal, flag=FALSE)
matrix |
a dataset, a matrix of feature values for several cases, the last column is for the class labels. Class labels could be numerical or character values. The maximal number of classes is ten. |
disc.method |
a method used for feature discretization. The options are minimal description length (MDL), equal frequency, and equal interval width. |
attrs.nominal |
a numerical vector, containing the column numbers of the nominal features, selected for the analysis. |
flag |
a logical value. If TRUE, the output list contains the processed dataset restricted to the features that have more than one interval after discretization, together with their names. If FALSE, the processed dataset with all the features is returned. |
The main job of this auxiliary function is to discretize the numerical features using one of the discretization methods. See the “Value” section of this page for more details.
Data can be provided in matrix form, where each row corresponds to a case with its feature values and class label. The columns contain the values of the individual features, and the last column must contain the class labels. The maximal number of class labels is 10. The class label column and all the nominal features must be defined as factors.
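The input layout described above can be sketched in base R as follows. The column names and values are made up for illustration, and a data frame is used because a plain R matrix cannot hold factors alongside numeric columns; this is an assumption about a convenient container, not a statement of what the function requires internally.

```r
# Toy dataset (hypothetical values): features first, class label last.
dataset <- data.frame(
  height = c(1.62, 1.75, 1.80, 1.68),          # numerical feature
  colour = c("red", "blue", "red", "green"),   # nominal feature
  class  = c("yes", "no", "no", "yes"),        # class label, last column
  stringsAsFactors = FALSE
)

# The class label and every nominal feature are stored as factors.
dataset$colour <- factor(dataset$colour)
dataset$class  <- factor(dataset$class)

# Column numbers of the nominal features, in the form expected by attrs.nominal.
attrs.nominal <- which(names(dataset) == "colour")

# The documentation allows at most ten classes.
stopifnot(nlevels(dataset$class) <= 10)
```

Here `attrs.nominal` holds the column position of the only nominal feature; a dataset without nominal features would pass an empty numeric vector instead.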
The data may contain a reasonable number of missing values, which must first be preprocessed with one of the imputing methods in the function input_miss.
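To make the two simpler discretization options concrete, here is a minimal base R sketch of equal interval width and equal frequency binning. The helper names `discretize_equal_width` and `discretize_equal_freq` and the choice of three bins are hypothetical and are not taken from the package; the MDL method is not illustrated, and the package's own implementation may differ in detail.

```r
# Hypothetical helpers for illustration; not part of the package.

# Equal interval width: split the range of x into k bins of equal width.
discretize_equal_width <- function(x, k = 3) {
  cut(x, breaks = k, include.lowest = TRUE, labels = FALSE)
}

# Equal frequency: cut at empirical quantiles so bins hold similar counts.
discretize_equal_freq <- function(x, k = 3) {
  probs  <- seq(0, 1, length.out = k + 1)
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  if (length(breaks) < 2) {
    return(rep(1L, length(x)))  # constant feature: a single interval
  }
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)   # one outlier stretches the range

width_bins <- discretize_equal_width(x, k = 3)
freq_bins  <- discretize_equal_freq(x, k = 3)

table(width_bins)  # the outlier leaves almost all values in the first bin
table(freq_bins)   # three values per bin
```

With a skewed feature like this one, equal width binning puts eight of the nine values into the first interval, while equal frequency binning spreads them evenly.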
The returned list consists of the following fields:
m3 |
a processed dataset |
sel.feature |
a numeric vector with the column numbers of the features that have more than one interval (when “flag”=TRUE). If “flag”=FALSE, it contains all the column numbers of the dataset. |
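The effect of “flag” on the second field can be sketched in base R as follows. The toy data and the helper `select_features` are hypothetical, and the sketch assumes that the returned column numbers cover the feature columns only, excluding the class label; the documentation does not state this explicitly.

```r
# Toy discretized dataset (hypothetical): last column is the class label.
disc <- data.frame(
  f1    = factor(c(1, 1, 2, 2)),
  f2    = factor(c(1, 1, 1, 1)),   # collapsed into a single interval
  f3    = factor(c(1, 2, 3, 3)),
  class = factor(c("a", "a", "b", "b"))
)

# Feature columns are all columns except the last one.
feature_cols <- seq_len(ncol(disc) - 1)

# Number of distinct intervals observed in each feature.
n_intervals <- vapply(
  disc[feature_cols],
  function(col) length(unique(col)),
  integer(1)
)

# Hypothetical stand-in for the flag argument.
select_features <- function(flag) {
  if (flag) feature_cols[n_intervals > 1] else feature_cols
}

select_features(TRUE)   # columns 1 and 3: f2 has only one interval
select_features(FALSE)  # columns 1, 2 and 3
```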
H. Liu, F. Hussain, C. L. Tan, and M. Dash, "Discretization: An enabling technique," Data Mining and Knowledge Discovery, Vol. 6, No. 4, 2002, pp. 393-423.
select.inf.gain, select.inf.symm, select.inf.chi2, select.fast.filter, select.process