ProcessData: Select the subset of features
In Biocomb: Feature Selection and Classification with the Embedded Validation Procedures for Biomedical Data Analysis

Description Usage Arguments Details Value References See Also Examples

This auxiliary function discretizes the numerical features and is called from several of the feature selection functions. The discretization options are minimal description length (MDL), equal frequency, and equal interval width. The result is a list with two fields: the processed dataset and the column numbers of the features. When the input parameter “flag”=TRUE, the second field contains only the column numbers of the features that have more than one interval after discretization.
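To make the difference between two of the listed options concrete, the following base-R sketch bins a small vector by equal interval width and by equal frequency. It is only an illustration of the two ideas and is not Biocomb's internal code; the vector `x` and the bin count `n_bins` are hypothetical, and the MDL method is not shown.

```r
# Illustrative only: base-R equal-width and equal-frequency binning.
# Not Biocomb's implementation; `x` and `n_bins` are hypothetical.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
n_bins <- 3

# Equal interval width: split the range of x into n_bins intervals
# of equal length and return the interval index of each value.
equal_width <- cut(x, breaks = n_bins, labels = FALSE)

# Equal frequency: put the cut points at quantiles so that each
# interval holds roughly the same number of cases.
qs <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))
equal_freq <- cut(x, breaks = unique(qs), include.lowest = TRUE,
                  labels = FALSE)

table(equal_width)
table(equal_freq)
```

Because of the single large value, the equal-width bins are very unbalanced (nine of the ten values land in the first interval), while the equal-frequency bins hold about the same number of cases each.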

ProcessData(matrix,disc.method,attrs.nominal,flag=FALSE)

`matrix`	a dataset: a matrix of feature values for several cases, with the class labels in the last column. Class labels can be numerical or character values. The maximal number of classes is ten.
`disc.method`	the method used for feature discretization. The options are minimal description length (MDL), equal frequency, and equal interval width.
`attrs.nominal`	a numerical vector containing the column numbers of the nominal features selected for the analysis.
`flag`	a logical value. If TRUE, the output list contains the processed dataset restricted to the features that have more than one interval after discretization, together with their names. If FALSE, the processed dataset with all the features is returned.

The main job of this auxiliary function is to discretize the numerical features using one of the discretization methods. See the “Value” section of this page for more details.

Data can be provided in matrix form, where the rows correspond to cases with feature values and a class label. The columns contain the values of individual features, and the last column must contain the class labels. The maximal number of class labels is 10. The class label column and all the nominal features must be defined as factors.
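The layout described above can be prepared in base R roughly as follows. This is a minimal sketch that assumes the data are held in a data frame (as in the package example, where a column is converted with as.factor); the toy dataset and its column names are hypothetical.

```r
# Hypothetical toy dataset: two numeric features, one nominal feature,
# and the class label in the last column.
toy <- data.frame(
  f1 = c(0.5, 1.2, 3.4, 2.2),
  f2 = c(10, 12, 9, 15),
  f3 = c("a", "b", "a", "b"),
  class = c("case", "control", "case", "control"),
  stringsAsFactors = FALSE
)

attrs.nominal <- 3        # column numbers of the nominal features
class.col <- ncol(toy)    # the class labels are in the last column

# Convert the nominal features and the class label to factors.
for (j in c(attrs.nominal, class.col)) {
  toy[[j]] <- as.factor(toy[[j]])
}

# Check the documented limit of at most ten classes.
stopifnot(nlevels(toy[[class.col]]) <= 10)
str(toy)
```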

The data may contain a reasonable number of missing values, which must first be imputed with one of the methods in the function input_miss.
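Before discretization it can help to check whether any imputation is needed at all. The sketch below uses only base R and does not call input_miss; the toy dataset is hypothetical.

```r
# Hypothetical toy data with a few missing values.
toy <- data.frame(
  f1 = c(1.0, NA, 3.0, 4.0),
  f2 = c(NA, NA, 7.5, 8.1),
  class = factor(c("a", "b", "a", "b"))
)

# Fraction of missing values in each column.
na.frac <- colMeans(is.na(toy))
print(na.frac)

# TRUE if the dataset contains at least one missing value and
# therefore needs imputation before it is discretized.
needs.imputation <- anyNA(toy)
```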

The returned list consists of the following fields:

`m3`	a processed dataset
`sel.feature`	a numeric vector with the column numbers of the features that have more than one interval (when “flag”=TRUE). If “flag”=FALSE, it returns all the column numbers of the dataset.
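The meaning of "more than one interval" can be illustrated on an already discretized dataset. The following base-R sketch is an assumption about how such a selection could be computed, not the package's own code; `disc.data` is a hypothetical dataset whose feature columns hold interval indices.

```r
# Hypothetical discretized dataset: each feature holds interval indices,
# and the last column is the class label.
disc.data <- data.frame(
  f1 = c(1, 2, 2, 1),
  f2 = c(1, 1, 1, 1),   # collapsed to a single interval
  f3 = c(1, 2, 3, 3),
  class = factor(c("a", "b", "b", "a"))
)

# Column numbers of the features (all columns except the class label).
feature.cols <- seq_len(ncol(disc.data) - 1)

# Number of distinct intervals in each feature.
n.intervals <- vapply(disc.data[feature.cols],
                      function(col) length(unique(col)), integer(1))

# Column numbers of the features with more than one interval,
# mirroring the flag = TRUE case described above.
sel.feature <- feature.cols[n.intervals > 1]
sel.feature
```

A feature that collapses to a single interval, such as `f2` here, takes the same value for every case and so cannot help to separate the classes.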

H. Liu, F. Hussain, C. L. Tan, and M. Dash, "Discretization: An enabling technique," Data Mining and Knowledge Discovery, Vol. 6, No. 4, 2002, pp. 393-423.

select.inf.gain, select.inf.symm, select.inf.chi2,
select.fast.filter, select.process

# example for a dataset without missing values
library(Biocomb)
data(data_test)

# the class label (last column) must be a factor
data_test[, ncol(data_test)] <- as.factor(data_test[, ncol(data_test)])

disc <- "MDL"
attrs.nominal <- numeric()
flag <- FALSE
out <- ProcessData(matrix = data_test, disc.method = disc,
                   attrs.nominal = attrs.nominal, flag = flag)

Biocomb documentation built on May 1, 2019, 9:38 p.m.
