select.cfs: Select the subset of features


Description

This function selects a subset of features using a best-first search strategy guided by a correlation-based measure (CFS, correlation-based feature selection). CFS evaluates a subset of features by considering the individual predictive ability of each feature along with the degree of redundancy between them. It can handle both numerical and nominal values. The result is a “data.frame” with two fields: the feature (Biomarker) names and the positions of the features in the dataset. This function is used internally by the function “classifier.loop” to perform classification with feature selection when “CFS” is chosen as the feature selection method. The “Index” variable of the data.frame is passed to the classification function.
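For orientation, the merit heuristic usually associated with CFS (from M.A. Hall's work on correlation-based feature selection) scores a subset S of k features as shown below, where \overline{r_{cf}} is the mean feature-class correlation and \overline{r_{ff}} is the mean feature-feature intercorrelation within S. This page does not state the exact formula select.cfs uses, so treat this as the standard textbook form rather than a description of the package's code.

```latex
\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
```

The numerator rewards features that predict the class; the denominator grows with redundancy, so a feature that merely duplicates one already in S adds little or nothing to the merit.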

Usage

select.cfs(matrix)

Arguments

matrix

A dataset: a matrix of feature values for several cases, with the class labels in the last column. Class labels can be numerical or character values. The maximal number of classes is ten.

Details

The main job of this function is to select a subset of informative features with a best-first search strategy, using a correlation measure (an information-theoretic measure). The measure considers the individual predictive ability of each feature along with the degree of redundancy between them. See the “Value” section of this page for more details.
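The page does not show the internals of select.cfs. As an illustration only, the following R sketch implements textbook CFS for nominal (factor) features: symmetrical uncertainty as the information-theoretic correlation, a Hall-style merit, and a forward best-first search that stops after five consecutive expansions without improvement. The function names (entropy, joint_entropy, sym_uncert, cfs_merit, cfs_best_first) and the max_stale stopping rule are hypothetical and not part of Biocomb; numeric features would presumably need to be discretized before such a measure is applied.

```r
# Illustrative sketch of textbook CFS on nominal (factor) features.
# Not Biocomb's implementation; all function names here are hypothetical.

# Shannon entropy (bits) of a discrete vector, from observed frequencies
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Joint entropy H(X, Y) from the two-way contingency table
joint_entropy <- function(x, y) {
  p <- table(x, y) / length(x)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Symmetrical uncertainty 2 * I(X;Y) / (H(X) + H(Y)), scaled to [0, 1]
sym_uncert <- function(x, y) {
  hx <- entropy(x)
  hy <- entropy(y)
  if (hx + hy == 0) return(0)
  2 * (hx + hy - joint_entropy(x, y)) / (hx + hy)
}

# Hall-style merit of a subset, given precomputed feature-class (su_cf)
# and feature-feature (su_ff) correlations
cfs_merit <- function(subset, su_cf, su_ff) {
  k <- length(subset)
  if (k == 0) return(0)
  r_cf <- mean(su_cf[subset])
  r_ff <- if (k > 1) mean(su_ff[subset, subset][upper.tri(diag(k))]) else 0
  k * r_cf / sqrt(k + k * (k - 1) * r_ff)
}

# Best-first forward search: repeatedly expand the most promising subset,
# stopping after `max_stale` consecutive expansions without improvement
cfs_best_first <- function(features, class, max_stale = 5) {
  n <- ncol(features)
  su_cf <- vapply(seq_len(n), function(j) sym_uncert(features[[j]], class),
                  numeric(1))
  su_ff <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i < j) su_ff[i, j] <- su_ff[j, i] <- sym_uncert(features[[i]], features[[j]])
  }
  open <- list(integer(0))  # subsets waiting to be expanded
  open_merit <- 0
  visited <- character(0)   # subsets already evaluated, as "i,j,..." keys
  best <- integer(0)
  best_merit <- 0
  stale <- 0
  while (length(open) > 0 && stale < max_stale) {
    i <- which.max(open_merit)
    current <- open[[i]]
    open <- open[-i]
    open_merit <- open_merit[-i]
    improved <- FALSE
    for (f in setdiff(seq_len(n), current)) {
      cand <- sort(c(current, f))
      key <- paste(cand, collapse = ",")
      if (key %in% visited) next
      visited <- c(visited, key)
      m <- cfs_merit(cand, su_cf, su_ff)
      open <- c(open, list(cand))
      open_merit <- c(open_merit, m)
      if (m > best_merit) {
        best <- cand
        best_merit <- m
        improved <- TRUE
      }
    }
    stale <- if (improved) 0 else stale + 1
  }
  best  # column positions of the selected features
}

# Toy usage: "b" duplicates "a", and "c" carries no information about the class
cls <- factor(c("x", "x", "y", "y", "x", "y", "x", "y"))
toy <- data.frame(a = cls, b = cls,
                  c = factor(c("p", "q", "p", "q", "p", "q", "q", "p")))
sel <- cfs_best_first(toy, cls)  # returns 1: only "a" is kept
```

Because "b" is perfectly redundant with "a", adding it leaves the merit unchanged, so the search keeps the single feature; this mirrors the trade-off between predictive ability and redundancy described above.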

Data can be provided in matrix form, where the rows correspond to cases (feature values plus a class label) and the columns contain the values of individual features; the last column must contain the class labels. The maximal number of class labels is 10. The class label column and all nominal features must be defined as factors.
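Since an R matrix cannot mix numeric columns with factors, a dataset that contains nominal features is in practice presumably passed as a data frame, as the package example with data_test suggests. A minimal sketch of preparing such a dataset in the documented layout; the column names and values below are hypothetical:

```r
# Hypothetical toy dataset: two numeric features, one nominal feature,
# and the class label in the last column
dat <- data.frame(
  gene1  = c(2.1, 3.4, 1.8, 4.0),
  gene2  = c(0.5, 0.7, 0.2, 0.9),
  tissue = c("liver", "lung", "liver", "lung"),
  class  = c("tumor", "normal", "tumor", "normal"),
  stringsAsFactors = FALSE
)

# Convert every character column (nominal features and the class label) to a factor
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], as.factor)

# The class label must be the last column and have at most 10 levels
stopifnot(is.factor(dat[[ncol(dat)]]), nlevels(dat[[ncol(dat)]]) <= 10)
```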

Value

The data can contain a reasonable number of missing values, but these must first be preprocessed with one of the imputation methods in the function input_miss. The returned data.frame consists of the following fields:

Biomarker

a character vector of feature names

Index

a numerical vector of the positions of the features in the dataset
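Because Index holds column positions, it can be used to reduce a dataset to the selected features, much as classifier.loop is described as passing Index on internally. A minimal sketch, using a hand-built stand-in for the value of select.cfs (the contents of out and dat are hypothetical, not a real result) and assuming, as documented, that Index refers to columns of the original dataset:

```r
# Stand-in for the data.frame returned by select.cfs (hypothetical contents)
out <- data.frame(Biomarker = c("gene3", "gene1"), Index = c(3, 1),
                  stringsAsFactors = FALSE)

# Hypothetical dataset in the documented layout: features first, class label last
dat <- data.frame(gene1 = c(1.2, 0.4, 2.2),
                  gene2 = c(5.0, 4.1, 3.3),
                  gene3 = c(0.1, 0.9, 0.5),
                  class = factor(c("a", "b", "a")))

# Keep the selected features (by position) plus the class label column
reduced <- dat[, c(out$Index, ncol(dat))]
names(reduced)  # "gene3" "gene1" "class"
```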

References

Y. Wang, I.V. Tetko, M.A. Hall, E. Frank, A. Facius, K.F.X. Mayer, and H.W. Mewes, "Gene Selection from Microarray Data for Cancer Classification—A Machine Learning Approach," Computational Biology and Chemistry, vol. 29, no. 1, pp. 37-46, 2005.

See Also

input_miss, select.process

Examples

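# load the package (provides select.cfs and the data_test dataset)
library(Biocomb)

# example for a dataset without missing values
data(data_test)

# the class label (last column) must be a factor
data_test[, ncol(data_test)] <- as.factor(data_test[, ncol(data_test)])
out <- select.cfs(matrix = data_test)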


Biocomb documentation built on May 1, 2019, 9:38 p.m.