ensemble_fs: Ensemble Feature Selection

In EFS: Tool for Ensemble Feature Selection

Description Usage Arguments Details Value Author(s) References See Also Examples

Uses an ensemble of feature selection methods to create a normalized quantitative score for all relevant features. Irrelevant features (e.g. features with too many missing values or with variance = 0, i.e. constant features) are deleted. See Details for a list of the tests used in this function.


ensemble_fs(data, classnumber, NA_threshold = 0.2, cor_threshold = 0.7,
  runs = 100, selection = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
  FALSE))

`data`	an object of class data.frame
`classnumber`	a number giving the column index of the binary class variable
`NA_threshold`	a number in the range [0, 1]. Features with a greater proportion of NAs than `NA_threshold` are deleted.
`cor_threshold`	a number used only for the Spearman and Pearson correlation methods: the correlation threshold between features. If the correlation of two features is greater than `cor_threshold`, the dependent feature is deleted.
`runs`	a number used only for randomForest and cforest. Number of runs, used to gain higher robustness.
`selection`	a logical vector of length eight. Selects which feature selection methods are run.
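The effect of `NA_threshold` can be pictured with a few lines of base R. This is only a sketch of the rule as stated above, not the package's internal code; the toy data frame and its column names are invented for illustration.

```r
# Invented toy data: `a` has 10% missing values, `b` has 60%.
toy <- data.frame(
  a     = c(1, NA, 3, 4, 5, 6, 7, 8, 9, 10),
  b     = c(NA, NA, 3, NA, 5, NA, NA, 8, NA, 10),
  class = rep(c(0, 1), times = 5)
)
NA_threshold <- 0.2

# Proportion of missing values per column.
na_share <- colMeans(is.na(toy))

# Keep only columns whose NA proportion does not exceed the threshold.
kept <- toy[, na_share <= NA_threshold, drop = FALSE]
print(names(kept))
```

With the default threshold of 0.2, column `b` is dropped because more than 20% of its values are missing.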

The following methods are provided by ensemble_fs:

Median: p-values from the Wilcoxon rank-sum (Mann-Whitney U) test (wilcox.test)
Spearman: Spearman's rank correlation test according to Yu et al. (2004) (cor)
Pearson: Pearson's product moment correlation test according to Yu et al. (2004) (cor)
LogReg: beta-Values of logistic regression (glm)
Accuracy/Error-rate randomForest: Error-rate-based variable importance measure embedded in randomForest according to Breiman (2001) (randomForest)
Gini randomForest: Gini-index-based variable importance measure embedded in randomForest according to Breiman (2001) (randomForest)
Error-rate cforest: Error-rate-based variable importance measure embedded in cforest according to Strobl et al. (2009) (cforest)
AUC cforest: AUC-based variable importance measure embedded in cforest according to Janitza et al. (2013) (cforest)
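To give a feel for the filter-style methods in this list, the sketch below computes a per-feature p-value with a two-sample `wilcox.test` call and a per-feature Spearman correlation with the class label via `cor`. It uses base R only; the toy data and variable names are invented, and how ensemble_fs turns such values into scores is not shown here.

```r
# Invented toy data: f1 separates the classes perfectly, f2 barely at all.
toy <- data.frame(
  f1    = 1:40,
  f2    = c(seq(1, 39, by = 2), seq(2, 40, by = 2)),
  class = rep(c(0, 1), each = 20)
)
features <- setdiff(names(toy), "class")

# Two-sample Wilcoxon test per feature: class 0 values against class 1 values.
p_values <- sapply(features, function(f) {
  wilcox.test(toy[[f]][toy$class == 0], toy[[f]][toy$class == 1])$p.value
})

# Spearman correlation of each feature with the class label.
rho <- sapply(features, function(f) {
  cor(toy[[f]], toy$class, method = "spearman")
})

print(p_values)
print(rho)
```

A small p-value or a large absolute correlation marks a feature as more relevant under the respective method.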

With the argument selection the user decides which feature selection methods are used in ensemble_fs. The default value is selection = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE), i.e., the function uses neither of the cforest variable importance measures. The maximum score for a feature depends on the value of selection. The scores are always divided by the number of selected feature selection methods, i.e. the number of TRUE values.
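The division by the number of selected methods can be sketched as follows. The scaling of the raw values to [0, 1] by their maximum is an assumption made for this illustration, as are the feature names and raw numbers; only the final division by the count of TRUE values is taken from the text above.

```r
selection <- c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
n_selected <- sum(selection)  # number of TRUE values, 6 with the default

# Hypothetical raw importances of three features from a single method.
raw <- c(featA = 4, featB = 2, featC = 0)

# Assumed step: scale this method's values into [0, 1].
scaled <- raw / max(raw)

# Documented step: divide by the number of selected methods.
contribution <- scaled / n_selected
print(contribution)
```

Under this assumption a single method contributes at most 1/6 to a feature's score with the default selection, so the per-method maximum changes with the number of methods that are switched on.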

A matrix of normalized importance values, with the methods used as rows and the features of the input data as columns.
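One way to read such a result is to sum each column and sort the features by their total. The matrix below is a mock with the documented layout (methods as rows, features as columns); its row names, column names and values are invented and are not real ensemble_fs output.

```r
# Mock result matrix: two methods (rows) by three features (columns).
efs_mock <- matrix(
  c(0.10, 0.02, 0.05,
    0.12, 0.01, 0.09),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Median", "Pearson"), c("featA", "featB", "featC"))
)

# Sum the per-method scores of each feature, then rank in decreasing order.
total <- colSums(efs_mock)
ranking <- sort(total, decreasing = TRUE)
print(ranking)
```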

Ursula Neumann

Yu, L. and Liu, H.: Efficient feature selection via analysis of relevance and redundancy. J. Mach. Learn. Res. 2004, 5:1205-1224.
Breiman, L.: Random Forests. Machine Learning. 2001, 45(1):5-32.
Strobl, C., Malley, J. and Tutz, G.: An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests. Psychological Methods. 2009, 14(4):323-348.
Janitza, S., Strobl, C. and Boulesteix, A.-L.: An AUC-based Permutation Variable Importance Measure for Random Forests. BMC Bioinformatics. 2013, 14:119.

wilcox.test, randomForest, cforest, cor, glm

 ## Loading dataset in environment
 data(efsdata)
 ## Generate a ranking based on importance (with default NA_threshold = 0.2,
 ## cor_threshold = 0.7, selection = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE))
 efs <- ensemble_fs(efsdata, 5, runs=2)

[1] "default value for NA_threshold = 0.2"
[1] "default value for cor_threshold = 0.7"
[1] "default value for selection is c(TRUE, TRUE, TRUE,TRUE, TRUE, TRUE, FALSE, FALSE)"
[1] "Start Median"
[1] "Start Pearson"
[1] "Start Spearman"
[1] "Start LogReg"
[1] "Start RF"
[1] 1
Time difference of 0.02474976 secs
[1] 2
Time difference of 0.4320579 secs
[1] "Build return matrix"
[1] "Done"
Time difference of 0.5142276 secs
Warning messages:
1: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
2: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
3: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
4: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
5: In wilcox.test.default(x, y) : cannot compute exact p-value with ties

EFS documentation built on May 2, 2019, 9:58 a.m.
