efs_eval: Evaluation of Ensemble Feature Selection
In EFS: Tool for Ensemble Feature Selection

Provides several evaluation tests for the output of ensemble_fs. The performance tests are the logreg test and the permutation test; stability is tested via the variance of the feature importances and the Jaccard-index (see Details).

efs_eval(data, efs_table, file_name, classnumber, NA_threshold, logreg = TRUE,
  rf = TRUE, permutation = TRUE, p_num = 100, variances = TRUE,
  jaccard = TRUE, bs_num = 100, bs_percentage = 0.9)

`data`	an object of class data.frame
`efs_table`	a table object of class matrix (retrieved from `ensemble_fs`)
`file_name`	a character string, name which is used for the two possible PDF files.
`classnumber`	a number indicating the index of the variable used for binary classification
`NA_threshold`	a number in range of [0,1]. Threshold for deletion of features with a greater proportion of NAs than `NA_threshold`.
`logreg`	a logical value indicating whether to conduct an evaluation via logistic regression or not
`rf`	a logical value indicating whether to conduct an evaluation via random forest or not
`permutation`	a logical value indicating whether to conduct a permutation of the class variable or not
`p_num`	number of permutations
`variances`	a logical value indicating whether to calculate the variances of importances retrieved from bootstrapping or not
`jaccard`	a logical value indicating whether to calculate the jaccard-index or not
`bs_num`	number of bootstrap permutations of the importances
`bs_percentage`	a number in range of [0,1]. Proportion of randomly selected samples for bootstrapping
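
The NA_threshold rule described above can be illustrated with a few lines of base R. This is only a sketch of the stated rule (drop features whose proportion of NAs exceeds the threshold); the helper name `drop_na_features` and the toy data are hypothetical, and the package's internal handling may differ.

```r
# Hypothetical helper: keep only columns whose NA proportion
# does not exceed NA_threshold.
drop_na_features <- function(df, NA_threshold) {
  stopifnot(NA_threshold >= 0, NA_threshold <= 1)
  na_prop <- colMeans(is.na(df))          # proportion of NAs per column
  df[, na_prop <= NA_threshold, drop = FALSE]
}

# Toy data: column b is 50% NA, column c is 25% NA
toy_na <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, 1, 2),
  c = c(1, NA, 3, 4)
)

kept <- drop_na_features(toy_na, NA_threshold = 0.3)
names(kept)
```

With a threshold of 0.3, only the column with half of its values missing is removed.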

With logreg = TRUE, a logistic regression model with leave-one-out cross-validation (LOOCV) is fitted once on the selected features and once on all features. The AUC values of the two ROC curves are compared with roc.test. The ROC curves are plotted in the PDF file "file_name" + "LG-ROC.pdf".
With rf = TRUE, a random forest model is constructed and evaluated. As for logreg, the AUC values of the two ROC curves, one for all features and one for the subset of best-ranked features, are compared with roc.test. The ROC curves are plotted in the PDF file "file_name" + "RF-ROC.pdf".
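
To make the LOOCV comparison concrete, here is a minimal base-R sketch on simulated data. It is not the package's implementation: the helpers `auc_rank` and `loocv_logreg`, the toy data frame and the choice of "selected" feature are all assumptions for illustration, and the AUC is computed from ranks rather than through roc and roc.test.

```r
# AUC via the rank-sum (Mann-Whitney) identity; labels must be coded 0/1.
auc_rank <- function(labels, scores) {
  r <- rank(scores)                       # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Leave-one-out predicted probabilities from a logistic regression.
loocv_logreg <- function(df, class_col, feature_cols) {
  form <- reformulate(feature_cols, response = class_col)
  vapply(seq_len(nrow(df)), function(i) {
    fit <- glm(form, data = df[-i, , drop = FALSE], family = binomial)
    unname(predict(fit, newdata = df[i, , drop = FALSE], type = "response"))
  }, numeric(1))
}

# Simulated data: only x1 carries signal (an assumption of this sketch)
set.seed(1)
n <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(1.5 * toy$x1))

auc_all <- auc_rank(toy$y, loocv_logreg(toy, "y", c("x1", "x2", "x3")))
auc_sel <- auc_rank(toy$y, loocv_logreg(toy, "y", "x1"))
c(all_features = auc_all, selected = auc_sel)
```

The two AUC values correspond to the "all parameters" and "EFS parameter" entries of the returned list; a significance test between the two curves would still be needed to mirror the roc.test step.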

The permutation test (permutation = TRUE) compares the AUC of a logistic regression with p_num AUCs obtained from random permutations of the class variable, using t.test.
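
The idea behind the permutation test can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helpers `auc_rank` and `fit_auc` are hypothetical, the AUC is computed in-sample instead of with LOOCV to keep the example fast, and the one-sided, one-sample form of t.test is a guess at how the comparison might be set up.

```r
# AUC via the rank-sum identity; labels must be coded 0/1.
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# In-sample AUC of a logistic regression of y on all columns of X.
fit_auc <- function(y, X) {
  fit <- glm(y ~ ., data = data.frame(y = y, X), family = binomial)
  auc_rank(y, fitted(fit))
}

set.seed(42)
n <- 80
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, plogis(2 * X$x1))

p_num <- 20                                          # number of permutations
auc_obs  <- fit_auc(y, X)                            # AUC with the true labels
auc_perm <- replicate(p_num, fit_auc(sample(y), X))  # AUCs with shuffled labels

# Are the permuted AUCs on average lower than the observed AUC?
perm_test <- t.test(auc_perm, mu = auc_obs, alternative = "less")
perm_test$p.value
```

A small p-value suggests that the observed AUC is unlikely to arise when the class labels carry no information about the features.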

Variances of the importances after a bootstrapping analysis are calculated with variances = TRUE. The number of bootstrap runs and the proportion of samples drawn in each run can be set with bs_num and bs_percentage. The function also produces a PDF file "file_name" + "_Variances.pdf". Additionally, the Jaccard-index of these bootstrapped importances can be calculated by setting jaccard = TRUE.
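
The two stability measures can be sketched on simulated data as follows. Everything beyond the names bs_num and bs_percentage is an assumption made for illustration: the absolute Pearson correlation stands in for the ensemble importance, the Jaccard-index is computed on the top `top_k` features of each run, and the pairwise values are averaged. The package may define these steps differently.

```r
# Jaccard-index of two sets: |intersection| / |union|
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

set.seed(7)
n <- 100
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
y <- rbinom(n, 1, plogis(2 * X$x1 + X$x2))

bs_num <- 25          # number of bootstrap runs
bs_percentage <- 0.9  # proportion of samples drawn in each run
top_k <- 2            # hypothetical size of the selected feature subset

# One row of (stand-in) importances per bootstrap run
imp <- t(replicate(bs_num, {
  idx <- sample(n, size = floor(bs_percentage * n))
  abs(cor(X[idx, ], y[idx]))[, 1]
}))

# Stability measure 1: variance of each feature's importance across runs
imp_var <- apply(imp, 2, var)

# Stability measure 2: mean pairwise Jaccard-index of the top-k feature sets
top_sets <- lapply(seq_len(bs_num), function(i) {
  names(sort(imp[i, ], decreasing = TRUE))[seq_len(top_k)]
})
run_pairs <- combn(bs_num, 2)
jac <- mean(apply(run_pairs, 2, function(p) {
  jaccard(top_sets[[p[1]]], top_sets[[p[2]]])
}))

imp_var
jac
```

Low variances and a Jaccard-index close to 1 both indicate that the ranking changes little when the samples are resampled.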

An object of class list, with the following components:
"AUC of LR with all parameters",
"AUC of LR with EFS parameter",
"P-value of LR-ROC test",
"AUC of RF with all parameters",
"AUC of RF with EFS parameter",
"P-value of RF-ROC test",
"P-value of permutation",
"Variances of feature importances",
"Jaccard-index".

Ursula Neumann

glm, roc, prediction, boxplot, tail, t.test

## Load the dataset into the environment
data(efsdata)
## Generate a ranking based on importance (with default
## NA_threshold = 0.2, cor_threshold = 0.7)
efs <- ensemble_fs(efsdata, 5, runs = 2)
## Conduct AUC test and permutation test
eval_example <- efs_eval(data = efsdata, efs_table = efs, file_name = 'eval_test',
                         classnumber = 5, NA_threshold = 0.2,
                         logreg = TRUE,
                         rf = FALSE,
                         permutation = TRUE, p_num = 2,
                         variances = FALSE, jaccard = FALSE)
## Calculating variances and the Jaccard-index can take several minutes of computation time

[1] "default value for NA_threshold = 0.2"
[1] "default value for cor_threshold = 0.7"
[1] "default value for selection is c(TRUE, TRUE, TRUE,TRUE, TRUE, TRUE, FALSE, FALSE)"
[1] "Start Median"
[1] "Start Pearson"
[1] "Start Spearman"
[1] "Start LogReg"
[1] "Start RF"
[1] 1
Time difference of 0.02458167 secs
[1] 2
Time difference of 0.2683663 secs
[1] "Build return matrix"
[1] "Done"
Time difference of 0.3356316 secs
Warning messages:
1: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
2: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
3: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
4: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
5: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
[1] "default value for bs_num = 100"
[1] "default value for bs_percentage = 0.9"