efs_eval: Evaluation of Ensemble Features Selection


Description

Provides several evaluation tests of the output of ensemble_fs. These comprise performance tests, namely the logistic regression (logreg) test, the random forest (rf) test and the permutation test, as well as stability tests based on the variance of the feature importances and the Jaccard-index (see Details).

Usage

efs_eval(data, efs_table, file_name, classnumber, NA_threshold, logreg = TRUE,
  rf = TRUE, permutation = TRUE, p_num = 100, variances = TRUE,
  jaccard = TRUE, bs_num = 100, bs_percentage = 0.9)

Arguments

data

an object of class data.frame

efs_table

a table object of class matrix (retrieved from ensemble_fs)

file_name

a character string used as the prefix of the names of the generated PDF files

classnumber

a number giving the column index of the class variable used for binary classification

NA_threshold

a number in range of [0,1]. Threshold for deletion of features with a greater proportion of NAs than NA_threshold.

logreg

a logical value indicating whether to conduct an evaluation via logistic regression or not

rf

a logical value indicating whether to conduct an evaluation via random forest or not

permutation

a logical value indicating whether to conduct a permutation of the class variable or not

p_num

number of permutations

variances

a logical value indicating whether to calculate the variances of importances retrieved from bootstrapping or not

jaccard

a logical value indicating whether to calculate the Jaccard-index or not

bs_num

number of bootstrap permutations of the importances

bs_percentage

a number in range of [0,1]. Proportion of randomly selected samples for bootstrapping
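
The NA_threshold rule described above can be pictured with a few lines of base R. This is only a sketch of the stated rule (drop features whose proportion of NAs exceeds the threshold), not the package's internal code; the data frame `toy` and the threshold value are made up for illustration.

```r
# Hypothetical toy data, not efsdata
toy <- data.frame(
  a     = c(1, NA, 3, 4, 5),    # 20% missing
  b     = c(NA, NA, NA, 4, 5),  # 60% missing
  class = c(0, 1, 0, 1, 1)      # no missing values
)

NA_threshold <- 0.5  # illustrative value

# Proportion of NAs per column
na_share <- colMeans(is.na(toy))

# Keep only columns whose NA proportion does not exceed the threshold
kept <- toy[, na_share <= NA_threshold, drop = FALSE]
names(kept)
# [1] "a"     "class"
```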

Details

With logreg = TRUE, logistic regression models are fitted with leave-one-out cross-validation (LOOCV), one on the selected features and one on all features. The AUC values of the two ROC curves are compared with roc.test. The ROC curves are plotted to the PDF file "file_name" + "LG-ROC.pdf".

With rf = TRUE, random forest models are constructed and evaluated. As with logreg, the AUC values of the two ROC curves, one for all features and one for the subset of best-ranked features, are compared with roc.test. The ROC curves are plotted to the PDF file "file_name" + "RF-ROC.pdf".
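
To make the LOOCV idea concrete, the following base R sketch computes leave-one-out predictions from a logistic regression and derives the AUC for all features and for a feature subset. It is an illustration under assumptions, not the code used by efs_eval: the data are synthetic, the helper names `loocv_pred` and `auc` are hypothetical, and the AUC is computed with the rank (Mann-Whitney) formula instead of a ROC package. The statistical comparison of the two curves with roc.test is not reproduced here.

```r
# Synthetic binary-classification data (illustrative only, not efsdata)
set.seed(1)
n <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$class <- rbinom(n, 1, plogis(1.5 * toy$x1))

# Leave-one-out predictions: refit the model n times, each time
# predicting the single held-out sample
loocv_pred <- function(df, features, class_col = "class") {
  form <- reformulate(features, response = class_col)
  vapply(seq_len(nrow(df)), function(i) {
    fit <- glm(form, data = df[-i, , drop = FALSE], family = binomial)
    unname(predict(fit, newdata = df[i, , drop = FALSE], type = "response"))
  }, numeric(1))
}

# AUC via the Mann-Whitney rank statistic
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_all <- auc(loocv_pred(toy, c("x1", "x2", "x3")), toy$class)
auc_sub <- auc(loocv_pred(toy, "x1"), toy$class)
c(all_features = auc_all, subset = auc_sub)
```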

The permutation test (permutation = TRUE) compares the AUC of a logistic regression with p_num AUCs obtained from random permutations of the class variable, using a t.test.
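
A rough sketch of such a permutation test is shown below. It is not the package's implementation: the data are synthetic, the helpers `auc` and `fit_auc` are hypothetical, the AUC is computed in-sample for brevity, and the choice of a one-sample, one-sided t.test against the observed AUC is an assumption about how the comparison could be set up.

```r
# Synthetic data with one informative feature (illustrative only)
set.seed(2)
n <- 80
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))

# AUC via the Mann-Whitney rank statistic
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# In-sample AUC of a logistic regression of y on x
fit_auc <- function(x, y) {
  fit <- glm(y ~ x, family = binomial)
  auc(fitted(fit), y)
}

observed <- fit_auc(x, y)

# AUCs after randomly permuting the class labels p_num times
p_num <- 100
permuted <- replicate(p_num, fit_auc(x, sample(y)))

# Are the permuted AUCs on average smaller than the observed AUC?
test <- t.test(permuted, mu = observed, alternative = "less")
test$p.value
```

A small p-value indicates that the observed AUC is unlikely to arise when the class labels carry no information.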

Variances of the importances after a bootstrapping analysis are calculated by variances = TRUE. The number of bootstrap runs and the proportion of samples drawn in each run can be set by bs_num and bs_percentage. The function also provides a PDF file "file_name" + "_Variances.pdf". Additionally, the Jaccard-index of these bootstrapped importances can be calculated by setting jaccard = TRUE.
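
The two stability measures can be illustrated as follows. This sketch makes several assumptions that are not taken from the package: the data are synthetic, the importance score is a stand-in (absolute Pearson correlation with the class) instead of the ensemble importance from ensemble_fs, and the Jaccard-index is averaged over all pairs of top-2 feature sets. The helper names are hypothetical.

```r
# Synthetic data (illustrative only)
set.seed(3)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
cls <- rbinom(n, 1, plogis(2 * toy$x1 + toy$x2))

bs_num <- 50          # number of bootstrap runs
bs_percentage <- 0.9  # proportion of samples drawn per run

# Stand-in importance: absolute Pearson correlation with the class
importance <- function(df, y) abs(cor(df, y))[, 1]

# One row of importances per bootstrap run
imp <- t(replicate(bs_num, {
  idx <- sample(n, size = floor(bs_percentage * n))
  importance(toy[idx, ], cls[idx])
}))

# Variance of each feature's importance across the runs
variances <- apply(imp, 2, var)

# Jaccard-index of two feature sets: |intersection| / |union|
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Top-k features of each run, then the mean pairwise Jaccard-index
top_k <- function(v, k = 2) names(sort(v, decreasing = TRUE))[seq_len(k)]
tops <- lapply(seq_len(nrow(imp)), function(i) top_k(imp[i, ]))
pairs <- combn(bs_num, 2)
mean_jaccard <- mean(apply(pairs, 2, function(p) {
  jaccard(tops[[p[1]]], tops[[p[2]]])
}))

variances
mean_jaccard
```

Low variances and a mean Jaccard-index close to 1 both point to a stable feature ranking.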

Value

An object of class list, with the following components:
"AUC of LR with all parameters",
"AUC of LR with EFS parameter",
"P-value of LR-ROC test",
"AUC of RF with all parameters",
"AUC of RF with EFS parameter",
"P-value of RF-ROC test",
"P-value of permutation",
"Variances of feature importances",
"Jaccard-index".

Author(s)

Ursula Neumann

See Also

glm, roc, prediction, boxplot, tail, t.test

Examples

## Load the dataset into the environment
data(efsdata)
## Generate a ranking based on importance (with default
## NA_threshold = 0.2, cor_threshold = 0.7)
efs <- ensemble_fs(efsdata, 5, runs = 2)
## Conduct AUC test and permutation test
eval_example <- efs_eval(data = efsdata, efs_table = efs, file_name = 'eval_test',
                         classnumber = 5, NA_threshold = 0.2,
                         logreg = TRUE,
                         rf = FALSE,
                         permutation = TRUE, p_num = 2,
                         variances = FALSE, jaccard = FALSE)
## Calculating variances and the Jaccard-index can take several minutes of computation time

Example output

[1] "default value for NA_threshold = 0.2"
[1] "default value for cor_threshold = 0.7"
[1] "default value for selection is c(TRUE, TRUE, TRUE,TRUE, TRUE, TRUE, FALSE, FALSE)"
[1] "Start Median"
[1] "Start Pearson"
[1] "Start Spearman"
[1] "Start LogReg"
[1] "Start RF"
[1] 1
Time difference of 0.02458167 secs
[1] 2
Time difference of 0.2683663 secs
[1] "Build return matrix"
[1] "Done"
Time difference of 0.3356316 secs
Warning messages:
1: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
2: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
3: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
4: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
5: In wilcox.test.default(x, y) : cannot compute exact p-value with ties
[1] "default value for bs_num = 100"
[1] "default value for bs_percentage = 0.9"

EFS documentation built on May 2, 2019, 9:58 a.m.
