AUCRF: Variable Selection with Random Forest and the Area Under the...



Description

AUCRF is an algorithm for variable selection using Random Forest, based on optimizing the area under the ROC curve (AUC) of the Random Forest. The strategy implements a backward elimination process based on the initial ranking of the variables.

Usage

  AUCRF(formula, data, k0 = 1, pdel = 0.2, ranking = c("MDG", "MDA"), ...)

Arguments

formula

an object of class formula: a symbolic description of the model to be fitted. The details of model specification are given in Details.

data

a data frame containing the variables in the model. The dependent variable must be a binary factor coded as 1 for positives (e.g. cases) and 0 for negatives (e.g. controls).
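As an illustration of the coding the data argument expects, the following minimal sketch recodes an outcome stored as text labels into a 0/1 factor. The data frame raw and its columns status, SNP1 and SNP2 are hypothetical and are not part of the package.

```r
# Hypothetical raw data: the outcome is stored as character labels.
raw <- data.frame(
  status = c("case", "control", "case", "control"),
  SNP1   = c(0, 1, 2, 1),
  SNP2   = c(1, 1, 0, 2)
)

# Recode to a factor with levels "0" (negatives) and "1" (positives).
raw$Y <- factor(ifelse(raw$status == "case", 1, 0), levels = c(0, 1))

# Drop the original label so that a formula such as Y ~ . does not use it.
raw$status <- NULL

str(raw$Y)
```

Fixing the levels explicitly with levels = c(0, 1) keeps both classes defined even if a subset of the data happens to contain only one of them.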

k0

number of remaining variables at which the backward elimination process stops. By default k0=1.

pdel

fraction of remaining variables to be removed in each step. By default pdel=0.2. If pdel=0, only one variable is removed each time.
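To give a feel for how k0 and pdel interact, here is a rough sketch of the sequence of variable-set sizes an elimination of this kind might visit. The helper elimination_schedule is hypothetical, and the rounding rule (remove round(k * pdel) variables, but at least one) is an assumption; the package's exact rounding may differ.

```r
# Hypothetical helper, not part of AUCRF: sizes of the variable sets visited
# when starting from p variables.
elimination_schedule <- function(p, k0 = 1, pdel = 0.2) {
  stopifnot(p >= 1, k0 >= 1, pdel >= 0, pdel < 1)
  sizes <- p
  k <- p
  while (k > k0) {
    # Assumption: drop round(k * pdel) variables, and always at least one.
    n_drop <- if (pdel == 0) 1 else max(1, round(k * pdel))
    # Guard for this sketch: never go below a single variable.
    k <- max(k - n_drop, 1)
    sizes <- c(sizes, k)
  }
  sizes
}

elimination_schedule(20)            # default pdel = 0.2
elimination_schedule(5, pdel = 0)   # one variable removed per step
```

A larger pdel means fewer random forests are fitted, at the cost of a coarser search over set sizes.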

ranking

specifies the importance measure provided by randomForest for ranking the variables. Two options are available: MDG (the default) for MeanDecreaseGini and MDA for MeanDecreaseAccuracy.

...

optional parameters to be passed to the randomForest function. If none are specified, the default arguments of the randomForest function are used.

Details

The AUC-RF algorithm is described in detail in Calle et al. (2011). The following is a summary:

Ranking and AUC of the initial set:
Fit a random forest using all predictor variables and the response, as specified in the formula argument, and compute the AUC of the random forest. Based on the selected measure of importance (by default MDG), obtain a ranking of predictors.

Elimination process:
Based on the variable ranking, remove the least important variables (the fraction specified in the pdel argument). Fit a new random forest with the remaining variables and compute its AUC. This step is repeated until the number of remaining variables is less than or equal to k0.

Optimal set:
The optimal set of predictive variables is the one giving rise to the random forest with the highest OOB-AUC, denoted OOB-AUCopt. The number of selected predictors is denoted by Kopt.
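The AUC computed at each step can be illustrated with a small base R sketch. The helper auc_from_scores below is hypothetical and uses the rank-based (Mann-Whitney) form of the AUC; it is not the package's own code. It assumes labels is a factor with levels "0" and "1" and scores holds one numeric score per observation, for example the out-of-bag vote fraction for class "1".

```r
# Hypothetical helper, not part of AUCRF: AUC as the probability that a
# randomly chosen positive has a higher score than a randomly chosen negative.
auc_from_scores <- function(scores, labels) {
  pos   <- labels == "1"
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(scores)  # average ranks, so tied scores count as one half
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy scores for three positives and three negatives (made-up values).
labels <- factor(c(1, 1, 0, 1, 0, 0), levels = c(0, 1))
scores <- c(0.9, 0.4, 0.3, 0.6, 0.2, 0.5)
auc_from_scores(scores, labels)

# With a fitted classification forest, the out-of-bag vote fractions stored in
# the votes component could be used as scores (assumes randomForest is
# installed and d is a data frame with a 0/1 factor Y):
# rf <- randomForest::randomForest(Y ~ ., data = d)
# auc_from_scores(rf$votes[, "1"], d$Y)
```

In the toy data, eight of the nine positive/negative pairs are ordered correctly, so the function returns 8/9.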

Value

An object of class AUCRF, which is a list with the following components:

call

the original call to AUCRF.

data

the data argument.

ranking

the ranking of predictors based on the importance measure.

Xopt

optimal set of predictors obtained.

OOB-AUCopt

AUC obtained for the optimal set of predictors.

Kopt

size of the optimal set of predictors obtained.

AUCcurve

values of AUC obtained for each set of predictors evaluated in the elimination process.

RFopt

the randomForest object fitted with the optimal set.
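Because the component name OOB-AUCopt contains a hyphen, it cannot be reached with a plain $ expression: R would parse fit$OOB-AUCopt as a subtraction. The sketch below shows the quoting on a mock list, fit_mock, which only borrows the documented component names; its values are made up and it is not a real AUCRF object.

```r
# Mock list using some of the documented component names (values are made up).
fit_mock <- list(
  Xopt         = c("SNP9", "SNP4", "SNP3"),
  `OOB-AUCopt` = 0.78,
  Kopt         = 3
)

# Components with syntactic names can be read directly.
fit_mock$Xopt
fit_mock$Kopt

# The hyphenated name needs backticks or [[ ]] with a string.
fit_mock$`OOB-AUCopt`
fit_mock[["OOB-AUCopt"]]
```

The same access pattern should apply to an object returned by AUCRF, since it is described as a list with these components.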

References

Calle ML, Urrea V, Boulesteix A-L, Malats N (2011) "AUC-RF: A new strategy for genomic profiling with Random Forest". Human Heredity. (In press)

See Also

OptimalSet, AUCRFcv, randomForest.

Examples

  # Load the included example data set. This is a simulated case/control study
  # with 4000 patients (2000 cases / 2000 controls) and 1000 SNPs,
  # where the first 10 SNPs have a direct association with the outcome:
  data(exampleData)
  
  # Run the AUCRF process (it may take some time):
  # fit <- AUCRF(Y~., data=exampleData)
  
  # The result of this example is included for illustration purposes:
  
  data(fit)
  summary(fit)
  plot(fit)
  
  # Additional randomForest parameters can be included; otherwise the default
  # parameters of the randomForest function are used:
  # fit <- AUCRF(Y~., data=exampleData, ntree=1000, nodesize=20)

Example output

Loading required package: randomForest
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.
AUCRF 1.1

Number of selected variables: Kopt= 32 
AUC of selected variables: OOB-AUCopt= 0.7787711 
Importance Measure: MDG 

   Selected.Variables Importance
1                SNP9  15.047305
2                SNP4  12.912120
3                SNP3  10.486599
4                SNP7   9.767075
5                SNP8   9.283819
6                SNP2   9.043039
7                SNP6   8.743129
8               SNP10   8.465736
9                SNP5   7.844703
10               SNP1   7.533021
11             SNP369   2.677609
12             SNP584   2.565316
13             SNP747   2.504847
14              SNP47   2.469360
15              SNP55   2.469196
16             SNP674   2.445041
17             SNP354   2.441501
18             SNP993   2.424503
19             SNP661   2.423057
20              SNP73   2.399690
21             SNP690   2.398267
22              SNP14   2.390978
23             SNP878   2.387848
24             SNP651   2.353301
25             SNP191   2.349521
26             SNP684   2.346010
27             SNP278   2.341461
28             SNP771   2.336632
29             SNP575   2.318485
30             SNP544   2.307716
31             SNP726   2.299561
32             SNP336   2.279044

AUCRF documentation built on May 29, 2017, 9:29 p.m.
