# Variable Selection with Random Forest and the Area Under the Curve

### Description

AUCRF is an algorithm for variable selection with Random Forest, based on optimizing the area under the ROC curve (AUC) of the Random Forest. The proposed strategy implements a backward elimination process based on the initial ranking of the variables.

### Usage

1 |

### Arguments

`formula` |
an object of class |

`data` |
a data frame containing the variables in the model. Dependent variable must be a
binary variable defined as |

`k0` |
number of remaining variables for stopping the backward elimination process.
By default |

`pdel` |
fraction of remaining variables to be removed in each step. By default |

`ranking` |
specifies the importance measure provided by |

`...` |
optional parameters to be passed to the |

### Details

The AUC-RF algorithm is described in detail in Calle et al. (2011). The following is a summary:

Ranking and AUC of the initial set:

Fit a random forest using all predictor variables and the response, as specified in the `formula` argument, and compute the AUC of the random forest. Based on the selected measure of importance (by default MDG, the mean decrease in Gini index), obtain a ranking of predictors.
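As a rough illustration of the AUC computation in this step, the sketch below derives the AUC from a vector of class scores using the Mann-Whitney rank-sum identity. It assumes the scores are something like the out-of-bag vote fractions for the positive class; the function name `auc_from_scores` and the toy data are hypothetical and are not part of the AUCRF package.

```r
# AUC from scores via the Mann-Whitney rank-sum identity (base R only).
# `scores`: numeric vector, higher means "more likely class 1"
#           (assumed here to play the role of OOB vote fractions).
# `labels`: vector of true classes coded 0/1.
auc_from_scores <- function(scores, labels) {
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # ties receive average ranks
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data: 3 cases and 3 controls
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c(1, 1, 1, 0, 0, 0)
auc_toy <- auc_from_scores(scores, labels)
```

The result equals the fraction of (case, control) pairs in which the case receives the higher score, which is one common way to read the AUC.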

Elimination process:

Based on the variable ranking, remove the least important variables (the fraction of variables specified in the `pdel` argument). Fit a new random forest with the remaining variables and compute its AUC. This step is repeated until the number of remaining variables is less than or equal to `k0`.
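To make the stopping rule concrete, the following sketch lists the set sizes such an elimination would visit. The rounding rule (drop `floor(k * pdel)` variables, but at least one) and the argument values are assumptions chosen for illustration; the package may round differently, and `elimination_sizes` is a hypothetical helper, not an AUCRF function.

```r
# Sizes of the variable sets visited by a backward elimination that removes
# a fraction `pdel` of the remaining variables per step and stops once at
# most `k0` variables remain.
# Assumption: floor() rounding with a minimum of one variable removed.
elimination_sizes <- function(p, pdel, k0) {
  sizes <- p
  k <- p
  while (k > k0) {
    n_drop <- max(1, floor(k * pdel))  # always remove at least one variable
    k <- k - n_drop
    sizes <- c(sizes, k)
  }
  sizes
}

# Illustrative values only: 20 predictors, 20% removed per step, stop at 5
sizes <- elimination_sizes(p = 20, pdel = 0.2, k0 = 5)
```

Because a fixed fraction is removed, early steps discard many variables at once while later steps proceed one variable at a time, so the AUC is evaluated more finely near the small sets.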

Optimal set:

The optimal set of predictive variables is the one giving rise to the random forest with the highest OOB-AUC, denoted OOB-AUC*opt*. The number of selected predictors is denoted by K*opt*.
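Selecting the optimal set then amounts to taking the maximum of the AUC values recorded during elimination. The sketch below uses a made-up AUC curve purely for illustration; the column names `k` and `auc` are assumptions and may not match the layout of the `AUCcurve` component.

```r
# Hypothetical AUC curve: one row per set of predictors evaluated during
# elimination (numbers are invented for illustration).
auc_curve <- data.frame(
  k   = c(20, 16, 13, 11, 9, 8, 7, 6, 5),
  auc = c(0.81, 0.83, 0.85, 0.86, 0.88, 0.88, 0.87, 0.84, 0.80)
)

best    <- which.max(auc_curve$auc)  # index of the first maximum
k_opt   <- auc_curve$k[best]         # plays the role of Kopt
auc_opt <- auc_curve$auc[best]       # plays the role of OOB-AUCopt
```

Note that `which.max` returns the first maximum, so with tied AUC values this sketch keeps the larger set; how the package itself resolves ties is not stated here.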

### Value

An object of class `AUCRF`, which is a list with the following components:

`call` |
the original call to |

`data` |
the |

`ranking` |
the ranking of predictors based on the importance measure. |

`Xopt` |
optimal set of predictors obtained. |

`OOB-AUCopt` |
AUC obtained for the optimal set of predictors. |

`Kopt` |
size of the optimal set of predictors obtained. |

`AUCcurve` |
values of AUC obtained for each set of predictors evaluated in the elimination process. |

`RFopt` |
the |
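Since the result is a list, its components can be read with `$`. One detail worth noting is that `OOB-AUCopt` contains a hyphen, so it is not a syntactic R name and has to be quoted with backticks or accessed with `[[`. The sketch below uses a hand-built stand-in list with invented values rather than a real `AUCRF` fit.

```r
# Stand-in for a fitted object: a plain list that reuses some of the
# component names listed above (values are invented for illustration).
fit_like <- list(
  Xopt = c("SNP1", "SNP2", "SNP3"),
  "OOB-AUCopt" = 0.88,
  Kopt = 3
)

fit_like$Kopt               # regular names work with $
fit_like$`OOB-AUCopt`       # non-syntactic name: backticks required
fit_like[["OOB-AUCopt"]]    # equivalent access by string
```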

### References

Calle ML, Urrea V, Boulesteix A-L, Malats N (2011) "AUC-RF: A new strategy for genomic profiling with Random Forest". Human Heredity. (In press)

### See Also

`OptimalSet`, `AUCRFcv`, `randomForest`.

### Examples

```
# load the included example dataset. This is a simulated case/control study
# data set with 4000 patients (2000 cases / 2000 controls) and 1000 SNPs,
# where the first 10 SNPs have a direct association with the outcome:
data(exampleData)
# call AUCRF process: (it may take some time)
# fit <- AUCRF(Y~., data=exampleData)
# The result of this example is included for illustration purposes:
data(fit)
summary(fit)
plot(fit)
# Additional randomForest parameters can be included, otherwise default
# parameters of randomForest function will be used:
# fit <- AUCRF(Y~., data=exampleData, ntree=1000, nodesize=20)
```