getBestRFModel: Extracting the best-performing Random Forest model


Description

This function finds the best-performing Random Forest model among the models built on k-combinations of the input variables.

Usage

getBestRFModel(combinations, data, params)

Arguments

combinations

a k x n matrix, where k is the size of each combination and n is the number of combinations of the input variables. Each column contains the column indexes (in data) of the input variables that form one combination.
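A matrix of this shape can be built with base R's combn(), as in the Examples below. The sketch assumes a hypothetical dataset whose four predictors sit in columns 3 to 6; each column of the result is one combination of k = 2 column indexes.

```r
# Hypothetical layout: 4 predictors stored in columns 3:6 of the data
indexes <- 3:6
combinations <- combn(x = indexes, m = 2)  # k = 2, n = choose(4, 2) = 6

dim(combinations)   # 2 6
combinations[, 1]   # 3 4, i.e. the first combination uses columns 3 and 4
```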

data

an n x p data frame of n observations and p-2 predictors. The first two columns must contain, respectively, the sample names and the class associated with each sample.
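The sketch below builds a small data frame in this layout and summarises it. The values are random, the class labels "case" and "control" are hypothetical, and describe_layout() is an illustrative helper, not part of RFmarkerDetector.

```r
# Toy data frame in the layout described above (random values)
set.seed(1)
toy <- data.frame(
  sample = paste0("S", 1:6),                      # column 1: sample names
  class  = rep(c("case", "control"), times = 3),  # column 2: class labels
  m1 = rnorm(6), m2 = rnorm(6), m3 = rnorm(6)     # columns 3..p: predictors
)

# Hypothetical helper: basic checks and a summary of the layout
describe_layout <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 3)
  stopifnot(length(unique(df[[2]])) >= 2)  # ROC/AUC needs at least two classes
  list(
    n_observations = nrow(df),
    n_predictors   = ncol(df) - 2,         # p - 2
    predictors     = names(df)[-(1:2)],
    classes        = sort(unique(as.character(df[[2]])))
  )
}

layout <- describe_layout(toy)
layout$n_predictors   # 3
```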

params

a list of parameters used to perform Monte Carlo cross-validation. It should contain the following elements:

  • ntrees: the number of trees in each Random Forest model

  • nsplits: the number of random splits of the original dataset into training and test sets

  • test_prop: the proportion of observations of the original dataset to include in each test set, expressed as a real number (e.g. 1/3)

  • ref_level: the class label assumed as the reference level
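A parameter list of this form, and what nsplits and test_prop mean for a single Monte Carlo split, might look as follows. This is a minimal sketch: the value "control" for ref_level is an assumed label, and mc_split() is a hypothetical helper that uses plain (unstratified) random sampling, which may differ from what the package does internally.

```r
# Hypothetical parameter list; "control" is an assumed reference label
params <- list(ntrees = 500, nsplits = 100, test_prop = 1/3, ref_level = "control")

# Hypothetical helper: one Monte Carlo split, drawing round(test_prop * n)
# test observations at random; the remaining observations form the training set
mc_split <- function(n, test_prop) {
  test <- sort(sample.int(n, size = round(test_prop * n)))
  list(train = setdiff(seq_len(n), test), test = test)
}

set.seed(42)
splits <- replicate(params$nsplits, mc_split(30, params$test_prop),
                    simplify = FALSE)
length(splits)         # 100 independent splits
lengths(splits[[1]])   # train = 20, test = 10
```

Because each split is drawn independently, the same observation can land in the test set of many splits; this is what distinguishes Monte Carlo cross-validation from k-fold cross-validation.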

Details

The k-combinations of the input variables are represented as a k x n matrix, where k is the size of each combination and n is the number of combinations of the input variables of the original dataset. Each column of the combinations matrix contains the indexes of input variables in the original dataset. For each column, getBestRFModel extracts the corresponding subset of the original dataset and builds a Random Forest model on it, evaluated by Monte Carlo cross-validation. The cross-validated models are then compared on the AUC of their averaged ROC curves. The function returns the best models, the maximum AUC value, and the most relevant associated input variables.
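The selection procedure can be sketched in base R as below. This is not the package's implementation: to stay dependency-free, logistic regression (glm) stands in for a Random Forest, the mean of per-split AUCs replaces the AUC of the averaged ROC curve, and variable importance is not computed. The functions auc() and best_combination() and the toy data are hypothetical.

```r
# AUC via the Mann-Whitney rank statistic: the probability that a randomly
# chosen positive scores higher than a randomly chosen negative (ties = 1/2)
auc <- function(scores, positive) {
  r <- rank(scores)
  n_pos <- sum(positive)
  n_neg <- sum(!positive)
  (sum(r[positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Sketch of the selection loop; glm() stands in for a Random Forest
best_combination <- function(combinations, data, nsplits, test_prop, ref_level) {
  y <- as.character(data[[2]]) != ref_level   # TRUE = non-reference class
  n <- nrow(data)
  mean_auc <- apply(combinations, 2, function(cols) {
    X <- data[, cols, drop = FALSE]           # subset for this combination
    aucs <- replicate(nsplits, {
      test <- sample.int(n, size = round(test_prop * n))  # Monte Carlo split
      train_df <- cbind(X[-test, , drop = FALSE], y = as.numeric(y[-test]))
      fit <- glm(y ~ ., data = train_df, family = binomial)
      p <- predict(fit, newdata = X[test, , drop = FALSE], type = "response")
      auc(p, y[test])
    })
    mean(aucs, na.rm = TRUE)  # NaN only if a test set held a single class
  })
  best <- which.max(mean_auc)
  list(columns = combinations[, best], auc = unname(mean_auc[best]))
}

# Toy data: m1 separates the classes, m2 and m3 are pure noise
set.seed(7)
n <- 60
cls <- rep(c("control", "case"), each = n / 2)
toy <- data.frame(
  sample = paste0("S", seq_len(n)),
  class  = cls,
  m1 = rnorm(n, mean = ifelse(cls == "case", 1.5, 0)),
  m2 = rnorm(n),
  m3 = rnorm(n)
)

res <- best_combination(combn(3:5, 2), toy, nsplits = 20,
                        test_prop = 1/3, ref_level = "control")
res$columns  # expected to include column 3 (m1)
res$auc
```

With the randomForest package installed, the glm() and predict() calls could be swapped for randomForest(x, factor(y), ntree = ntrees) and predict(fit, newdata, type = "prob"), which is closer to the Random Forest approach described above.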

Value

a list containing the best models, the maximum AUC value, and the most relevant associated input variables (see Details).

Author(s)

Piergiorgio Palla

Examples

## library(RFmarkerDetector)
## data(cachexiaData)
## dataset <- cachexiaData[, 1:15]   # sample names, classes and 13 predictors
## indexes <- 3:15                   # column indexes of the predictors
## combinations <- combn(x = indexes, m = 5)  # a 5 x choose(13, 5) = 5 x 1287 matrix
## test_params <- list(ntrees = 500, nsplits = 100, test_prop = 1/3)
## res <- getBestRFModel(combinations, dataset, test_params)

RFmarkerDetector documentation built on May 2, 2019, 3:42 p.m.