View source: R/genericFunctions.R
roc.curve | R Documentation |
plot ROC and precision-recall curves for objects of class randomUniformForest and compute F-beta score. It also works for any other model that provides predicted labels (but only for ROC curve).
roc.curve(X, Y, classes, positive = classes[2], ranking.threshold = 0, ranking.values = 0, falseDiscoveryRate = FALSE, plotting = TRUE, printValues = TRUE, Beta = 1)
X |
a vector (or a factor) of predictions (with two classes) or an object of class randomUniformForest (with OOB option enabled). |
Y |
a vector of numeric (integer) responses, or a factor, with two classes. |
classes |
a vector (or a factor) of values that designate the class. |
positive |
convention for the positive class (e.g. the minority class). |
ranking.threshold |
option currently implemented but not fully tested. |
ranking.values |
option currently implemented but not fully tested. |
falseDiscoveryRate |
if TRUE, precision-recall curve is plotted. if FALSE, default value, ROC curve is plotted. |
plotting |
plotting the ROC curve ? |
printValues |
display values to screen ? |
Beta |
'beta' value for F-beta score. |
Saip Ciss saip.ciss@wanadoo.fr
importance.randomUniformForest
## Classification : "breast cancer" data # http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic) data(breastCancer) breastCancer.data <- breastCancer # remove ID (first column) and divide data in train and test set breastCancer.data = breastCancer.data[,-1] n <- nrow(breastCancer.data) p <- ncol(breastCancer.data) trainTestIdx <- cut(sample(1:n, n), 2, labels= FALSE) # train examples breastCancer.data.train <- breastCancer.data[trainTestIdx == 1, -p] breastCancer.class.train <- as.factor(breastCancer.data[trainTestIdx == 1, p]) # rename class in benign (class 2) and malignant (class 4) to have a better view levels(breastCancer.class.train) = c("benign", "malignant") # test data breastCancer.data.test <- breastCancer.data[trainTestIdx == 2, -p] breastCancer.class.test <- as.factor(breastCancer.data[trainTestIdx == 2, p]) levels(breastCancer.class.test) = c("benign", "malignant") # compute model : train then test in the same function and assign class weights # to better match the distribution (OOB errors and Breiman's bounds can help to choose weights) # Note that in this case 'recall' (or sensitivity) is the objective, # e.g. match all possible cases of malignant tumours even if false positive rate increase #(in this latter case, further steps will reveal the truth). If malignant tumour is not detected, # then diagnosis error is, by far, more critical. breastCancer.ruf <- randomUniformForest(breastCancer.data.train, breastCancer.class.train, xtest = breastCancer.data.test, ytest = breastCancer.class.test, classwt = c(1, 3.5), threads = 2, ntree = 40, BreimanBounds = FALSE) # get a summary of model breastCancer.ruf ## plot ROC Curve for test data # roc.curve(breastCancer.ruf, breastCancer.class.test, levels(breastCancer.class.test)) ## plot precision-recall curve for test data # roc.curve(breastCancer.ruf, breastCancer.class.test, levels(breastCancer.class.test), # falseDiscoveryRate = TRUE) ## associate cut-off and purely random forest as an alternative to find maximum malignant cases ## with a low false positive rate. ## 'classcutoff' option is a bit tricky. Let's take the example we will use below. ## classcutoff = c("benign", 1.25) means that number of votes for class "benign" ## will be weighted by 0.4 (= Cte/1.25, where Cte = 0.5) for each response. ## Hence "benign" will never have majority unless it has 2.5 (1.25/0.5) times more votes ## than "malignant" class and all votes sum to total number of trees. # breastCancer.cutOff.ruf <- randomUniformForest(breastCancer.data.train, breastCancer.class.train, # xtest = breastCancer.data.test, ytest = breastCancer.class.test, classcutoff = c("benign", 1.25), # randomfeature = TRUE, ntree = 50, threads = 2, BreimanBounds = FALSE) # roc.curve(breastCancer.cutOff.ruf, breastCancer.class.test, levels(breastCancer.class.test)) # roc.curve(breastCancer.cutOff.ruf, breastCancer.class.test, levels(breastCancer.class.test), # falseDiscoveryRate = TRUE) ## evaluate OOB data, when there is no test set # breastCancer.ruf <- randomUniformForest(breastCancer.data.train, breastCancer.class.train, # classwt = c(1, 3.5), threads = 2) # roc.curve(breastCancer.ruf, breastCancer.class.train, levels(breastCancer.class.train))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.