Metrics to evaluate classifier accuracy in imbalanced learning

Description

This function computes precision, recall and the F measure of a prediction.

Usage

accuracy.meas(response, predicted, threshold = 0.5) 

Arguments

response

A vector of responses containing two classes to be used to evaluate prediction accuracy. It can be of class "factor", "numeric" or "character".

predicted

A vector containing a prediction for each observation. It can be of class "factor" or "character" if predicted class labels are provided, or "numeric" if predicted probabilities of the rare class (or a monotonic function of them) are provided.

threshold

When predicted is of class numeric, the probability threshold above which an example is classified as positive. The default value, 0.5, is meant for predicted probabilities; see Details below. Ignored if predicted is of class factor.
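When predicted holds a monotonic function of the probabilities rather than the probabilities themselves, the threshold has to be moved to the same scale. The short sketch below uses made-up probabilities and assumes the scores are logits, as returned for a logistic glm by predict() with its default type = "link". A cut-off of 0.02 on the probability scale corresponds to qlogis(0.02) on the logit scale, whereas reusing 0.5 on the logit scale silently changes the cut-off.

```r
# Made-up predicted probabilities of the rare class
p <- c(0.01, 0.03, 0.20, 0.60)

# The same predictions on the logit scale (a monotonic transform); for a
# logistic glm this is the linear predictor, i.e. predict(..., type = "link")
eta <- qlogis(p)

# Classifying probabilities at 0.02 ...
by_prob <- p > 0.02
# ... agrees with classifying logits at the transformed cut-off qlogis(0.02)
by_logit <- eta > qlogis(0.02)
identical(by_prob, by_logit)   # TRUE

# Reusing 0.5 on the logit scale amounts to a probability cut-off of plogis(0.5)
plogis(0.5)                    # about 0.622
```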

Details

Prediction of positive or negative labels depends on the classification threshold, defined here as the value such that observations with a predicted value greater than the threshold are assigned to the positive class. Some caution is needed both when setting the threshold and when relying on the default, because the default value is meant for predicted probabilities and because 0.5 is not necessarily the optimal choice in imbalanced learning. Smaller threshold values correspond to assigning a larger misclassification cost to the rare class, which is usually the case in imbalanced problems.
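The labelling rule can be sketched in a few lines of base R, assuming the scores are predicted probabilities of the rare class. The helper classify_at and the scores are hypothetical, chosen only for illustration. Lowering the threshold from 0.5 to 0.02 (the value used in the Examples below) turns more observations into predicted positives, which can only keep or raise recall, usually at some cost in precision.

```r
# Hypothetical helper (not a package function): label an observation
# positive (1) when its score is strictly greater than the threshold, else 0
classify_at <- function(scores, threshold = 0.5) {
  as.integer(scores > threshold)
}

# Made-up predicted probabilities of the rare class
scores <- c(0.01, 0.03, 0.04, 0.10, 0.30, 0.70)

classify_at(scores, threshold = 0.5)   # 0 0 0 0 0 1
classify_at(scores, threshold = 0.02)  # 0 1 1 1 1 1
classify_at(0.5, threshold = 0.5)      # 0: a score equal to the threshold is negative
```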

Precision is defined as follows:

\frac{\mbox{true positives}}{\mbox{true positives + false positives}}

Recall is defined as:

\frac{\mbox{true positives}}{\mbox{true positives + false negatives}}

The F measure is the harmonic mean of precision and recall:

2 \cdot \frac{\mbox{precision} \cdot \mbox{recall}}{\mbox{precision+recall}}
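A worked example with made-up counts (3 true positives, 2 false positives and 1 false negative, not taken from any data set) shows how the three formulas combine:

```latex
\mbox{precision} = \frac{3}{3 + 2} = 0.6
\mbox{recall} = \frac{3}{3 + 1} = 0.75
F = 2 \cdot \frac{0.6 \cdot 0.75}{0.6 + 0.75} = \frac{0.9}{1.35} = \frac{2}{3} \approx 0.667
```

If no observation is predicted positive, true positives + false positives = 0 and precision is undefined (0/0).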

Value

An object of class accuracy.meas with the following components:

Call

The matched call.

threshold

The selected threshold.

precision

A vector of length one giving the precision of the prediction.

recall

A vector of length one giving the recall of the prediction.

F

A vector of length one giving the F measure.
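The shape of the returned object can be mimicked with a small base-R sketch. Everything in it is hypothetical: the function name accuracy_sketch, the extra positive argument naming the positive class, the class attribute "accuracy_sketch" and the choice to store NA as the threshold for label predictions are illustrative assumptions, not the package's code. How the package itself decides which class is positive is not stated on this page.

```r
# Hypothetical sketch (not the package implementation): returns a list with
# the component names described above
accuracy_sketch <- function(response, predicted, threshold = 0.5, positive) {
  response <- as.character(response)
  if (length(unique(response)) != 2) stop("response must contain two classes")
  if (length(predicted) != length(response)) stop("lengths must match")

  is_pos <- response == as.character(positive)   # actual positives
  if (is.numeric(predicted)) {
    pred_pos <- predicted > threshold             # strictly above threshold
  } else {
    pred_pos <- as.character(predicted) == as.character(positive)
    threshold <- NA                               # assumption: unused for labels
  }

  tp <- sum(is_pos & pred_pos)                    # true positives
  fp <- sum(!is_pos & pred_pos)                   # false positives
  fn <- sum(is_pos & !pred_pos)                   # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)

  structure(
    list(Call = match.call(), threshold = threshold,
         precision = precision, recall = recall,
         F = 2 * precision * recall / (precision + recall)),
    class = "accuracy_sketch"
  )
}

# Made-up data: "b" is the rare (positive) class
res <- accuracy_sketch(c("a", "a", "a", "b", "b"),
                       c(0.1, 0.6, 0.2, 0.7, 0.4),
                       threshold = 0.5, positive = "b")
unclass(res)[c("precision", "recall", "F")]   # all equal to 0.5
```

Passing the positive class explicitly keeps the sketch unambiguous; with label predictions (factor or character) the threshold plays no role, matching the description of the threshold argument.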

References

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.

See Also

roc.curve

Examples

# 2-dimensional example
# load the ROSE package (provides hacide, accuracy.meas and roc.curve)
library(ROSE)

# loading data
data(hacide)

# imbalance on training set
table(hacide.train$cls)

# model estimation using logistic regression
fit.hacide <- glm(cls ~ ., data = hacide.train, family = "binomial")

# prediction on training set
pred.hacide.train <- predict(fit.hacide, newdata=hacide.train,
                             type="response")

# compute accuracy measures (training set)
accuracy.meas(hacide.train$cls, pred.hacide.train, threshold = 0.02)

# imbalance on test set 
table(hacide.test$cls)

# prediction on test set
pred.hacide.test <- predict(fit.hacide, newdata=hacide.test,
                            type="response")

# compute accuracy measures (test set)
accuracy.meas(hacide.test$cls, pred.hacide.test, threshold = 0.02)