eval_model: Classification Evaluation function


Description

Function for evaluating a OneR classification model. Prints confusion matrices with prediction vs. actual in absolute and relative numbers. Additionally, it gives the accuracy, the error rate, and the error rate reduction versus the base rate accuracy, together with a p-value.

Usage

eval_model(prediction, actual, dimnames = c("Prediction", "Actual"),
  zero.print = "0")

Arguments

prediction

vector which contains the predicted values.

actual

data frame which contains the actual data. When there is more than one column, the last column is taken. A single vector is allowed too.

dimnames

character vector of printed dimnames for the confusion matrices.

zero.print

character specifying how zeros should be printed; for sparse confusion matrices, using "." can produce more readable results, as in the sketch below.
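
For instance, with zero.print = "." empty cells appear as dots; a minimal sketch with made-up class vectors:

library(OneR)
prediction <- c("a", "a", "b", "c", "c")  # made-up predictions
actual     <- c("a", "b", "b", "c", "c")  # made-up actual classes
eval_model(prediction, actual,
           dimnames = c("Predicted", "Observed"),
           zero.print = ".")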

Details

Error rate reduction versus the base rate accuracy is calculated by the following formula:

(Accuracy(Prediction) - Accuracy(Baserate)) / (1 - Accuracy(Baserate)),

giving a number between 0 (no error reduction) and 1 (no error).

In some borderline cases, when the model performs worse than the base rate, negative numbers can result. This shows that something is seriously wrong with the model generating these predictions.
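
As a quick check, the error rate reduction reported in the iris example below can be recomputed by hand (the base rate accuracy for iris is 50/150, the share of the most frequent class):

acc_pred <- 141/150  # model accuracy from the example output below
acc_base <- 50/150   # base rate accuracy: always predict the most frequent class
(acc_pred - acc_base) / (1 - acc_base)  # error rate reduction, 0.91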

The provided p-value gives the probability of obtaining a distribution of predictions like this (or an even more unambiguous one) under the assumption that the real accuracy is equal to or lower than the base rate accuracy. More technically, it is derived from a one-sided binomial test with the alternative hypothesis that the prediction's accuracy is greater than the base rate accuracy. Loosely speaking, a low p-value (< 0.05) signifies that the model really is able to give predictions that are better than the base rate.
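
Plugging the same iris numbers into a one-sided binomial test with base R's binom.test reproduces the reported p-value (a sketch of the documented test; the null probability is the base rate accuracy):

binom.test(141, 150, p = 50/150, alternative = "greater")$p.value  # < 2.2e-16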

Value

Invisibly returns a list with the number of correctly classified instances, the total number of instances, and a confusion matrix with the absolute numbers.
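
Since the list is returned invisibly, assign the result to work with it; a minimal sketch (str() shows the components without assuming their names):

library(OneR)
model <- OneR(iris)
prediction <- predict(model, iris)
res <- eval_model(prediction, iris)  # prints the evaluation, returns the list invisibly
str(res)                             # inspect the counts and the confusion matrix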

Author(s)

Holger von Jouanne-Diedrich

References

https://github.com/vonjd/OneR

Examples

library(OneR)
data <- iris
model <- OneR(data)                  # build the OneR model
summary(model)                       # show the rules and diagnostics
prediction <- predict(model, data)   # predict on the training data
eval_model(prediction, data)         # evaluate prediction vs. actual

Example output

Call:
OneR.data.frame(x = data)

Rules:
If Petal.Width = (0.0976,0.58] then Species = setosa
If Petal.Width = (0.58,1.06]   then Species = versicolor
If Petal.Width = (1.06,1.54]   then Species = versicolor
If Petal.Width = (1.54,2.02]   then Species = virginica
If Petal.Width = (2.02,2.5]    then Species = virginica

Accuracy:
141 of 150 instances classified correctly (94%)

Contingency table:
            Petal.Width
Species      (0.0976,0.58] (0.58,1.06] (1.06,1.54] (1.54,2.02] (2.02,2.5] Sum
  setosa              * 49           1           0           0          0  50
  versicolor             0         * 7        * 38           5          0  50
  virginica              0           0           3        * 24       * 23  50
  Sum                   49           8          41          29         23 150
---
Maximum in each column: '*'

Pearson's Chi-squared test:
X-squared = 253.24, df = 8, p-value < 2.2e-16


Confusion matrix (absolute):
            Actual
Prediction   setosa versicolor virginica Sum
  setosa         49          0         0  49
  versicolor      1         45         3  49
  virginica       0          5        47  52
  Sum            50         50        50 150

Confusion matrix (relative):
            Actual
Prediction   setosa versicolor virginica  Sum
  setosa       0.33       0.00      0.00 0.33
  versicolor   0.01       0.30      0.02 0.33
  virginica    0.00       0.03      0.31 0.35
  Sum          0.33       0.33      0.33 1.00

Accuracy:
0.94 (141/150)

Error rate:
0.06 (9/150)

Error rate reduction (vs. base rate):
0.91 (p-value < 2.2e-16)
