Description
Function for evaluating a OneR classification model. Prints confusion matrices with prediction vs. actual in absolute and relative numbers. Additionally, it gives the accuracy and error rate, as well as the error rate reduction versus the base rate accuracy, together with a p-value.
Usage

eval_model(prediction, actual, dimnames = c("Prediction", "Actual"),
  zero.print = "0")
Arguments

prediction
vector which contains the predicted values.

actual
data frame which contains the actual data. When there is more than one column, the last column is taken. A single vector is allowed too.

dimnames
character vector of printed dimnames for the confusion matrices.

zero.print
character specifying how zeros should be printed; for sparse confusion matrices, using "." can produce more readable results.
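
For illustration, sparse confusion matrices can be printed more readably via zero.print, and the dimension labels changed via dimnames. A minimal sketch, assuming a model fitted on the iris data as in the Examples section below:

library(OneR)
data <- iris
model <- OneR(data)                  # fit a OneR model
prediction <- predict(model, data)   # predict on the training data
# print zeros as "." and relabel the matrix dimensions
eval_model(prediction, data,
           dimnames = c("Predicted", "Observed"),
           zero.print = ".")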
Details

Error rate reduction versus the base rate accuracy is calculated by the following formula:
(Accuracy(Prediction) - Accuracy(Baserate)) / (1 - Accuracy(Baserate)),
giving a number between 0 (no error reduction) and 1 (no error).
In some borderline cases, when the model performs worse than the base rate, negative numbers can result. This shows that something is seriously wrong with the model generating this prediction.
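
As a worked example, take the iris data from the Examples section below: a base rate classifier that always predicts the largest class is correct for 50 of 150 instances (accuracy 1/3), while the OneR model classifies 141 of 150 instances correctly:

acc_pred <- 141 / 150   # model accuracy: 0.94
acc_base <-  50 / 150   # base rate accuracy: 1/3
(acc_pred - acc_base) / (1 - acc_base)   # error rate reduction: 0.91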
The provided p-value gives the probability of obtaining a distribution of predictions like this (or an even more unambiguous one) under the assumption that the real accuracy is equal to or lower than the base rate accuracy.
More technically, it is derived from a one-sided binomial test with the alternative hypothesis that the prediction's accuracy is greater than the base rate accuracy.
Loosely speaking, a low p-value (< 0.05) signifies that the model really is able to give predictions that are better than the base rate.
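
This test can be sketched with the binom.test() function from the stats package (the package's internal computation may differ in its details); with 141 of 150 correct predictions and a base rate accuracy of 1/3, as in the example below:

binom.test(x = 141, n = 150, p = 1/3, alternative = "greater")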
Value

Invisibly returns a list with the number of correctly classified instances, the total number of instances, and a confusion matrix with the absolute numbers.
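
Because the list is returned invisibly, it must be assigned to be used further; a minimal sketch, reusing the prediction from the Examples section:

res <- eval_model(prediction, data)   # prints the usual output
str(res)                              # inspect the invisibly returned list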
Author(s)

Holger von Jouanne-Diedrich
Examples

data <- iris
model <- OneR(data)
summary(model)
prediction <- predict(model, data)
eval_model(prediction, data)

Example output:
Call:
OneR.data.frame(x = data)
Rules:
If Petal.Width = (0.0976,0.58] then Species = setosa
If Petal.Width = (0.58,1.06] then Species = versicolor
If Petal.Width = (1.06,1.54] then Species = versicolor
If Petal.Width = (1.54,2.02] then Species = virginica
If Petal.Width = (2.02,2.5] then Species = virginica
Accuracy:
141 of 150 instances classified correctly (94%)
Contingency table:
Petal.Width
Species (0.0976,0.58] (0.58,1.06] (1.06,1.54] (1.54,2.02] (2.02,2.5] Sum
setosa * 49 1 0 0 0 50
versicolor 0 * 7 * 38 5 0 50
virginica 0 0 3 * 24 * 23 50
Sum 49 8 41 29 23 150
---
Maximum in each column: '*'
Pearson's Chi-squared test:
X-squared = 253.24, df = 8, p-value < 2.2e-16
Confusion matrix (absolute):
Actual
Prediction setosa versicolor virginica Sum
setosa 49 0 0 49
versicolor 1 45 3 49
virginica 0 5 47 52
Sum 50 50 50 150
Confusion matrix (relative):
Actual
Prediction setosa versicolor virginica Sum
setosa 0.33 0.00 0.00 0.33
versicolor 0.01 0.30 0.02 0.33
virginica 0.00 0.03 0.31 0.35
Sum 0.33 0.33 0.33 1.00
Accuracy:
0.94 (141/150)
Error rate:
0.06 (9/150)
Error rate reduction (vs. base rate):
0.91 (p-value < 2.2e-16)