confusion: Construct and analyze confusion matrices

View source: R/confusion.R

confusionR Documentation

Construct and analyze confusion matrices

Description

Confusion matrices compare two classifications (usually one done automatically using a machine learning algorithm versus the true classification done by a specialist... but one can also compare two automatic or two manual classifications against each other).

Usage

confusion(x, ...)

## Default S3 method:
confusion(
  x,
  y = NULL,
  vars = c("Actual", "Predicted"),
  labels = vars,
  merge.by = "Id",
  useNA = "ifany",
  prior,
  ...
)

## S3 method for class 'mlearning'
confusion(
  x,
  y = response(x),
  labels = c("Actual", "Predicted"),
  useNA = "ifany",
  prior,
  ...
)

## S3 method for class 'confusion'
print(x, sums = TRUE, error.col = sums, digits = 0, sort = "ward.D2", ...)

## S3 method for class 'confusion'
summary(object, type = "all", sort.by = "Fscore", decreasing = TRUE, ...)

## S3 method for class 'summary.confusion'
print(x, ...)

Arguments

x

an object with a confusion() method implemented.

...

further arguments passed to the method.

y

another object, from which to extract the second classification, or NULL if not used.

vars

the variables of interest in the first and second classification in the case the objects are lists or data frames. Otherwise, this argument is ignored and x and y must be factors with same length and same levels.

labels

labels to use for the two classifications. By default, they are the same as vars, or the one in the confusion matrix.

merge.by

a character string with the name of variables to use to merge the two data frames, or NULL.

useNA

do we keep NAs as a separate category? The default "ifany" creates this category only if there are missing values. Other possibilities are "no", or "always".

prior

class frequencies to use for first classifier that is tabulated in the rows of the confusion matrix. For its value, see here under, the ⁠value=⁠ argument.

sums

is the confusion matrix printed with rows and columns sums?

error.col

is a column with class error for first classifier added (equivalent to false negative rate of FNR)?

digits

the number of digits after the decimal point to print in the confusion matrix. The default or zero leads to most compact presentation and is suitable for frequencies, but not for relative frequencies.

sort

are rows and columns of the confusion matrix sorted so that classes with larger confusion are closer together? Sorting is done using a hierarchical clustering with hclust(). The clustering method is "ward.D2" by default, but see the hclust() help for other options). If FALSE or NULL, no sorting is done.

object

a confusion object

type

either "all" (by default), or considering TP is the true positives, FP is the false positives, TN is the true negatives and FN is the false negatives, one can also specify: "Fscore" (F-score = F-measure = F1 score = harmonic mean of Precision and recall), "Recall" (TP / (TP + FN) = 1 - FNR), "Precision" (TP / (TP + FP) = 1 - FDR), "Specificity" (TN / (TN + FP) = 1 - FPR), "NPV" (Negative predicted value = TN / (TN + FN) = 1 - FOR), "FPR" (False positive rate = 1 - Specificity = FP / (FP + TN)), "FNR" (False negative rate = 1 - Recall = FN / (TP + FN)), "FDR" (False Discovery Rate = 1 - Precision = FP / (TP + FP)), "FOR" (False omission rate = 1 - NPV = FN / (FN + TN)), "LRPT" (Likelihood Ratio for Positive Tests = Recall / FPR = Recall / (1 - Specificity)), "LRNT" Likelihood Ratio for Negative Tests = FNR / Specificity = (1 - Recall) / Specificity, "LRPS" (Likelihood Ratio for Positive Subjects = Precision / FOR = Precision / (1 - NPV)), "LRNS" (Likelihood Ratio Negative Subjects = FDR / NPV = (1 - Precision) / (1 - FOR)), "BalAcc" (Balanced accuracy = (Sensitivity + Specificity) / 2), "MCC" (Matthews correlation coefficient), "Chisq" (Chisq metric), or "Bray" (Bray-Curtis metric)

sort.by

the statistics to use to sort the table (by default, Fmeasure, the F1 score for each class = 2 * recall * precision / (recall + precision)).

decreasing

do we sort in increasing or decreasing order?

Value

A confusion matrix in a confusion object.

See Also

mlearning(), plot.confusion(), prior()

Examples

data("Glass", package = "mlbench")
# Use a little bit more informative labels for Type
Glass$Type <- as.factor(paste("Glass", Glass$Type))

# Use learning vector quantization to classify the glass types
# (using default parameters)
summary(glass_lvq <- ml_lvq(Type ~ ., data = Glass))

# Calculate cross-validated confusion matrix
(glass_conf <- confusion(cvpredict(glass_lvq), Glass$Type))
# Raw confusion matrix: no sort and no margins
print(glass_conf, sums = FALSE, sort = FALSE)

summary(glass_conf)
summary(glass_conf, type = "Fscore")

mlearning documentation built on Aug. 31, 2023, 1:09 a.m.