performance: Performance statistics for prediction
In quanteda/quanteda.classifiers: Models for supervised text classification

Description

Functions for computing performance statistics used for model evaluation.

performance() computes all of the metrics listed below; each is also available through its own function.

Given a 2 x 2 table with the following notation:

                     Truth
Predicted      Positive   Negative
Positive          A          B
Negative          C          D

The metrics computed here are:

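precision: A / (A + B), the proportion of predicted positives that are truly positive
recall: A / (A + C), the proportion of truly positive items that are predicted positive
F1: 2 / (1/recall + 1/precision), the harmonic mean of precision and recall
accuracy: (A + D) / (A + B + C + D), i.e. correct predictions divided by all predictions
balanced_accuracy: the mean of the per-class recall values across all categories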
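As a worked check of the formulas above, the following base R sketch rebuilds the Powers (2007) Table 2 data used in the Examples section, cross-tabulates it in the same layout as the 2 x 2 table, and computes each metric by hand, treating "Relevant" as the positive class. It does not call the quanteda.classifiers functions, and the variable names (prec, rec, f1, acc, bal_acc) are illustrative only.

```r
# Powers (2007) Table 2 data, as in the Examples section
lvs <- c("Relevant", "Irrelevant")
pred  <- factor(rep(lvs, times = c(42, 58)), levels = lvs)
truth <- factor(c(rep(lvs, times = c(30, 12)),
                  rep(lvs, times = c(30, 28))),
                levels = lvs)

# Predictions in rows, true labels in columns, matching the 2 x 2 layout
tbl <- table(Predicted = pred, Truth = truth)

# Cell counts, treating "Relevant" as the positive class
A <- tbl["Relevant", "Relevant"]      # predicted positive, truly positive
B <- tbl["Relevant", "Irrelevant"]    # predicted positive, truly negative
C <- tbl["Irrelevant", "Relevant"]    # predicted negative, truly positive
D <- tbl["Irrelevant", "Irrelevant"]  # predicted negative, truly negative

prec <- A / (A + B)                   # 30 / 42
rec  <- A / (A + C)                   # 30 / 60
f1   <- 2 / (1 / rec + 1 / prec)      # harmonic mean; equals 2A / (2A + B + C)
acc  <- (A + D) / (A + B + C + D)     # 58 / 100

# Balanced accuracy averages the recall of each class. With "Irrelevant"
# as the positive class, its recall is D / (B + D).
rec_neg <- D / (B + D)                # 28 / 40
bal_acc <- mean(c(rec, rec_neg))
```

Worked by hand, these counts (A = 30, B = 12, C = 30, D = 28) give a precision of 30/42 (about 0.714), a recall of 0.5, an F1 of 60/102 (about 0.588), an accuracy of 0.58 and a balanced accuracy of 0.6.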

Usage

performance(data, truth, by_class = TRUE, ...)

precision(data, truth, by_class = TRUE, ...)

recall(data, truth, by_class = TRUE, ...)

f1_score(data, truth, by_class = TRUE, ...)

accuracy(data, truth, ...)

balanced_accuracy(data, ...)

Arguments

`data`	a table of predicted by true labels, or a vector of predicted labels
`truth`	a vector of "true" labels; or, if `data` is a table, `2` to indicate that the true values are in the columns, or `1` if they are in the rows
`by_class`	logical; if `TRUE`, compute the performance score separately for each class, otherwise average across classes
`...`	not used
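When `data` is a table, `truth` only indicates which margin holds the true labels. The sketch below illustrates that idea with a hypothetical helper, recall_from_table(), which is not part of quanteda.classifiers: it transposes the table when the true labels are in the rows, so that per-class recall is always the diagonal divided by the column totals.

```r
# Hypothetical helper (not part of quanteda.classifiers): per-class recall
# from a confusion table. `truth` names the margin holding the true labels:
# 2 = columns, 1 = rows, mirroring the `truth` argument described above.
recall_from_table <- function(tbl, truth = 2) {
  stopifnot(truth %in% c(1, 2))
  # Move the true labels into the columns, so each column total is the
  # number of items that truly belong to that class
  if (truth == 1) tbl <- t(tbl)
  diag(tbl) / colSums(tbl)
}

# Powers (2007) Table 2 counts: predictions in rows, true labels in columns
lvs <- c("Relevant", "Irrelevant")
tbl <- as.table(matrix(c(30, 30, 12, 28), nrow = 2,
                       dimnames = list(Predicted = lvs, Truth = lvs)))

r_cols <- recall_from_table(tbl, truth = 2)     # true labels in columns
r_rows <- recall_from_table(t(tbl), truth = 1)  # same table, transposed
```

Both calls describe the same data, so they return the same per-class recall (0.5 for "Relevant", 0.7 for "Irrelevant"); only the declared orientation differs.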

Value

A named list of the selected measure(s), in which each element is a scalar if by_class = FALSE, or a vector named by class if by_class = TRUE.
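To illustrate the two shapes of the return value, the base R sketch below computes per-class precision (a vector named by class, analogous to by_class = TRUE) and then collapses it to a single scalar (analogous to by_class = FALSE). The averaging step assumes an unweighted mean across classes (often called macro-averaging); that is an assumption made for illustration, not a statement of how the package averages.

```r
# Powers (2007) Table 2 counts: predictions in rows, true labels in columns
lvs <- c("Relevant", "Irrelevant")
tbl <- as.table(matrix(c(30, 30, 12, 28), nrow = 2,
                       dimnames = list(Predicted = lvs, Truth = lvs)))

# by_class = TRUE analogue: one precision value per class, named by class.
# Correct predictions (the diagonal) divided by the row totals, i.e. by the
# number of items predicted to be in each class.
precision_by_class <- diag(tbl) / rowSums(tbl)

# by_class = FALSE analogue, ASSUMING an unweighted (macro) mean over classes
precision_avg <- mean(precision_by_class)

# Named lists echoing the documented shape of the return value
res_by_class <- list(precision = precision_by_class)
res_avg <- list(precision = precision_avg)
```

Here res_by_class$precision holds 30/42 for "Relevant" and 28/58 for "Irrelevant", while res_avg$precision is their plain mean.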

References

Powers, D. (2007). "Evaluation: From Precision, Recall and F Factor to ROC, Informedness, Markedness and Correlation." Technical Report SIE-07-001, Flinders University.

Examples

## Data in Table 2 of Powers (2007)

lvs <- c("Relevant", "Irrelevant")
tbl_2_1_pred <- factor(rep(lvs, times = c(42, 58)), levels = lvs)
tbl_2_1_truth <- factor(c(rep(lvs, times = c(30, 12)),
                          rep(lvs, times = c(30, 28))),
                        levels = lvs)

performance(tbl_2_1_pred, tbl_2_1_truth)
performance(tbl_2_1_pred, tbl_2_1_truth, by_class = FALSE)
performance(table(tbl_2_1_pred, tbl_2_1_truth), by_class = TRUE)

precision(tbl_2_1_pred, tbl_2_1_truth)

recall(tbl_2_1_pred, tbl_2_1_truth)

f1_score(tbl_2_1_pred, tbl_2_1_truth)

accuracy(tbl_2_1_pred, tbl_2_1_truth)

balanced_accuracy(tbl_2_1_pred, tbl_2_1_truth)

quanteda/quanteda.classifiers documentation built on Oct. 20, 2023, 6:53 a.m.