View source: R/06_performance_report.R
performance_report: R Documentation
Produce a detailed report on the discrepancies between LLM-extracted data and human-annotated data for the same collection of files.
performance_report(
  human_data,
  model_data,
  full_locations = "coordinates",
  string_distance = "levenshtein",
  verbose = TRUE,
  rmds = TRUE,
  path = NULL
)
human_data
matrix. Ground-truth dataset against which to compare the data extracted by an LLM.

model_data
matrix. Dataset of location data, following the description under

full_locations
character. Defines the dataset structure. If

string_distance
character. Selects the method used to calculate the proximity between two strings, from those available under

verbose
logical. Determines whether output should be printed.

rmds
logical. Determines whether more extensive R Markdown files should be created at

path
character. Directory to which the output of the function is saved.
Four main metrics are calculated to report on the model's performance for coordinates:
Accuracy, \frac{TP}{TP + FP + FN}, defined this way here because the system has no True Negatives.
Recall, \frac{TP}{TP + FN}, Kent et al. (1955).
Precision, \frac{TP}{TP + FP}, Kent et al. (1955).
F1 score, \frac{2}{\frac{1}{Precision} + \frac{1}{Recall}}, van Rijsbergen (1979).
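Under the definitions above, all four metrics follow from the raw confusion-matrix counts. A minimal sketch in R (the helper name `coord_metrics` is illustrative, not part of arete):

```r
# Illustrative helper (not arete's internal code): the four main
# metrics from raw counts, in a system without True Negatives.
coord_metrics <- function(TP, FP, FN) {
  precision <- TP / (TP + FP)
  recall    <- TP / (TP + FN)
  c(
    accuracy  = TP / (TP + FP + FN),
    recall    = recall,
    precision = precision,
    f1        = 2 / (1 / precision + 1 / recall)
  )
}

coord_metrics(TP = 8, FP = 2, FN = 2)
# accuracy = 8/12; recall, precision and f1 all equal 0.8
```

Note that when precision and recall are equal, the F1 score (their harmonic mean) equals both, as in the example above.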
Additional metrics are also calculated. These include a distance-weighted confusion matrix, in which each type of error (False Negatives and False Positives) is summed using weights derived from the mean Euclidean distance of that data point to all others. This way, errors that are close to existing data for that species count less than those farther away, e.g. a hallucinated data point that was close to existing data, or a missed data point that is already represented in the data. This adjusted confusion matrix is presented alongside versions of the four main metrics calculated from the weighted values.

To report on the performance of locations, by default the minimum Levenshtein distance (Levenshtein, 1966) between a term and all other terms is calculated, which is defined as:
lev(a, b) = \begin{cases}
  |a| & \text{if } |b| = 0, \\
  |b| & \text{if } |a| = 0, \\
  lev(tail(a), tail(b)) & \text{if } head(a) = head(b), \\
  1 + \min \begin{cases}
    lev(tail(a), b) \\
    lev(a, tail(b)) \\
    lev(tail(a), tail(b))
  \end{cases} & \text{otherwise.}
\end{cases}
In short, the minimum number of single-character edits needed to turn string a into string b.
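The recursion above translates directly into R. This snippet is illustrative only (it is not arete's implementation, and the naive recursion is exponential without memoisation); base R's adist() computes the same quantity and serves as a check:

```r
# Direct transcription of the recursive Levenshtein definition
# (illustrative only; fine for short strings).
lev <- function(a, b) {
  if (nchar(b) == 0) return(nchar(a))
  if (nchar(a) == 0) return(nchar(b))
  tail_a <- substring(a, 2)
  tail_b <- substring(b, 2)
  if (substring(a, 1, 1) == substring(b, 1, 1)) return(lev(tail_a, tail_b))
  1 + min(lev(tail_a, b), lev(a, tail_b), lev(tail_a, tail_b))
}

lev("kitten", "sitting")    # 3 edits (k->s, e->i, insert g)
adist("kitten", "sitting")  # base R agrees: 3
```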
list. A confusion matrix is returned for every species per document, plus one for the entire process.
Kent, A. et al. (1955). "Machine literature searching VIII. Operational criteria for designing information retrieval systems", American Documentation, 6(2), pp. 93–101. doi:10.1002/asi.5090060209.
van Rijsbergen, C.J. (1979). "Information Retrieval", Architectural Press. ISBN: 978-0408709293.
Levenshtein, V.I. (1966). "Binary codes capable of correcting deletions, insertions, and reversals", Soviet Physics-Doklady, 10(8), pp. 707–710 [Translated from Russian].
trial_data <- arete::arete_data("holzapfelae-extract")
# Convert the coordinate strings in column 3 to numeric columns
trial_data <- cbind(trial_data[, 1:2],
                    arete::string_to_coords(trial_data[, 3])[2:1],
                    trial_data[, 4:5])
# Split into the human-annotated (ground truth) and model-extracted subsets
trial_data <- list(
  GT = trial_data[trial_data$Type == "Ground truth", 1:5],
  MD = trial_data[trial_data$Type == "Model", 1:5]
)
# make sure you run arete_setup() beforehand!
performance_report(
  trial_data$GT,
  trial_data$MD,
  full_locations = "both",
  verbose = FALSE,
  rmds = FALSE
)