View source: R/evaluate_emotions.R
| evaluate_emotions | R Documentation |
Description

Comprehensive evaluation function for discrete emotion classification tasks. Computes standard classification metrics, including accuracy, F1 scores, AUROC, calibration metrics, and inter-rater reliability measures.
Usage

evaluate_emotions(
  data,
  id_col = "id",
  truth_col = "truth",
  pred_col = "pred",
  probs_cols = NULL,
  classes = NULL,
  metrics = c("accuracy", "precision", "recall", "f1_macro", "f1_micro", "auroc",
    "ece", "krippendorff", "confusion_matrix"),
  return_plot = FALSE,
  na_rm = TRUE
)
Arguments

data
  A data frame, or a file path to a CSV file, containing the evaluation data. Must include columns for identifiers, ground-truth labels, and predicted labels, and optionally class probabilities.

id_col
  Character. Name of the column containing unique identifiers (default: "id").

truth_col
  Character. Name of the column containing ground-truth labels (default: "truth").

pred_col
  Character. Name of the column containing predicted labels (default: "pred").

probs_cols
  Character vector. Names of the columns containing class probabilities. If NULL, probabilistic metrics (e.g. AUROC, ECE) are skipped.

classes
  Character vector. Emotion classes to evaluate. If NULL, classes are inferred from the data.

metrics
  Character vector. Metrics to compute. Options include: "accuracy", "precision", "recall", "f1_macro", "f1_micro", "auroc", "ece", "krippendorff", "confusion_matrix" (default: all metrics).

return_plot
  Logical. Whether to return plotting helpers (default: FALSE).

na_rm
  Logical. Whether to remove missing values before computing metrics (default: TRUE).
Details

This function implements a comprehensive evaluation pipeline for discrete emotion classification, following best practices from the literature.

**Metrics computed:**

- **Accuracy**: overall classification accuracy
- **Precision/Recall/F1**: per-class values plus macro and micro averages
- **AUROC**: area under the ROC curve (requires probability scores)
- **ECE**: Expected Calibration Error for probability calibration
- **Krippendorff's alpha**: inter-rater reliability between human and model labels
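The label-based and calibration metrics above can be sketched in base R. This is an illustrative sketch, not the package's implementation; the labels and confidence scores below are made-up toy data:

```r
# Toy labels (hypothetical data for illustration only)
truth <- c("joy", "anger", "joy", "sadness", "anger", "joy")
pred  <- c("joy", "joy",   "joy", "sadness", "anger", "anger")

accuracy <- mean(truth == pred)

# Macro-F1: unweighted mean of per-class F1 scores
classes <- sort(unique(c(truth, pred)))
f1_per_class <- sapply(classes, function(cl) {
  tp <- sum(truth == cl & pred == cl)
  fp <- sum(truth != cl & pred == cl)
  fn <- sum(truth == cl & pred != cl)
  p <- if (tp + fp > 0) tp / (tp + fp) else 0
  r <- if (tp + fn > 0) tp / (tp + fn) else 0
  if (p + r > 0) 2 * p * r / (p + r) else 0
})
f1_macro <- mean(f1_per_class)

# ECE: bin the top-class confidence, compare each bin's mean
# confidence with its empirical accuracy, and weight by bin size
conf <- c(0.9, 0.6, 0.8, 0.95, 0.7, 0.55)  # max predicted probability
correct <- truth == pred
bins <- cut(conf, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
ece <- sum(sapply(levels(bins), function(b) {
  in_bin <- bins == b
  if (!any(in_bin)) return(0)
  mean(in_bin) * abs(mean(correct[in_bin]) - mean(conf[in_bin]))
}))

# Krippendorff's alpha (nominal) between the two "raters":
# 1 - observed disagreement / expected disagreement
kripp_alpha_nominal <- function(r1, r2) {
  vals <- sort(unique(c(r1, r2)))
  k <- length(vals)
  o <- matrix(0, k, k)  # coincidence matrix of paired values
  for (u in seq_along(r1)) {
    i <- match(r1[u], vals); j <- match(r2[u], vals)
    o[i, j] <- o[i, j] + 1
    o[j, i] <- o[j, i] + 1
  }
  n_c <- rowSums(o)
  n <- sum(n_c)                    # total pairable values (2 per unit)
  d_obs <- sum(o) - sum(diag(o))   # off-diagonal (disagreement) mass
  d_exp <- (n^2 - sum(n_c^2)) / (n - 1)
  1 - d_obs / d_exp
}
alpha <- kripp_alpha_nominal(truth, pred)
```

On this toy data the sketch yields accuracy 4/6, macro-F1 13/18, ECE 0.3, and alpha 0.5; the packaged function may differ in details such as binning strategy or tie handling.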
**Input format:** the input data should contain at minimum:

- An ID column: unique identifier for each instance
- A truth column: ground-truth emotion labels
- A prediction column: model-predicted emotion labels
- Probability columns (optional): class probabilities for each emotion
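A minimal input in this layout might look as follows; the probability column names here are illustrative (any names passed via probs_cols work):

```r
# Hypothetical evaluation data in the expected layout
eval_df <- data.frame(
  id           = 1:4,
  truth        = c("joy", "anger", "sadness", "joy"),
  pred         = c("joy", "joy", "sadness", "joy"),
  prob_anger   = c(0.10, 0.30, 0.10, 0.20),
  prob_joy     = c(0.80, 0.60, 0.20, 0.70),
  prob_sadness = c(0.10, 0.10, 0.70, 0.10)
)
```

Such a data frame could then be passed as `evaluate_emotions(eval_df, probs_cols = c("prob_anger", "prob_joy", "prob_sadness"))`. Note that each row's probabilities should sum to 1 and the predicted label should normally match the highest-probability class.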
Value

A list containing:

- metrics: data frame with the computed evaluation metrics
- confusion_matrix: confusion matrix (if requested)
- per_class: per-class metrics breakdown
- summary: overall performance summary
- plot_data: data prepared for plotting (if return_plot = TRUE)
References

Grandini, M., Bagli, E., & Visani, G. (2020). Metrics for multi-class classification: An overview. arXiv preprint arXiv:2008.05756.

Krippendorff, K. (2011). Computing Krippendorff's alpha-reliability. Scholarly Commons, 25.

Naeini, M. P., Cooper, G., & Hauskrecht, M. (2015). Obtaining well calibrated probabilities using Bayesian binning. In AAAI (pp. 2901-2907).
See Also

transformer_scores, nlp_scores, and emoxicon_scores for emotion prediction functions.
Examples

## Not run:
# Basic evaluation with predicted labels only
results <- evaluate_emotions(
  data = evaluation_data,
  truth_col = "human_label",
  pred_col = "model_prediction"
)

# Full evaluation with class probabilities
results <- evaluate_emotions(
  data = evaluation_data,
  truth_col = "ground_truth",
  pred_col = "predicted_class",
  probs_cols = c("prob_anger", "prob_joy", "prob_sadness"),
  return_plot = TRUE
)

# Custom metric selection
results <- evaluate_emotions(
  data = evaluation_data,
  metrics = c("accuracy", "f1_macro", "confusion_matrix")
)
## End(Not run)