evaluate_emotions: Evaluate Emotion Classification Performance

View source: R/evaluate_emotions.R

evaluate_emotions — R Documentation

Evaluate Emotion Classification Performance

Description

Comprehensive evaluation function for discrete emotion classification tasks. Computes standard classification metrics including accuracy, F1-scores, AUROC, calibration metrics, and inter-rater reliability measures.

Usage

evaluate_emotions(
  data,
  id_col = "id",
  truth_col = "truth",
  pred_col = "pred",
  probs_cols = NULL,
  classes = NULL,
  metrics = c("accuracy", "precision", "recall", "f1_macro", "f1_micro", "auroc", "ece",
    "krippendorff", "confusion_matrix"),
  return_plot = FALSE,
  na_rm = TRUE
)

Arguments

data

A data frame, or a file path to a CSV file, containing the evaluation data. Must include columns for identifiers, ground truth, predictions, and optionally class probabilities.

id_col

Character. Name of column containing unique identifiers (default: "id").

truth_col

Character. Name of column containing ground truth labels (default: "truth").

pred_col

Character. Name of column containing predicted labels (default: "pred").

probs_cols

Character vector. Names of columns containing class probabilities. If NULL, probability-based metrics (AUROC, ECE) are skipped.

classes

Character vector. Emotion classes to evaluate. If NULL, will be inferred from the data.

metrics

Character vector. Metrics to compute. Options include: "accuracy", "precision", "recall", "f1_macro", "f1_micro", "auroc", "ece", "krippendorff", "confusion_matrix" (default: all metrics).

return_plot

Logical. Whether to return data prepared for plotting (default: FALSE).

na_rm

Logical. Whether to remove rows with missing values before computing metrics (default: TRUE).

Details

This function implements a comprehensive evaluation pipeline for discrete emotion classification following best practices from the literature.

**Metrics computed:**

  • **Accuracy**: Overall classification accuracy

  • **Precision/Recall/F1**: Per-class and macro/micro averages

  • **AUROC**: Area under ROC curve (requires probability scores)

  • **ECE**: Expected Calibration Error for probability calibration

  • **Krippendorff's alpha**: Inter-rater reliability between human and model
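
As an illustration of two of the metrics above (not the package's internal implementation), macro-F1 and a binned ECE can be sketched in base R for a small hand-made example:

```r
# Illustrative sketch only: macro-F1 and Expected Calibration Error
# for a 3-class toy example, using base R.
truth <- factor(c("anger", "joy", "joy", "sadness"),
                levels = c("anger", "joy", "sadness"))
pred  <- factor(c("anger", "joy", "sadness", "sadness"),
                levels = levels(truth))

# Per-class F1 = 2*TP / (2*TP + FP + FN); macro-F1 is their unweighted mean
f1_per_class <- sapply(levels(truth), function(cl) {
  tp <- sum(pred == cl & truth == cl)
  fp <- sum(pred == cl & truth != cl)
  fn <- sum(pred != cl & truth == cl)
  if (2 * tp + fp + fn == 0) return(NA_real_)
  2 * tp / (2 * tp + fp + fn)
})
f1_macro <- mean(f1_per_class, na.rm = TRUE)

# ECE: bin the top-class confidence, then take the weighted average of
# |accuracy - mean confidence| across bins
confidence <- c(0.9, 0.8, 0.6, 0.7)   # max class probability per instance
correct    <- as.numeric(pred == truth)
bins <- cut(confidence, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
ece <- sum(sapply(split(seq_along(confidence), bins), function(idx) {
  if (length(idx) == 0) return(0)
  length(idx) / length(confidence) * abs(mean(correct[idx]) - mean(confidence[idx]))
}))
```

Here f1_macro is (1 + 2/3 + 2/3) / 3 and ece is 0.3 for this toy data; evaluate_emotions() computes these quantities from the supplied columns.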

**Input format:** The input data should contain at minimum:

  • ID column: Unique identifier for each instance

  • Truth column: Ground truth emotion labels

  • Prediction column: Model predicted emotion labels

  • Probability columns (optional): Class probabilities for each emotion
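
A minimal input data frame matching this layout might look as follows (column and probability names are illustrative; any names can be passed via id_col, truth_col, pred_col, and probs_cols):

```r
# Hypothetical minimal input: one row per instance, with identifier,
# ground truth, prediction, and optional per-class probability columns.
evaluation_data <- data.frame(
  id    = 1:4,
  truth = c("anger", "joy", "joy", "sadness"),
  pred  = c("anger", "joy", "sadness", "sadness"),
  prob_anger   = c(0.90, 0.05, 0.10, 0.10),
  prob_joy     = c(0.05, 0.80, 0.30, 0.10),
  prob_sadness = c(0.05, 0.15, 0.60, 0.80)
)
# Each row of probabilities sums to 1, as expected for class probabilities
```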

Value

A list containing:

  • metrics: Data frame with computed evaluation metrics

  • confusion_matrix: Confusion matrix (if requested)

  • per_class: Per-class metrics breakdown

  • summary: Overall performance summary

  • plot_data: Data prepared for plotting (if return_plot = TRUE)
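
The components above can be accessed with standard list extraction. The sketch below uses a hand-built stand-in list (element names taken from this page; the values are placeholders, real ones come from evaluate_emotions()):

```r
# Stand-in illustrating the documented return shape; values are placeholders
results <- list(
  metrics   = data.frame(metric = c("accuracy", "f1_macro"),
                         value  = c(0.75, 0.78)),
  per_class = data.frame(class = c("anger", "joy", "sadness"),
                         f1    = c(1.00, 0.67, 0.67)),
  summary   = "accuracy = 0.75 over 4 instances"
)

# Pull a single metric out of the metrics data frame
f1_row <- subset(results$metrics, metric == "f1_macro")
f1_row$value
```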

References

Grandini, M., Bagli, E., & Visani, G. (2020). Metrics for multi-class classification: an overview. arXiv preprint arXiv:2008.05756.

Krippendorff, K. (2011). Computing Krippendorff's alpha-reliability. ScholarlyCommons, University of Pennsylvania.

Naeini, M. P., Cooper, G., & Hauskrecht, M. (2015). Obtaining well calibrated probabilities using Bayesian binning. In AAAI (pp. 2901-2907).

See Also

transformer_scores, nlp_scores, emoxicon_scores for emotion prediction functions.

Examples

## Not run: 
# Basic evaluation with predicted labels only
results <- evaluate_emotions(
  data = evaluation_data,
  truth_col = "human_label",
  pred_col = "model_prediction"
)

# Full evaluation with probabilities
results <- evaluate_emotions(
  data = evaluation_data,
  truth_col = "ground_truth",
  pred_col = "predicted_class",
  probs_cols = c("prob_anger", "prob_joy", "prob_sadness"),
  return_plot = TRUE
)

# Custom metrics selection
results <- evaluate_emotions(
  data = evaluation_data,
  metrics = c("accuracy", "f1_macro", "confusion_matrix")
)

## End(Not run)


transforEmotion documentation built on Jan. 8, 2026, 5:06 p.m.