evaluate_model_dia: Evaluate Diagnostic Model Performance

View source: R/diagnosis.R


Description

Evaluates the performance of a trained diagnostic model on binary classification, reporting AUROC, AUPRC, and threshold-dependent metrics (accuracy, precision, recall, F1, specificity) computed at an optimal or user-specified probability threshold.

Usage

evaluate_model_dia(
  model_obj = NULL,
  X_data = NULL,
  y_data,
  sample_ids,
  threshold_strategy = c("default", "f1", "youden", "numeric"),
  specific_threshold_value = 0.5,
  pos_class,
  neg_class,
  precomputed_prob = NULL,
  y_original_numeric = NULL
)

Arguments

model_obj

A trained model object (typically a caret::train object or a list from an ensemble like Bagging). Can be NULL if precomputed_prob is provided.

X_data

A data frame of features corresponding to the data used for evaluation. Required if model_obj is provided and precomputed_prob is NULL.

y_data

A factor vector of true class labels for the evaluation data.

sample_ids

A vector of sample IDs for the evaluation data.

threshold_strategy

A character string defining how the probability threshold for class-specific metrics is determined: "default" (0.5), "f1" (maximizes the F1-score), "youden" (maximizes Youden's J statistic), or "numeric" (uses specific_threshold_value). See the sketch after this argument list for one way the "f1" and "youden" optima can be computed.

specific_threshold_value

A numeric value between 0 and 1 used as the probability threshold. Only used if threshold_strategy is "numeric".

pos_class

A character string, the label for the positive class.

neg_class

A character string, the label for the negative class.

precomputed_prob

Optional. A numeric vector of precomputed probabilities for the positive class. If provided, model_obj and X_data are not used for score derivation.

y_original_numeric

Optional. The original numeric/character vector of labels. If not provided, it is inferred from y_data using the global pos_label_value and neg_label_value.
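
The "f1" and "youden" strategies both search candidate thresholds over the predicted probabilities. The snippet below is a minimal, self-contained sketch of how such optima can be computed from a vector of positive-class probabilities; it uses pROC for Youden's J and a simple grid search for F1, as an illustration of the strategies rather than the exact internals of evaluate_model_dia.

# Toy probabilities and labels, purely for illustration
set.seed(1)
prob  <- runif(40)
truth <- factor(ifelse(prob + rnorm(40, sd = 0.3) > 0.5, "Case", "Control"),
                levels = c("Control", "Case"))

# Youden's J: threshold maximizing sensitivity + specificity - 1
roc_obj    <- pROC::roc(truth, prob, levels = c("Control", "Case"),
                        direction = "<")
youden_thr <- pROC::coords(roc_obj, "best", best.method = "youden",
                           ret = "threshold")

# F1: evaluate a grid of thresholds and keep the maximizer
thr_grid <- seq(0.01, 0.99, by = 0.01)
f1_vals  <- sapply(thr_grid, function(t) {
  pred <- prob >= t
  tp <- sum(pred  & truth == "Case")
  fp <- sum(pred  & truth == "Control")
  fn <- sum(!pred & truth == "Case")
  if (2 * tp + fp + fn == 0) 0 else 2 * tp / (2 * tp + fp + fn)
})
f1_thr <- thr_grid[which.max(f1_vals)]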

Value

A list containing:

  • sample_score: A data frame with columns sample (sample ID), label (original numeric label), and score (predicted probability of the positive class).

  • evaluation_metrics: A list of performance metrics:

    • Threshold_Strategy: The strategy used for threshold selection.

    • _Threshold: The chosen probability threshold.

    • Accuracy, Precision, Recall, F1, Specificity: Metrics calculated at _Threshold.

    • AUROC: Area Under the Receiver Operating Characteristic curve.

    • AUROC_95CI_Lower, AUROC_95CI_Upper: 95% confidence interval for AUROC.

    • AUPRC: Area Under the Precision-Recall curve.
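
Because the return value is an ordinary list, the documented components can be pulled out directly. For instance, with eval_results from the Examples section below:

eval_results$evaluation_metrics$AUROC     # overall discrimination
eval_results$evaluation_metrics$AUPRC     # precision-recall summary
head(eval_results$sample_score)           # per-sample IDs, labels, and scores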

Examples


# 1. Create toy data
set.seed(42)
n_obs <- 50
X_toy <- data.frame(
  FeatureA = rnorm(n_obs),
  FeatureB = runif(n_obs, 0, 100)
)
y_toy <- factor(sample(c("Control", "Case"), n_obs, replace = TRUE),
                levels = c("Control", "Case"))
ids_toy <- paste0("Sample", 1:n_obs)

# 2. Train a model
rf_model <- rf_dia(X_toy, y_toy)

# 3. Evaluate the model using F1-score optimal threshold
eval_results <- evaluate_model_dia(
  model_obj = rf_model,
  X_data = X_toy,
  y_data = y_toy,
  sample_ids = ids_toy,
  threshold_strategy = "f1",
  pos_class = "Case",
  neg_class = "Control"
)
str(eval_results)
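
# 4. Evaluate precomputed probabilities at a fixed threshold.
#    This sketch uses simulated probabilities as a stand-in for the output
#    of some external model; model_obj and X_data are then not needed.
prob_toy <- runif(n_obs)
eval_precomp <- evaluate_model_dia(
  precomputed_prob = prob_toy,
  y_data = y_toy,
  sample_ids = ids_toy,
  threshold_strategy = "numeric",
  specific_threshold_value = 0.4,
  pos_class = "Case",
  neg_class = "Control"
)
eval_precomp$evaluation_metrics$AUROC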

