imbalance_dia: Train an EasyEnsemble Model for Imbalanced Classification

View source: R/diagnosis.R

imbalance_dia    R Documentation

Description

Implements the EasyEnsemble algorithm for imbalanced binary classification. It trains n_estimators base models, each on a balanced subset of the data obtained by undersampling the majority class, and aggregates their predictions.
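
The idea can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper name easy_ensemble_sketch, the use of logistic regression (glm) as the base learner, and averaging predicted probabilities as the aggregation rule are all assumptions made for illustration. It also assumes the label is coded 0/1, with 1 as the minority class.

```r
# Illustrative EasyEnsemble-style sketch in base R (hypothetical helper,
# not the code behind imbalance_dia).
easy_ensemble_sketch <- function(data, n_estimators = 10, seed = 456) {
  set.seed(seed)
  y <- data[[2]]                             # second column: 0/1 outcome label
  features <- data[, -(1:2), drop = FALSE]   # drop ID and label columns
  pos_idx <- which(y == 1)                   # minority class (assumed positive)
  neg_idx <- which(y == 0)                   # majority class
  train <- cbind(outcome = y, features)

  models <- lapply(seq_len(n_estimators), function(i) {
    # Balanced subset: every minority row plus an equal-sized random
    # sample of majority rows
    keep <- c(pos_idx, sample(neg_idx, length(pos_idx)))
    glm(outcome ~ ., data = train[keep, , drop = FALSE], family = binomial())
  })

  # Aggregate by averaging the predicted probabilities of the base models
  probs <- sapply(models, predict, newdata = train, type = "response")
  list(models = models, score = rowMeans(probs))
}
```

Because each base model sees a different majority sample, the ensemble uses more of the majority class than a single undersampled model would, while every individual fit stays balanced.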

Usage

imbalance_dia(
  data,
  base_model_name = "xb",
  n_estimators = 10,
  tune_base_model = FALSE,
  threshold_choices = "default",
  positive_label_value = 1,
  negative_label_value = 0,
  new_positive_label = "Positive",
  new_negative_label = "Negative",
  seed = 456
)

Arguments

data

A data frame where the first column is the sample ID, the second is the outcome label, and subsequent columns are features.

base_model_name

A character string, the name of the base diagnostic model to use (e.g., "xb", "rf"). The model must already be registered with the modeling system (see initialize_modeling_system_dia).

n_estimators

An integer, the number of base models to train (number of subsets).

tune_base_model

Logical, whether to enable tuning for each base model.

threshold_choices

A character string (e.g., "f1", "youden", "default") or a numeric value between 0 and 1 that determines the classification threshold used when evaluating the ensemble.

positive_label_value

A numeric or character value in the raw data representing the positive class.

negative_label_value

A numeric or character value in the raw data representing the negative class.

new_positive_label

A character string, the desired factor level name for the positive class (e.g., "Positive").

new_negative_label

A character string, the desired factor level name for the negative class (e.g., "Negative").

seed

An integer random seed, for reproducibility.
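
To make the threshold_choices options more concrete, here is a minimal sketch of how a cutoff could be chosen from predicted scores by maximizing either the F1 score or Youden's J (sensitivity + specificity - 1). The helper choose_threshold_sketch and the toy vectors are hypothetical, and the package may compute its thresholds differently.

```r
# Hypothetical helper: scan the observed scores as candidate cutoffs and
# keep the one that maximizes the chosen criterion.
choose_threshold_sketch <- function(score, truth, method = c("f1", "youden")) {
  method <- match.arg(method)
  candidates <- sort(unique(score))
  value <- sapply(candidates, function(t) {
    pred <- score >= t                 # predicted positive at cutoff t
    tp <- sum(pred & truth == 1)
    fp <- sum(pred & truth == 0)
    fn <- sum(!pred & truth == 1)
    tn <- sum(!pred & truth == 0)
    if (method == "f1") {
      if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
    } else {
      tp / (tp + fn) + tn / (tn + fp) - 1   # Youden's J
    }
  })
  candidates[which.max(value)]         # first maximizer if there are ties
}

# Toy scores and 0/1 labels, made up for this illustration
score <- c(0.1, 0.2, 0.35, 0.4, 0.8, 0.9)
truth <- c(0, 0, 1, 0, 1, 1)
f1_cutoff <- choose_threshold_sketch(score, truth, "f1")
youden_cutoff <- choose_threshold_sketch(score, truth, "youden")
```

With imbalanced data a fixed cutoff often favors the majority class, which is one reason a data-driven choice such as "f1" or "youden" can be useful.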

Value

A list containing the model_object, sample_score, and evaluation_metrics.

See Also

initialize_modeling_system_dia, evaluate_model_dia

Examples


# 1. Initialize the modeling system
initialize_modeling_system_dia()

# 2. Create an imbalanced toy dataset
set.seed(42)
n_obs <- 100
n_minority <- 10
data_imbalanced_toy <- data.frame(
  ID = paste0("Sample", 1:n_obs),
  Status = c(rep(1, n_minority), rep(0, n_obs - n_minority)),
  Feat1 = rnorm(n_obs),
  Feat2 = runif(n_obs)
)

# 3. Run the EasyEnsemble algorithm
# n_estimators is reduced for a quick example
easyensemble_results <- imbalance_dia(
  data = data_imbalanced_toy,
  base_model_name = "xb",
  n_estimators = 3,
  threshold_choices = "f1"
)
print_model_summary_dia("EasyEnsemble (XGBoost)", easyensemble_results)
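
# 4. Preparing raw data (illustrative sketch)

Real data is often not yet in the layout the data argument describes (ID first, outcome label second, features after). The base-R sketch below reorders a made-up data frame and shows the kind of relabelling that the positive_label_value, negative_label_value, new_positive_label and new_negative_label arguments describe. The data frame raw and its column names are hypothetical, and the recoding shown is an assumption about what the arguments mean, not the package's internal code.

```r
# Hypothetical raw data with the columns in an arbitrary order
raw <- data.frame(
  age = c(54, 61, 47, 70),
  outcome = c("case", "control", "control", "case"),
  patient_id = c("P1", "P2", "P3", "P4")
)

# Reorder: sample ID first, outcome label second, features afterwards
feature_cols <- setdiff(names(raw), c("patient_id", "outcome"))
prepared <- raw[, c("patient_id", "outcome", feature_cols)]

# Map the raw label values to the desired factor level names
relabelled <- factor(
  prepared$outcome,
  levels = c("control", "case"),        # negative value, positive value
  labels = c("Negative", "Positive")    # new negative label, new positive label
)
```

A data frame prepared this way would then be passed as data, with positive_label_value = "case" and negative_label_value = "control".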


E2E documentation built on Aug. 27, 2025, 1:09 a.m.