audit_importance: Audit Feature Importance Calculations

View source: R/borg_audit.R

audit_importance R Documentation

Description

Detects when feature importance (SHAP, permutation importance, etc.) is computed on data that includes test observations, which leaks test-set information and can bias feature selection.

Usage

audit_importance(
  importance,
  data,
  train_idx,
  test_idx,
  method = "auto",
  model = NULL
)

Arguments

importance

A vector, matrix, or data frame of importance values.

data

The data used to compute importance.

train_idx

Integer vector of training indices.

test_idx

Integer vector of test indices.

method

Character indicating the importance method. One of "shap", "permutation", "gain", "impurity", or "auto" (default).

model

Optional fitted model object for additional validation.

Details

Feature importance computed on test data is a form of data leakage because:

  • SHAP values computed on test data reveal test set structure

  • Permutation importance on test data uses test labels

  • Feature selection based on test-set importance tunes the model to the test set, so the model overfits and test performance overstates generalization
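
The points above suggest computing importance from training rows only. The following base R sketch illustrates this for permutation importance on a linear model. The helper perm_importance() and the simulated data are hypothetical and are not part of BORG; they only show one way to keep test rows out of the calculation.

```r
# Hypothetical helper (not part of BORG): permutation importance for a fitted
# model, measured as the increase in mean squared error after shuffling one
# feature at a time.
perm_importance <- function(fit, df, features, response = "y") {
  base_mse <- mean((df[[response]] - predict(fit, newdata = df))^2)
  vapply(features, function(f) {
    shuffled <- df
    shuffled[[f]] <- sample(shuffled[[f]])
    mean((shuffled[[response]] - predict(fit, newdata = shuffled))^2) - base_mse
  }, numeric(1))
}

set.seed(42)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
# Simulated data in which only x1 drives the response
data <- data.frame(y = 2 * x1 + rnorm(n, sd = 0.5), x1 = x1, x2 = x2)
train_idx <- 1:70
test_idx <- 71:100

# Fit and compute importance on training rows only; test rows stay untouched
train <- data[train_idx, ]
fit <- lm(y ~ x1 + x2, data = train)
importance <- perm_importance(fit, train, c("x1", "x2"))
```

The resulting named vector has the same shape as the importance argument used in the Examples section, so it could be passed to audit_importance() together with data[train_idx, ].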

This function checks whether the data used for the importance calculation includes any test indices and flags potential violations.
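
As a rough illustration of the kind of check described here, the sketch below compares the row names of the supplied data against the test indices. It is not BORG's implementation: the helper leaked_test_rows() is hypothetical, and it assumes that row names still carry the original integer row positions, which holds for a default data frame subset with data[idx, ].

```r
# Hypothetical helper (not BORG's implementation): return the test indices
# that appear among the row names of the data used to compute importance.
# Assumes row names are the original integer row positions.
leaked_test_rows <- function(data, test_idx) {
  used_idx <- suppressWarnings(as.integer(rownames(data)))
  intersect(used_idx, test_idx)
}

set.seed(42)
full <- data.frame(y = rnorm(100), x1 = rnorm(100))
test_idx <- 71:100

# Training subset: no test rows present
clean <- leaked_test_rows(full[1:70, ], test_idx)

# Full data: every test row is present
leaked <- leaked_test_rows(full, test_idx)
```

A check based on row names breaks down if the row names were reset or replaced before the call, so it should be read as one possible heuristic rather than a guarantee.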

Value

A BorgRisk object with audit results.

Examples

set.seed(42)
data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
train_idx <- 1:70
test_idx <- 71:100

# Simulate importance values
importance <- c(x1 = 0.6, x2 = 0.4)

# Good: importance computed on training data
result <- audit_importance(importance, data[train_idx, ], train_idx, test_idx)

# Bad: importance computed on full data (includes test)
result_bad <- audit_importance(importance, data, train_idx, test_idx)


BORG documentation built on March 20, 2026, 5:09 p.m.