| audit_importance | R Documentation |
Detects when feature importance (SHAP, permutation importance, etc.) is computed using test data, which can lead to biased feature selection and data leakage.
audit_importance(
  importance,
  data,
  train_idx,
  test_idx,
  method = "auto",
  model = NULL
)
| importance | A vector, matrix, or data frame of importance values. |
| data | The data used to compute importance. |
| train_idx | Integer vector of training indices. |
| test_idx | Integer vector of test indices. |
| method | Character indicating the importance method. One of "shap", "permutation", "gain", "impurity", or "auto" (default). |
| model | Optional fitted model object for additional validation. |
Feature importance computed on test data is a form of data leakage because:
- SHAP values computed on test data reveal test set structure
- Permutation importance on test data uses test labels
- Feature selection based on test importance leads to overfit models
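To illustrate the leakage sources listed above, here is a minimal sketch of permutation importance that uses only the training rows. It is a base R illustration, not the package's own method. The linear model, the `mse` helper, and the name `perm_importance` are assumptions introduced for this example.

```r
set.seed(42)
data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
train_idx <- 1:70

# Fit and evaluate on training rows only; test rows are never touched
train <- data[train_idx, ]
fit <- lm(y ~ x1 + x2, data = train)

# Hypothetical helper: mean squared error of `fit` on a data frame
mse <- function(d) mean((d$y - predict(fit, newdata = d))^2)
baseline <- mse(train)

# Permutation importance: increase in training MSE after shuffling one feature
perm_importance <- sapply(c("x1", "x2"), function(feature) {
  shuffled <- train
  shuffled[[feature]] <- sample(shuffled[[feature]])
  mse(shuffled) - baseline
})
```

Because both the fit and the shuffling are restricted to `data[train_idx, ]`, no test labels or test set structure enter the importance values.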
This function checks if the data used for importance calculation includes test indices and flags potential violations.
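One way such a check could work is sketched below. This is not the package's actual implementation. The helper `find_test_rows` is hypothetical, and it assumes the row names of the data still carry the original row positions, which holds for default data frame row names after subsetting with `[`.

```r
# Hypothetical helper, not part of the package: returns the test indices
# that are present among the rows of `data_used`.
find_test_rows <- function(data_used, test_idx) {
  # Assumption: row names encode the original row positions
  original_pos <- suppressWarnings(as.integer(rownames(data_used)))
  if (anyNA(original_pos)) {
    stop("row names do not encode original row positions")
  }
  intersect(original_pos, test_idx)
}

set.seed(42)
data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
train_idx <- 1:70
test_idx <- 71:100

leak_train <- find_test_rows(data[train_idx, ], test_idx)  # training rows only
leak_full <- find_test_rows(data, test_idx)                # full data, includes test rows
```

A non-empty result indicates that test rows were part of the data used for the importance calculation.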
A BorgRisk object with audit results.
set.seed(42)
data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
train_idx <- 1:70
test_idx <- 71:100
# Simulate importance values
importance <- c(x1 = 0.6, x2 = 0.4)
# Good: importance computed on training data
result <- audit_importance(importance, data[train_idx, ], train_idx, test_idx)
# Bad: importance computed on full data (includes test)
result_bad <- audit_importance(importance, data, train_idx, test_idx)