feature_importance.classifier: Evaluate Feature Importance for a Classifier

View source: R/classifier.R

feature_importance.classifier    R Documentation

Evaluate Feature Importance for a Classifier

Description

Estimates the importance of features or blocks of features for classification performance, using either a "marginal" (leave-one-block-out) or "standalone" (use-only-one-block) approach.

Usage

## S3 method for class 'classifier'
feature_importance(
  x,
  new_data,
  true_labels,
  ncomp = NULL,
  blocks = NULL,
  metric = c("cosine", "euclidean", "ejaccard"),
  fun = rank_score,
  fun_direction = c("lower_is_better", "higher_is_better"),
  approach = c("marginal", "standalone"),
  ...
)

Arguments

x

A fitted classifier object.

new_data

The data matrix used for evaluating importance (typically validation or test data).

true_labels

The true class labels corresponding to the rows of new_data.

ncomp

Optional integer; the number of components to use from the projector for classification (default: all components used during classifier creation).

blocks

A list where each element is a numeric vector of feature indices (columns in the original data space) defining a block. If NULL, each feature is treated as its own block.
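For example, blocks = list(1:5, 6:10) scores columns 1-5 and columns 6-10 as two separate blocks.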

metric

Character string specifying the similarity or distance metric for k-NN. Choices: "cosine", "euclidean", "ejaccard".

fun

A function to compute the performance metric (e.g., rank_score, topk, or a custom function). The function should take a probability matrix and observed labels and return a data frame where the first column is the metric value per observation.
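A minimal sketch of a custom fun following this contract (the name log_loss_fun is illustrative only, and prob is assumed to be a matrix whose columns are named after the classes):

log_loss_fun <- function(prob, observed) {
  # probability assigned to the true class of each observation
  p <- prob[cbind(seq_len(nrow(prob)), match(observed, colnames(prob)))]
  # per-observation negative log-likelihood; lower values are better
  data.frame(log_loss = -log(pmax(p, 1e-12)))
}

Such a function would be supplied as fun = log_loss_fun together with fun_direction = "lower_is_better".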

fun_direction

Character string, either "lower_is_better" or "higher_is_better", indicating whether lower or higher values of the metric calculated by fun signify better performance. This is used to interpret the importance score correctly.

approach

Character string: "marginal" (calculates importance as change from baseline when block is removed) or "standalone" (calculates importance as performance using only the block).

...

Additional arguments passed to predict.classifier during internal predictions.

Details

Importance is measured by the change in the performance metric computed by fun when a block of features is removed ("marginal") or used exclusively ("standalone").
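As a self-contained toy illustration of the "marginal" idea (not this package's internal code), consider a simple nearest-centroid rule scored by accuracy, a higher-is-better metric:

# two blocks of five columns; only block 1 (columns 1-5) carries class signal
set.seed(1)
X <- matrix(rnorm(100 * 10), 100, 10)
y <- factor(rep(c("a", "b"), each = 50))
X[y == "b", 1:5] <- X[y == "b", 1:5] + 2

# accuracy of a nearest-centroid classifier restricted to the given columns
acc <- function(cols) {
  cent <- sapply(levels(y), function(l) colMeans(X[y == l, cols, drop = FALSE]))
  d <- as.matrix(dist(rbind(t(cent), X[, cols, drop = FALSE])))[-(1:2), 1:2]
  mean(levels(y)[apply(d, 1, which.min)] == y)
}

baseline <- acc(1:10)
# leave-one-block-out: dropping the informative block costs far more accuracy
c(block1 = baseline - acc(6:10), block2 = baseline - acc(1:5))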

Value

A data.frame with columns block (character representation of feature indices in the block) and importance (numeric importance score). Higher importance values generally indicate more influential blocks, considering fun_direction.

See Also

rank_score, topk

Examples

# clf: a fitted classifier object
# dat: a new data matrix; true_lbls: the true class labels for the rows of dat
# blocks_list: a list of feature-index blocks, e.g. list(1:5, 6:10)
# feature_importance(clf, new_data = dat, true_labels = true_lbls, blocks = blocks_list)
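
# A slightly fuller hedged sketch, still assuming clf, dat, and true_lbls already
# exist (how clf was fitted is omitted here), comparing the two approaches and
# ranking the blocks by importance:
# imp_m <- feature_importance(clf, new_data = dat, true_labels = true_lbls,
#                             blocks = list(1:5, 6:10), approach = "marginal")
# imp_s <- feature_importance(clf, new_data = dat, true_labels = true_lbls,
#                             blocks = list(1:5, 6:10), approach = "standalone")
# imp_m[order(imp_m$importance, decreasing = TRUE), ]  # most influential blocks first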
