loi: Loss of Interpretability (LoI) Index

View source: R/loi.R

loi {e2tree}    R Documentation

Loss of Interpretability (LoI) Index

Description

Computes the LoI index and its decomposition. The index measures how well the proximity matrix estimated by E2Tree reconstructs the original ensemble proximity matrix; lower values indicate a closer reconstruction.

Usage

loi(O, O_hat, normalize = TRUE)

Arguments

O

Proximity matrix from the ensemble model (an n x n matrix with values in [0, 1])

O_hat

Proximity matrix estimated by E2Tree (an n x n matrix with values in [0, 1])

normalize

Logical. If TRUE (default), returns the normalized index nLoI, i.e. LoI divided by the number of unique pairs M = n(n-1)/2. If FALSE, returns the raw LoI.

Details

The statistic is defined as:

\mathrm{LoI}(O, \hat{O}) = \sum_{i < j} \frac{(o_{ij} - \hat{o}_{ij})^2}{\max(o_{ij}, \hat{o}_{ij})}

The normalized LoI (nLoI) divides the LoI by the number of unique pairs, M = n(n-1)/2:

\mathrm{nLoI}(O, \hat{O}) = \frac{1}{M} \mathrm{LoI}(O, \hat{O})
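The two formulas can be sketched in base R as follows. This is a minimal illustration, not the e2tree implementation: the function name `loi_sketch` and the toy matrices are hypothetical, both inputs are assumed to be numeric matrices of the same size, and a pair with o_ij = ô_ij = 0 (where the formula is 0/0) is assumed to contribute 0.

```r
# Hypothetical sketch of LoI and nLoI; not the e2tree implementation.
loi_sketch <- function(O, O_hat) {
  stopifnot(is.matrix(O), is.matrix(O_hat), all(dim(O) == dim(O_hat)))
  n <- nrow(O)
  # upper.tri() selects each unordered pair (i, j) with i < j exactly once
  up <- upper.tri(O)
  o <- O[up]
  o_hat <- O_hat[up]
  denom <- pmax(o, o_hat)
  # Assumption: a pair with o = o_hat = 0 contributes 0 instead of 0/0
  terms <- ifelse(denom > 0, (o - o_hat)^2 / denom, 0)
  M <- n * (n - 1) / 2
  list(loi = sum(terms), nloi = sum(terms) / M, m = M)
}

# Toy 3 x 3 proximity matrices (illustrative values only)
O <- matrix(c(1.0, 0.8, 0.2,
              0.8, 1.0, 0.1,
              0.2, 0.1, 1.0), nrow = 3, byrow = TRUE)
O_hat <- matrix(c(1.0, 0.6, 0.0,
                  0.6, 1.0, 0.0,
                  0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)
res <- loi_sketch(O, O_hat)
res$loi   # 0.05 + 0.2 + 0.1 = 0.35
res$nloi  # 0.35 / 3, about 0.1167
```

The diagonal is excluded, matching the sum over i < j in the definition.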

The LoI decomposes additively, LoI = LoI_in + LoI_out, into two components:

  • LoI_in: within-node loss, summed over pairs that E2Tree places in the same terminal node

  • LoI_out: between-node loss, summed over pairs that E2Tree assigns to different terminal nodes

The per-pair averages, mean_in = LoI_in / n_within and mean_out = LoI_out / n_between, allow the two components to be compared directly even though they are computed over different numbers of pairs.
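A minimal sketch of the decomposition, under a stated assumption: because loi() takes only O and O_hat, the sketch treats a pair as within-node when ô_ij > 0 and as between-node otherwise. How the package actually identifies node membership is not documented on this page, and the function name `loi_decompose_sketch` is hypothetical.

```r
# Hypothetical sketch of the within/between decomposition; not the e2tree code.
# Assumption: within-node pairs are those with O_hat[i, j] > 0.
loi_decompose_sketch <- function(O, O_hat) {
  up <- upper.tri(O)
  o <- O[up]
  o_hat <- O_hat[up]
  denom <- pmax(o, o_hat)
  terms <- ifelse(denom > 0, (o - o_hat)^2 / denom, 0)
  within <- o_hat > 0
  n_within <- sum(within)
  n_between <- sum(!within)
  loi_in <- sum(terms[within])
  loi_out <- sum(terms[!within])
  list(
    loi_in = loi_in,
    loi_out = loi_out,
    # Per-pair averages; NA when a group contains no pairs
    mean_in = if (n_within > 0) loi_in / n_within else NA_real_,
    mean_out = if (n_between > 0) loi_out / n_between else NA_real_,
    n_within = n_within,
    n_between = n_between
  )
}

# Toy 3 x 3 proximity matrices (illustrative values only)
O <- matrix(c(1.0, 0.8, 0.2,
              0.8, 1.0, 0.1,
              0.2, 0.1, 1.0), nrow = 3, byrow = TRUE)
O_hat <- matrix(c(1.0, 0.6, 0.0,
                  0.6, 1.0, 0.0,
                  0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)
dec <- loi_decompose_sketch(O, O_hat)
```

In this toy example the between-node pairs have ô_ij = 0, so mean_out equals the average of their ensemble proximities, (0.2 + 0.1) / 2 = 0.15, and loi_in + loi_out recovers the total LoI of 0.35.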

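The statistic is a scaled squared difference: each pair's squared difference is divided by the larger of its two proximity values, so a discrepancy is measured relative to the proximity level at which it occurs. For the same absolute difference, a pair with low proximities therefore contributes more than a pair with high proximities. When \hat{o}_{ij} = 0, the term reduces to o_{ij}: the pair's entire ensemble proximity is counted as lost.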
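A short numeric check of the scaling, using a hypothetical helper `pair_term` for a single pair (scalar inputs, with at least one of the two proximities nonzero):

```r
# Contribution of one pair: (o - o_hat)^2 / max(o, o_hat); scalar inputs only
pair_term <- function(o, o_hat) (o - o_hat)^2 / max(o, o_hat)

pair_term(0.2, 0.1)  # absolute gap 0.1 at low proximity: 0.05
pair_term(0.9, 0.8)  # same gap at high proximity: about 0.0111
pair_term(0.7, 0)    # o_hat = 0: the term equals o, here 0.7
```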

Decomposition interpretation (per-pair averages):

  • mean_out: average ensemble proximity lost by the partition. Low values (< 0.1) indicate that the tree correctly separates low-proximity pairs. High values (> 0.3) suggest that the tree splits apart pairs the ensemble considers similar; more terminal nodes may help.

  • mean_in: average calibration error within nodes. Low values (< 0.01) indicate excellent within-node reconstruction. Higher values reflect the error inherent in approximating the ensemble's graded (fuzzy) proximities with a crisp partition.
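The rule-of-thumb thresholds above can be encoded in a small helper. The function `describe_loi_means` is hypothetical and not part of e2tree; the text does not characterize mean_out between 0.1 and 0.3, so the sketch simply labels that range "intermediate".

```r
# Hypothetical helper applying the documented rule-of-thumb thresholds.
describe_loi_means <- function(mean_in, mean_out) {
  out_msg <- if (mean_out < 0.1) {
    "mean_out low: tree separates low-proximity pairs"
  } else if (mean_out > 0.3) {
    "mean_out high: tree splits apart pairs the ensemble considers similar"
  } else {
    # Range not characterized in the documentation
    "mean_out intermediate"
  }
  in_msg <- if (mean_in < 0.01) {
    "mean_in low: excellent within-node reconstruction"
  } else {
    "mean_in higher: calibration error from the crisp partition"
  }
  c(mean_in = in_msg, mean_out = out_msg)
}

describe_loi_means(mean_in = 0.005, mean_out = 0.05)
```

With a fitted object, the same helper could be called as describe_loi_means(result$mean_in, result$mean_out).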

Value

An object of class "loi" containing:

loi

Raw LoI value (unnormalized)

nloi

Normalized LoI (LoI / M)

loi_in

Within-node component LoI_in (sum over within-node pairs)

loi_out

Between-node component LoI_out (sum over between-node pairs)

mean_in

Per-pair average within-node loss (comparable with mean_out)

mean_out

Per-pair average between-node loss (comparable with mean_in)

n

Matrix dimension n (number of observations)

m

Number of unique pairs, M = n(n-1)/2

n_within

Number of within-node pairs

n_between

Number of between-node pairs
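The components listed above are linked by identities that follow from the definitions in Details, assuming every pair is counted as either within-node or between-node and that mean_in and mean_out equal loi_in / n_within and loi_out / n_between. The checker `check_loi_fields` is hypothetical, and the toy values are hand-computed from a 3 x 3 example rather than produced by the package.

```r
# Hypothetical consistency check for a list with the fields listed above.
check_loi_fields <- function(x, tol = 1e-10) {
  c(
    pairs      = x$m == x$n * (x$n - 1) / 2,          # M = n(n-1)/2
    partition  = x$n_within + x$n_between == x$m,     # every pair classified
    additive   = abs(x$loi - (x$loi_in + x$loi_out)) < tol,
    normalized = abs(x$nloi - x$loi / x$m) < tol,
    mean_in    = abs(x$mean_in - x$loi_in / x$n_within) < tol,
    mean_out   = abs(x$mean_out - x$loi_out / x$n_between) < tol
  )
}

# Hand-computed values for a 3 x 3 toy example (not package output)
toy <- list(loi = 0.35, nloi = 0.35 / 3, loi_in = 0.05, loi_out = 0.3,
            mean_in = 0.05, mean_out = 0.15, n = 3, m = 3,
            n_within = 1, n_between = 2)
all(check_loi_fields(toy))  # TRUE
```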

Examples


library(e2tree)

# Use a random 75% of iris as training data
data(iris)
smp_size <- floor(0.75 * nrow(iris))
set.seed(42)
train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
training <- iris[train_ind, ]

# Random forest ensemble; proximity = TRUE makes randomForest compute case proximities
ensemble <- randomForest::randomForest(Species ~ ., data = training,
  importance = TRUE, proximity = TRUE)

# Dissimilarity matrix derived from the ensemble
D <- createDisMatrix(ensemble, data = training, label = "Species",
  parallel = list(active = FALSE, no_cores = 1))

# Grow the E2Tree
setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
tree <- e2tree(Species ~ ., training, D, ensemble, setting)

# Extract the ensemble (O) and E2Tree (O_hat) proximity matrices
vs <- eValidation(training, tree, D)
prox <- proximity(vs)
O <- prox$ensemble
O_hat <- prox$e2tree

# Compute LoI with decomposition
result <- loi(O, O_hat)
print(result)
summary(result)
plot(result)

# Permutation test
perm <- loi_perm(O, O_hat, n_perm = 999, seed = 42)
print(perm)
plot(perm)



e2tree documentation built on May 15, 2026, 5:06 p.m.