| loi | R Documentation |
Computes the LoI index and its decomposition, measuring how well the E2Tree-estimated proximity matrix reconstructs the original ensemble proximity matrix.
loi(O, O_hat, normalize = TRUE)
O |
Proximity matrix from the ensemble model (n x n), values in the interval 0 to 1 |
O_hat |
Proximity matrix estimated by E2Tree (n x n), values in the interval 0 to 1 |
normalize |
Logical. If TRUE (default), returns nLoI (LoI divided by the number of pairs M). If FALSE, returns raw LoI. |
The statistic is defined as:
\mathrm{LoI}(O, \hat{O}) = \sum_{i < j}
\frac{(o_{ij} - \hat{o}_{ij})^2}{\max(o_{ij}, \hat{o}_{ij})}
The normalized LoI (nLoI) divides LoI by the number of unique pairs, M = n(n-1)/2:
\mathrm{nLoI}(O, \hat{O}) = \frac{1}{M} \mathrm{LoI}(O, \hat{O})
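The two formulas above can be sketched in base R as follows. This is an illustration only, not the package implementation: the helper name `loi_sketch` and the toy matrices are hypothetical, and pairs where both proximities are zero (where the formula is 0/0) are assumed to contribute zero, which may differ from how `loi()` treats them.

```r
# Hypothetical helper: evaluates the LoI and nLoI formulas directly.
# Not the package implementation.
loi_sketch <- function(O, O_hat) {
  stopifnot(is.matrix(O), is.matrix(O_hat), all(dim(O) == dim(O_hat)))
  ut  <- upper.tri(O)              # unique pairs i < j
  o   <- O[ut]
  oh  <- O_hat[ut]
  den <- pmax(o, oh)
  # Assumption: a pair with both proximities equal to 0 contributes 0
  contrib <- ifelse(den > 0, (o - oh)^2 / den, 0)
  M <- length(o)                   # n(n-1)/2
  list(loi = sum(contrib), nloi = sum(contrib) / M, m = M)
}

# Toy 3 x 3 proximity matrices (made-up values)
O_toy     <- matrix(c(1.0, 0.8, 0.2,
                      0.8, 1.0, 0.0,
                      0.2, 0.0, 1.0), nrow = 3, byrow = TRUE)
O_hat_toy <- matrix(c(1.0, 0.6, 0.0,
                      0.6, 1.0, 0.0,
                      0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)

res <- loi_sketch(O_toy, O_hat_toy)
res$loi    # 0.04 / 0.8 + 0.04 / 0.2 + 0 = 0.25
res$nloi   # 0.25 / 3
```

Only the upper triangle is used, so each unordered pair is counted once and the diagonal is ignored.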
The LoI decomposes into two components:
LoI_in: within-node loss (pairs grouped together by E2Tree)
LoI_out: between-node loss (pairs separated by E2Tree)
The per-pair averages mean_in and mean_out enable direct
comparison between the two components despite their different pair counts.
Each pair contributes its squared difference divided by the larger of the two proximity values. Writing a = max(o_ij, ô_ij) and b = min(o_ij, ô_ij), the contribution equals a(1 - b/a)^2, so for a given relative discrepancy b/a, pairs in high-proximity regions receive more weight.
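The sketch below evaluates a single pair's contribution for a few made-up values, using a hypothetical helper `pair_loss`. For a fixed relative discrepancy the loss grows with the proximity level, whereas for a fixed absolute difference it shrinks.

```r
# Hypothetical helper: contribution of a single pair to LoI
pair_loss <- function(o, o_hat) (o - o_hat)^2 / max(o, o_hat)

# Same relative discrepancy (o_hat is half of o) at two proximity levels
high_rel <- pair_loss(0.8, 0.4)   # 0.16 / 0.8 = 0.2
low_rel  <- pair_loss(0.2, 0.1)   # 0.01 / 0.2 = 0.05

# Same absolute difference (0.1) at two proximity levels
high_abs <- pair_loss(0.9, 0.8)   # 0.01 / 0.9, about 0.011
low_abs  <- pair_loss(0.2, 0.1)   # 0.01 / 0.2 = 0.05
```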
Decomposition interpretation (per-pair averages):
mean_out: average ensemble proximity lost by the partition.
Low values (< 0.1) indicate the tree correctly separates low-proximity
pairs. High values (> 0.3) suggest the tree splits apart pairs that
the ensemble considers similar; more terminal nodes may help.
mean_in: average calibration error within nodes. Low values
(< 0.01) indicate excellent within-node reconstruction. Higher values
reflect the inherent fuzzy-to-crisp transition.
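The decomposition and the reading of mean_out can be illustrated with a toy example. Everything here is an assumption made for illustration: the node labels `node`, the made-up within-node proximity of 0.8, and the premise that the E2Tree proximity is zero for pairs in different terminal nodes. Under that premise each between-node loss reduces to o_ij^2 / o_ij = o_ij, which is why mean_out can be read as the average ensemble proximity lost.

```r
# Hypothetical inputs: toy ensemble proximities and terminal-node labels
O_toy <- matrix(c(1.0, 0.9, 0.3, 0.1,
                  0.9, 1.0, 0.2, 0.0,
                  0.3, 0.2, 1.0, 0.7,
                  0.1, 0.0, 0.7, 1.0), nrow = 4, byrow = TRUE)
node <- c(1, 1, 2, 2)   # assumed terminal node of each observation

# Assumption: E2Tree proximity is a made-up constant within a node
# and 0 across nodes
same_node <- outer(node, node, "==")
O_hat_toy <- ifelse(same_node, 0.8, 0)
diag(O_hat_toy) <- 1

ut     <- upper.tri(O_toy)
o      <- O_toy[ut]
oh     <- O_hat_toy[ut]
den    <- pmax(o, oh)
loss   <- ifelse(den > 0, (o - oh)^2 / den, 0)   # 0/0 pairs treated as 0
within <- same_node[ut]

loi_in   <- sum(loss[within])
loi_out  <- sum(loss[!within])
mean_in  <- loi_in  / sum(within)
mean_out <- loi_out / sum(!within)

# With o_hat = 0 between nodes, mean_out equals the mean ensemble
# proximity of the separated pairs
all.equal(mean_out, mean(o[!within]))
```

Dividing each total by its own pair count is what makes mean_in and mean_out comparable even when the tree produces far more between-node pairs than within-node pairs.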
An object of class "loi" containing:
loi |
Raw LoI value (unnormalized) |
nloi |
Normalized LoI (LoI / M) |
loi_in |
Within-node component (total) |
loi_out |
Between-node component (total) |
mean_in |
Per-pair average within-node loss (comparable with mean_out) |
mean_out |
Per-pair average between-node loss (comparable with mean_in) |
n |
Matrix dimension |
m |
Number of unique pairs |
n_within |
Number of within-node pairs |
n_between |
Number of between-node pairs |
# Draw a 75% training sample from iris
data(iris)
smp_size <- floor(0.75 * nrow(iris))
set.seed(42)
train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
training <- iris[train_ind, ]
# Fit the random forest ensemble
ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                       importance = TRUE, proximity = TRUE)
# Build the dissimilarity matrix from the ensemble
D <- createDisMatrix(ensemble, data = training, label = "Species",
                     parallel = list(active = FALSE, no_cores = 1))
# Fit the E2Tree
setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
tree <- e2tree(Species ~ ., training, D, ensemble, setting)
# Extract the ensemble and E2Tree proximity matrices
vs <- eValidation(training, tree, D)
prox <- proximity(vs)
O <- prox$ensemble
O_hat <- prox$e2tree
# Compute LoI with decomposition
result <- loi(O, O_hat)
print(result)
summary(result)
plot(result)
# Permutation test
perm <- loi_perm(O, O_hat, n_perm = 999, seed = 42)
print(perm)
plot(perm)