diss_evaluate: Evaluate dissimilarity matrices
In resemble: Similarity Retrieval and Local Learning for Spectral Chemometrics


Description

Evaluates a dissimilarity matrix by comparing each observation to its nearest neighbor based on side information. For continuous variables, RMSD and correlation are computed; for categorical variables, the kappa index is used.

Usage

diss_evaluate(diss, side_info)

sim_eval(d, side_info)

Arguments

`diss`	A symmetric dissimilarity matrix. Alternatively, a vector containing the lower triangle values (as returned by `dist`).
`side_info`	A matrix of side information corresponding to the observations. Can be numeric (one or more columns) or character (single column for categorical data).
`d`	Deprecated. Use `diss` in `diss_evaluate()` instead.
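
The two accepted forms of `diss` carry the same information. As a minimal base R sketch (the toy matrix `X` is made up for illustration), a `dist` object holds the lower triangle, and `as.matrix()` expands it to the full symmetric matrix:

```r
# Three made-up observations in two dimensions
X <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

# Lower-triangle form, as returned by dist()
d_vec <- dist(X)

# Full symmetric form with a zero diagonal
d_mat <- as.matrix(d_vec)

isSymmetric(d_mat)
```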

Details

This function assesses whether a dissimilarity matrix captures meaningful structure by examining the side information of nearest neighbor pairs (Ramirez-Lopez et al., 2013). If observations that are similar in the dissimilarity space also have similar side information values, the dissimilarity is considered effective.

For numeric side_info, the root mean square of differences (RMSD) between each observation and its nearest neighbor is computed:

$$j(i) = NN(x_i, X^{-i})$$

$$RMSD = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - y_{j(i)})^2}$$

where $NN(x_i, X^{-i})$ returns the index of the nearest neighbor of observation $i$ (excluding itself), $y_i$ is the side information value for observation $i$, and $m$ is the number of observations.
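
The nearest-neighbor RMSD can be sketched in base R as follows. This is an illustration of the formula under stated assumptions, not the package's internal code: the data are made up, Euclidean distance stands in for any dissimilarity, and the variable names are hypothetical.

```r
# Toy data: 6 observations, 2 predictors, one numeric side variable
# (all values are made up for illustration)
X <- matrix(c(0.0, 0.1,
              0.2, 0.0,
              1.0, 1.1,
              1.2, 1.0,
              3.0, 3.1,
              3.3, 3.0), ncol = 2, byrow = TRUE)
y <- c(1.0, 1.2, 2.0, 2.3, 4.0, 4.4)

# Full symmetric dissimilarity matrix (Euclidean, as an example)
D <- as.matrix(dist(X))

# Exclude each observation from its own neighbor search
diag(D) <- Inf

# j(i): index of the nearest neighbor of observation i
j <- unname(apply(D, 1, which.min))

# RMSD and correlation between y_i and y_j(i)
rmsd <- sqrt(mean((y - y[j])^2))
r <- cor(y, y[j])
```

A small `rmsd` together with a high `r` suggests that neighbors in the dissimilarity space also have similar side information values.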

For categorical side_info, the kappa index is computed:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance.
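
The kappa index can be sketched in base R in the same spirit. The class labels and neighbor indices below are made up. The sketch also assumes that chance agreement is computed Cohen-style from the marginal class frequencies, which may differ in detail from what the function does internally.

```r
# Made-up categorical side information
y <- c("a", "a", "b", "b", "a", "b")

# Hypothetical nearest-neighbor indices j(i)
j <- c(2, 1, 4, 3, 6, 5)

# Cross-tabulate each observation's class against its neighbor's class
lev <- sort(unique(y))
tab <- table(factor(y, levels = lev), factor(y[j], levels = lev))

# Observed agreement: proportion of pairs sharing a class
p_o <- sum(diag(tab)) / sum(tab)

# Chance agreement from the marginals (assumed Cohen-style definition)
p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2

kappa_index <- (p_o - p_e) / (1 - p_e)
```

Here four of the six pairs agree, so `p_o` is 2/3, and the balanced classes give `p_e` equal to 0.5.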

Value

A list with the following components:

`eval`: For numeric side information, a matrix with columns `rmsd` and `r` (correlation). For categorical side information, a matrix with column `kappa`.
`global_eval`: If multiple numeric side information variables are provided, summary statistics across variables.
`first_nn`: A matrix with the original side information and the side information of each observation's nearest neighbor.

Author(s)

Leonardo Ramirez-Lopez

References

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma 195-196, 268-279.

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J.A.M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.

Examples


library(resemble)
library(prospectr)
data(NIRsoil)

sg <- savitzkyGolay(NIRsoil$spc, p = 3, w = 11, m = 0)
NIRsoil$spc <- sg

Yr <- NIRsoil$Nt[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]

# Compute PCA-based dissimilarity
d <- dissimilarity(Xr, diss_method = diss_pca(ncomp = 8))

# Evaluate using side information
ev <- diss_evaluate(d$dissimilarity, side_info = as.matrix(Yr))
ev$eval

# Evaluate with multiple side information variables
Yr_2 <- NIRsoil$CEC[as.logical(NIRsoil$train)]
ev_2 <- diss_evaluate(d$dissimilarity, side_info = cbind(Yr, Yr_2))
ev_2$eval
ev_2$global_eval

resemble documentation built on April 21, 2026, 1:07 a.m.