View source: R/diss_evaluate.R
| diss_evaluate | R Documentation |
Evaluates a dissimilarity matrix by comparing the side information of each observation with that of its nearest neighbor. For continuous variables, the RMSD and the correlation are computed; for categorical variables, the kappa index is used.
diss_evaluate(diss, side_info)
sim_eval(d, side_info)
diss |
A symmetric dissimilarity matrix. Alternatively, a vector
containing the lower triangle values (as returned by |
side_info |
A matrix of side information corresponding to the observations. Can be numeric (one or more columns) or character (single column for categorical data). |
d |
Deprecated. Use |
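The two accepted forms of diss can be related with a minimal base R sketch. It assumes the lower-triangle values are stored column by column, as stats::dist does; the toy vector x and the names d_vec and d_mat are hypothetical and used only for illustration.

```r
# Toy data: four observations on a single variable
x <- c(1.0, 1.2, 3.0, 3.1)

# Lower-triangle values in column-major order (assumed ordering, as in stats::dist)
d_vec <- as.vector(dist(x))

# Rebuild the full symmetric matrix from the lower-triangle vector
n <- length(x)
d_mat <- matrix(0, n, n)
d_mat[lower.tri(d_mat)] <- d_vec
d_mat <- d_mat + t(d_mat)
```

The diagonal stays at zero because each observation has zero dissimilarity to itself.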
This function assesses whether a dissimilarity matrix captures meaningful structure by examining the side information of nearest neighbor pairs (Ramirez-Lopez et al., 2013). If observations that are similar in the dissimilarity space also have similar side information values, the dissimilarity is considered effective.
For numeric side_info, the root mean square of differences (RMSD)
between each observation and its nearest neighbor is computed:
\[ j(i) = NN(x_i, X^{-i}) \]
\[ RMSD = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - y_{j(i)})^2} \]
where \(NN(x_i, X^{-i})\) returns the index of the nearest neighbor of observation \(i\) (excluding itself), \(y_i\) is the side information value for observation \(i\), and \(m\) is the number of observations.
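The nearest-neighbor RMSD can be sketched in base R as follows. This is an illustration of the formula under stated assumptions, not the package's internal code: the toy vectors x and y are hypothetical, and the dissimilarity is a plain absolute difference computed with stats::dist.

```r
# Toy data: five observations on one variable, so distances are easy to check
x <- c(1.0, 1.2, 3.0, 3.1, 6.0)
y <- c(10, 11, 20, 22, 40)  # hypothetical numeric side information

# Symmetric dissimilarity matrix
diss <- as.matrix(dist(x))

# Exclude each observation from its own neighbor search
diag(diss) <- Inf

# j(i): index of the nearest neighbor of each observation
nn_idx <- apply(diss, 1, which.min)

# RMSD between each observation and its nearest neighbor
rmsd <- sqrt(mean((y - y[nn_idx])^2))

# Correlation between observed and nearest-neighbor values
r <- cor(y, y[nn_idx])
```

A low rmsd and a high r suggest that observations close in the dissimilarity space also have similar side information values.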
For categorical side_info, the kappa index is computed:
\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]
where \(p_o\) is the observed agreement and \(p_e\) is the agreement expected by chance.
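For the categorical case, a minimal base R sketch is given below. It assumes the unweighted form of kappa, applied to the class of each observation against the class of its nearest neighbor; the package may compute the index differently. The toy data and the names tab, p_o, p_e and kappa_idx are hypothetical.

```r
# Toy data: six observations with hypothetical class labels
x <- c(1.0, 1.2, 3.0, 3.1, 6.0, 6.3)
y <- c("a", "a", "b", "b", "c", "b")

# Nearest neighbor of each observation, excluding itself
diss <- as.matrix(dist(x))
diag(diss) <- Inf
nn_idx <- apply(diss, 1, which.min)

# Cross-tabulate each observation's class against its neighbor's class
lev <- sort(unique(y))
tab <- table(factor(y, levels = lev), factor(y[nn_idx], levels = lev))

# Observed agreement and agreement expected by chance
p_o <- sum(diag(tab)) / sum(tab)
p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2

kappa_idx <- (p_o - p_e) / (1 - p_e)
```

Using a common set of factor levels keeps the table square even when some class never appears among the nearest neighbors.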
A list with the following components:
For numeric side information: a matrix with columns rmsd and r (correlation). For categorical side information: a matrix with column kappa.
If multiple numeric side information variables are provided, summary statistics across variables.
A matrix with the original side information and the side information of each observation's nearest neighbor.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J.A.M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
dissimilarity, ncomp_by_opc
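library(resemble)
library(prospectr)
data(NIRsoil)
# Smooth the spectra with a Savitzky-Golay filter
sg <- savitzkyGolay(NIRsoil$spc, p = 3, w = 11, m = 0)
NIRsoil$spc <- sg
# Training subset: total nitrogen (Nt) as side information, plus the spectra
Yr <- NIRsoil$Nt[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]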
# Compute PCA-based dissimilarity
d <- dissimilarity(Xr, diss_method = diss_pca(ncomp = 8))
# Evaluate using side information
ev <- diss_evaluate(d$dissimilarity, side_info = as.matrix(Yr))
ev$eval
# Evaluate with multiple side information variables
Yr_2 <- NIRsoil$CEC[as.logical(NIRsoil$train)]
ev_2 <- diss_evaluate(d$dissimilarity, side_info = cbind(Yr, Yr_2))
ev_2$eval
ev_2$global_eval