dissimilarity: Compute dissimilarity matrices


Description


Computes dissimilarity matrices between observations using one of several methods, selected through diss_method. This is the main interface for dissimilarity computation in the resemble package.

Usage

dissimilarity(Xr, Xu = NULL, diss_method = diss_pca(), Yr = NULL)

Arguments

Xr

A numeric matrix of reference observations (rows) and variables (columns).

Xu

Optional matrix of additional observations with the same variables.

diss_method

A dissimilarity method object created by one of:

  • diss_pca(): Mahalanobis distance in PCA space

  • diss_pls(): Mahalanobis distance in PLS space

  • diss_correlation(): Correlation-based dissimilarity

  • diss_euclidean(): Euclidean distance

  • diss_mahalanobis(): Mahalanobis distance

  • diss_cosine(): Cosine dissimilarity

Default is diss_pca().
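
Two of the listed measures, correlation and cosine dissimilarity, can be illustrated in base R. The sketch below is not resemble's implementation. It assumes that correlation dissimilarity is (1 - r)/2, where r is the Pearson correlation between two observations, and that cosine dissimilarity is the angle between two observation vectors. The help pages of diss_correlation() and diss_cosine() give the exact definitions the package uses. The helper names cor_diss() and cosine_diss() are hypothetical.

```r
# Base-R sketch of two dissimilarities (assumed definitions, hypothetical helpers;
# not resemble's internal code).
set.seed(1)
Xr <- matrix(rnorm(5 * 50), nrow = 5)   # 5 reference observations, 50 variables
Xu <- matrix(rnorm(3 * 50), nrow = 3)   # 3 additional observations

# Correlation dissimilarity, assumed here to be (1 - r) / 2, so it lies in [0, 1].
# cor() correlates columns, so transpose to correlate observations (rows).
cor_diss <- function(Xr, Xu) (1 - cor(t(Xr), t(Xu))) / 2

# Cosine dissimilarity, assumed here to be the angle (radians) between rows.
cosine_diss <- function(Xr, Xu) {
  sim <- (Xr %*% t(Xu)) / outer(sqrt(rowSums(Xr^2)), sqrt(rowSums(Xu^2)))
  # Clamp to [-1, 1] so rounding error cannot make acos() return NaN
  acos(pmin(pmax(sim, -1), 1))
}

dc <- cor_diss(Xr, Xu)
dcos <- cosine_diss(Xr, Xu)
dim(dc)  # 5 3
```

Both helpers return an nrow(Xr) × nrow(Xu) matrix, matching the orientation used by dissimilarity().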

Yr

Optional response matrix for the observations in Xr. Required for PLS-based methods (diss_pls()) and whenever the number of components is selected with ncomp_by_opc().
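
The reason ncomp_by_opc() needs Yr can be sketched in base R. This is an assumed simplification of the optimal-principal-components idea, not the package's code: for each candidate number of components k, every observation is matched to its nearest neighbour in the k-component score space, and the k whose neighbours have the most similar responses (smallest root mean squared difference, RMSD) is kept. The function opc_sketch() and the simulated data are hypothetical.

```r
# Hypothetical sketch of OPC-style component selection (not resemble's code).
opc_sketch <- function(X, y, max_ncomp) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  max_ncomp <- min(max_ncomp, ncol(pca$x))
  rmsd <- numeric(max_ncomp)
  for (k in seq_len(max_ncomp)) {
    scores <- pca$x[, seq_len(k), drop = FALSE]
    # Standardise scores: Euclidean distance on uncorrelated, unit-variance
    # scores equals Mahalanobis distance on the retained components
    z <- sweep(scores, 2, apply(scores, 2, sd), "/")
    d <- as.matrix(dist(z))
    diag(d) <- Inf                     # exclude self-matches
    nn <- apply(d, 1, which.min)       # nearest neighbour of each observation
    rmsd[k] <- sqrt(mean((y - y[nn])^2))
  }
  list(ncomp = which.min(rmsd), rmsd = rmsd)
}

set.seed(2)
X <- matrix(rnorm(60 * 20), nrow = 60)
y <- X[, 1] - X[, 2] + rnorm(60, sd = 0.1)
res <- opc_sketch(X, y, max_ncomp = 10)
res$ncomp
```

Because the criterion compares responses of neighbouring observations, it cannot be evaluated without Yr.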

Details

The function dispatches to the appropriate internal computation based on the class of diss_method. Each method constructor (e.g., diss_pca()) encapsulates all method-specific parameters, including component selection, centering, scaling, and whether to return projections.
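
The constructor-plus-dispatch design can be illustrated with base R S3 classes. Every name below (my_euclidean(), my_correlation(), compute_diss()) is hypothetical and only mirrors the pattern described above; it is not resemble's internal code. Each constructor returns a classed list holding its own parameters, and the generic selects the computation from that class.

```r
# Hypothetical S3 sketch of class-based dispatch (illustrative names only).
my_euclidean <- function(scale = FALSE) {
  structure(list(scale = scale), class = c("my_euclidean", "my_diss"))
}

my_correlation <- function() {
  structure(list(), class = c("my_correlation", "my_diss"))
}

# Generic: dispatches on the class of its first argument, the method object
compute_diss <- function(method, Xr, Xu) UseMethod("compute_diss")

compute_diss.my_euclidean <- function(method, Xr, Xu) {
  if (method$scale) {
    # Scale both sets with the column standard deviations of Xr
    s <- apply(Xr, 2, sd)
    Xr <- sweep(Xr, 2, s, "/")
    Xu <- sweep(Xu, 2, s, "/")
  }
  sq <- outer(rowSums(Xr^2), rowSums(Xu^2), "+") - 2 * Xr %*% t(Xu)
  sqrt(pmax(sq, 0))   # guard against tiny negative values from rounding
}

compute_diss.my_correlation <- function(method, Xr, Xu) {
  (1 - cor(t(Xr), t(Xu))) / 2
}

compute_diss.default <- function(method, Xr, Xu) {
  stop("unsupported dissimilarity method: ", class(method)[1])
}

set.seed(3)
Xr <- matrix(rnorm(4 * 6), nrow = 4)
Xu <- matrix(rnorm(2 * 6), nrow = 2)
d_euc <- compute_diss(my_euclidean(), Xr, Xu)
d_cor <- compute_diss(my_correlation(), Xr, Xu)
```

With this pattern, supporting a new method only requires a new constructor and a matching compute_diss method; the caller's interface does not change.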

Output dimensions

When only Xr is provided, the function computes pairwise dissimilarities among all observations in Xr, returning a symmetric nrow(Xr) × nrow(Xr) matrix.

When both Xr and Xu are provided, the function computes dissimilarities between each observation in Xr and each observation in Xu, returning an nrow(Xr) × nrow(Xu) matrix in which element (i, j) is the dissimilarity between the i-th observation in Xr and the j-th observation in Xu.
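
The two output shapes can be reproduced in base R with plain Euclidean distance. This sketch only illustrates the dimensions and the (i, j) orientation; it does not call resemble, and the simulated matrices are made up for the example.

```r
# Base-R illustration of the two output shapes (not resemble's implementation).
set.seed(4)
Xr <- matrix(rnorm(6 * 10), nrow = 6)   # 6 reference observations
Xu <- matrix(rnorm(4 * 10), nrow = 4)   # 4 additional observations

# Xr only: symmetric nrow(Xr) x nrow(Xr) matrix with a zero diagonal
d_rr <- as.matrix(dist(Xr))

# Xr and Xu: nrow(Xr) x nrow(Xu) matrix; element [i, j] compares Xr[i, ] with Xu[j, ]
d_ru <- sqrt(pmax(outer(rowSums(Xr^2), rowSums(Xu^2), "+") - 2 * Xr %*% t(Xu), 0))

dim(d_rr)  # 6 6
dim(d_ru)  # 6 4
```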

Mahalanobis distance

Note that diss_mahalanobis() computes Mahalanobis distance directly on the input variables. This requires the covariance matrix of the variables to be invertible. It is singular when the number of variables exceeds the number of observations, and it is ill-conditioned when variables are highly correlated (common in spectral data). For such cases, use diss_pca() or diss_pls() instead, which compute the Mahalanobis distance on a small number of components.
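
The problem, and why a projection helps, can be shown in base R. This is a conceptual sketch, not resemble's code: with more variables than observations the sample covariance matrix is rank deficient, whereas principal component scores are uncorrelated, so Mahalanobis distance on the first k components reduces to Euclidean distance on standardised scores. The sizes n, p and k are arbitrary choices for the example.

```r
# Conceptual base-R sketch (not resemble's implementation).
set.seed(5)
n <- 15; p <- 40
X <- matrix(rnorm(n * p), nrow = n)

S <- cov(X)
qr(S)$rank   # at most n - 1 = 14, far below p = 40, so S cannot be inverted

# Project onto the first k principal components
k <- 5
pca <- prcomp(X)
scores <- pca$x[, 1:k]

# PC scores are uncorrelated, so dividing by their standard deviations and
# taking Euclidean distances gives pairwise Mahalanobis distances
z <- sweep(scores, 2, apply(scores, 2, sd), "/")
d_pca <- as.matrix(dist(z))
dim(d_pca)  # 15 15
```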

Value

A list of class "dissimilarity" containing:

dissimilarity

The computed dissimilarity matrix. Dimensions are nrow(Xr) × nrow(Xr) when Xu = NULL, or nrow(Xr) × nrow(Xu) otherwise.

diss_method

The diss_* constructor object used for computation.

center

Vector used to center the data.

scale

Vector used to scale the data.

ncomp

Number of components used (for projection methods).

projection

The ortho_projection object; only returned when return_projection = TRUE in the method constructor.

Author(s)

Leonardo Ramirez-Lopez

References

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J.A.M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.

See Also

diss_pca, diss_pls, diss_correlation, diss_euclidean, diss_mahalanobis, diss_cosine

Examples


library(resemble)
library(prospectr)
data(NIRsoil)

# Preprocess
sg <- savitzkyGolay(NIRsoil$spc, m = 1, p = 4, w = 15)

Xr <- sg[as.logical(NIRsoil$train), ]
Xu <- sg[!as.logical(NIRsoil$train), ]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]

# Remove observations with missing CEC values
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu)]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr)]

# PCA-based dissimilarity with variance-based selection
d1 <- dissimilarity(Xr, Xu, diss_method = diss_pca())

# PCA with OPC selection (requires Yr)
d2 <- dissimilarity(Xr, Xu,
  Yr = Yr,
  diss_method = diss_pca(
    ncomp = ncomp_by_opc(30),
    return_projection = TRUE
  )
)

# PLS-based dissimilarity (requires Yr)
d3 <- dissimilarity(
  Xr, Xu,
  Yr = Yr,
  diss_method = diss_pls(
    ncomp = ncomp_by_opc(30)
  )
)

# Euclidean distance
d4 <- dissimilarity(Xr, Xu, diss_method = diss_euclidean())

# Correlation dissimilarity with moving window
d5 <- dissimilarity(Xr, Xu, diss_method = diss_correlation(ws = 41))

# Mahalanobis distance (use only when n > p and low collinearity)
# d6 <- dissimilarity(Xr[, 1:20], Xu[, 1:20],
#                     diss_method = diss_mahalanobis())



resemble documentation built on April 21, 2026, 1:07 a.m.