View source: R/dist_continuous.R
| dist_continuous | R Documentation |
Internal helper function to compute pairwise distance matrices for purely numeric datasets. Supports standard metrics, including Euclidean, Manhattan, Chebyshev, Canberra, Minkowski, standardized Euclidean, and Mahalanobis distances.
dist_continuous(x, method, p = NULL)
| x | A numeric data frame or matrix with rows as observations and columns as variables. |
| method | Distance metric to compute (see details for supported options). |
| p | Numeric, the power parameter for Minkowski distance (required if method = "minkowski"). |
Supported methods and formulas (for observations \mathbf{z}_i and \mathbf{z}_j):
"euclidean":
\delta_E(i,j) = \sqrt{\sum_{k=1}^{p} (z_{ik} - z_{jk})^2}
"minkowski":
\delta_q(i,j) = \left( \sum_{k=1}^{p} |z_{ik} - z_{jk}|^q \right)^{1/q}
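requires the argument p, which supplies the exponent q (not to be confused with the upper summation limit p, the number of variables)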
"manhattan":
\delta_1(i,j) = \sum_{k=1}^{p} |z_{ik} - z_{jk}|
"maximum":
\delta_\infty(i,j) = \max_k |z_{ik} - z_{jk}|
"canberra":
\delta_C(i,j) = \sum_{k=1}^{p} \frac{|z_{ik} - z_{jk}|}{|z_{ik}| + |z_{jk}|}
convention: 0/0 := 0
"euclidean_standardized":
\delta_K(i,j) = \sqrt{\sum_{k=1}^{p} \frac{(z_{ik} - z_{jk})^2}{s_k^2}}
s_k^2 is the variance of variable k
"mahalanobis":
\delta_M(i,j) = \sqrt{ (\mathbf{z}_i - \mathbf{z}_j)' \mathbf{S}^{-1} (\mathbf{z}_i - \mathbf{z}_j) }
\mathbf{S} is the covariance matrix
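The formulas above can be checked by hand on a single pair of observations. The following is a minimal sketch that uses only base R and stats::dist, not dist_continuous itself; the object names (z, abs_diff, q, d_*) are chosen for this illustration and are not part of the package.

```r
# Two observations (rows) measured on three variables (columns).
z <- rbind(c(1, 2, 3),
           c(4, 6, 3))

abs_diff <- abs(z[1, ] - z[2, ])          # |z_ik - z_jk| per variable: 3, 4, 0

d_euclidean <- sqrt(sum(abs_diff^2))      # sqrt(9 + 16 + 0) = 5
d_manhattan <- sum(abs_diff)              # 3 + 4 + 0 = 7
d_maximum   <- max(abs_diff)              # 4

q <- 3                                    # Minkowski exponent
d_minkowski <- sum(abs_diff^q)^(1 / q)    # (27 + 64 + 0)^(1/3)

# Canberra: a zero denominator means both values are 0 (the 0/0 case),
# and that term is counted as 0.
denom <- abs(z[1, ]) + abs(z[2, ])
d_canberra <- sum(ifelse(denom == 0, 0, abs_diff / denom))  # 3/5 + 4/8 + 0/6 = 1.1

# Cross-check the hand computations against stats::dist on the same data.
stopifnot(
  isTRUE(all.equal(d_euclidean, as.numeric(dist(z, method = "euclidean")))),
  isTRUE(all.equal(d_manhattan, as.numeric(dist(z, method = "manhattan")))),
  isTRUE(all.equal(d_maximum,   as.numeric(dist(z, method = "maximum")))),
  isTRUE(all.equal(d_minkowski, as.numeric(dist(z, method = "minkowski", p = q)))),
  isTRUE(all.equal(d_canberra,  as.numeric(dist(z, method = "canberra"))))
)
```

Note that stats::dist omits 0/0 terms from the Canberra sum and rescales the remaining terms, which can differ from the 0/0 := 0 convention when such terms occur; the data here contain none, so the two agree.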
Considerations when choosing a distance metric:
For "euclidean_standardized", columns are standardized to mean 0 and variance 1 before computing Euclidean distances.
Cosine and correlation distances rely on the proxy package; these are not guaranteed to be strictly Euclidean.
Minkowski distance requires specifying the parameter p (e.g., p = 3 for the L3 norm).
Mahalanobis distance uses the inverse of the covariance matrix. If the covariance matrix is singular, the generalized inverse from MASS::ginv is used.
Standard metrics (Euclidean, Manhattan, Maximum, Canberra) are computed using stats::dist.
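The standardized Euclidean and Mahalanobis considerations can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's actual code: falling back to MASS::ginv via tryCatch when solve() fails is one plausible way to handle a singular covariance matrix, and all object names are hypothetical.

```r
set.seed(42)
x <- matrix(rnorm(18), nrow = 6, ncol = 3)   # 6 observations, 3 variables

# Standardized Euclidean: scale columns to mean 0 and variance 1,
# then take ordinary Euclidean distances.
d_std <- as.matrix(dist(scale(x)))

# Same quantity for rows 1 and 2, written as in the formula:
# squared differences divided by the column variances s_k^2.
s2 <- apply(x, 2, var)
d_std_12 <- sqrt(sum((x[1, ] - x[2, ])^2 / s2))
stopifnot(isTRUE(all.equal(unname(d_std[1, 2]), d_std_12)))

# Mahalanobis: invert the covariance matrix. Assumption: if solve() reports
# a singular matrix, fall back to the generalized inverse from MASS.
S <- cov(x)
S_inv <- tryCatch(solve(S), error = function(e) MASS::ginv(S))

n <- nrow(x)
d_mah <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    v <- x[i, ] - x[j, ]
    d_mah[i, j] <- sqrt(drop(t(v) %*% S_inv %*% v))
  }
}

# stats::mahalanobis returns squared distances from a chosen center,
# so row 1 of d_mah squared should match it with center = x[1, ].
stopifnot(isTRUE(all.equal(d_mah[1, ]^2, as.numeric(mahalanobis(x, x[1, ], S)))))
```

The double loop is kept for readability; for larger datasets a vectorized form would be preferable.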
A symmetric numeric matrix of pairwise distances between rows of x.
The diagonal contains zeros.
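The shape of the return value can be illustrated with stats::dist alone. This sketch assumes the result is equivalent to converting a "dist" object with as.matrix; whether dist_continuous does exactly this internally is not confirmed here.

```r
# Three collinear points, so the distances are easy to verify by hand.
x <- rbind(c(0, 0),
           c(3, 4),
           c(6, 8))

# stats::dist stores only the lower triangle; as.matrix expands it
# into a full square matrix.
d <- as.matrix(dist(x, method = "euclidean"))

isSymmetric(d)        # TRUE: d[i, j] equals d[j, i]
all(diag(d) == 0)     # TRUE: each row is at distance 0 from itself
d[1, 2]               # 5, the distance between rows 1 and 2
```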
# Small numeric matrix
mat <- matrix(c(1, 2, 3,
4, 5, 6,
7, 8, 9), nrow = 3, byrow = TRUE)
# Euclidean distance
dbrobust::dist_continuous(mat, method = "euclidean")
# Standardized Euclidean
dbrobust::dist_continuous(mat, method = "euclidean_standardized")
# Minkowski distance with p = 3
dbrobust::dist_continuous(mat, method = "minkowski", p = 3)
# Mahalanobis distance
set.seed(123)
mat <- matrix(rnorm(5*3), nrow = 5, ncol = 3)
colnames(mat) <- c("X1","X2","X3")
# Compute pairwise Mahalanobis distances between the 5 rows
dbrobust::dist_continuous(mat, method = "mahalanobis")
# Cosine distance (requires 'proxy' package)
dbrobust::dist_continuous(mat, method = "cosine")