| ndist | R Documentation |
Computes a distance matrix for continuous data with support for multiple distance metrics, scaling methods, dimensionality reduction, and validation data. The function implements various distance calculation approaches as described in van de Velden et al. (2024), including options for commensurable distances and variable weighting.
ndist(x, validate_x = NULL, commensurable = FALSE, method = "manhattan",
sig = NULL, scaling = "none", ncomp = ncol(x), threshold = NULL,
weights = rep(1, ncol(x)))
x |
A data frame or matrix of continuous input variables. |
validate_x |
Optional data frame or matrix for validation data. If provided, distances are computed between observations in validate_x and x. |
commensurable |
Logical. If |
method |
Character string specifying the distance metric. Options include |
sig |
Covariance matrix to be used when method = "mahalanobis". |
scaling |
Character string specifying the scaling method. Options:
Default is "none". |
ncomp |
Number of principal components to retain when scaling = "pc_scores". |
threshold |
Proportion of variance to retain when scaling = "pc_scores". |
weights |
Numeric vector of weights for each variable. Must have length equal to the number of variables in x. |
The ndist function covers the following cases for continuous data:
When validate_x is provided, computes distances between observations in validate_x and x.
Supports multiple scaling methods that can be applied before distance calculation.
PCA-based dimensionality reduction can be controlled either by number of components or variance threshold.
For Mahalanobis distance, handles singular covariance matrices with appropriate error messages.
Implements commensurable distances for better comparability across variables.
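The variance-threshold option for PCA can be pictured with base R alone. The following is a minimal sketch, not ndist's internal code: it assumes standardized inputs, uses prcomp() from the stats package, and the threshold value and toy data are made up for illustration.

```r
# Illustrative only: choose the number of principal components from a
# variance threshold with base R; this is not ndist()'s internal code.
set.seed(1)
x <- matrix(rnorm(200), nrow = 50, ncol = 4)   # toy continuous data
x[, 2] <- x[, 1] + 0.1 * x[, 2]                # make two columns correlated

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component, then accumulated
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

threshold <- 0.95                         # hypothetical target proportion
ncomp <- which(cum_var >= threshold)[1]   # smallest k that reaches the threshold

# Scores on the retained components; distances are then computed on these
scores <- pca$x[, seq_len(ncomp), drop = FALSE]
d <- as.matrix(dist(scores, method = "manhattan"))
```

Fixing ncomp directly skips the cumulative-variance step and simply keeps the first ncomp columns of the score matrix.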
Warning: The function validates:
Weight vector length must match the number of variables
Covariance matrix singularity for Mahalanobis distance
Compatibility of x and validate_x dimensions
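To show why the singularity check matters for Mahalanobis distance, here is a hedged base R sketch. The helper name safe_mahalanobis and its error message are hypothetical and do not reproduce ndist's own checks or wording; the sketch only relies on stats::cov(), solve() and stats::mahalanobis().

```r
# Illustrative only: guard against a singular covariance matrix before
# computing Mahalanobis distances. Helper name and message are hypothetical.
safe_mahalanobis <- function(x, sig = NULL) {
  x <- as.matrix(x)
  if (is.null(sig)) sig <- stats::cov(x)
  # solve() raises an error for an exactly or numerically singular matrix
  sig_inv <- tryCatch(
    solve(sig),
    error = function(e) {
      stop("Covariance matrix is singular; cannot invert.", call. = FALSE)
    }
  )
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # stats::mahalanobis() returns squared distances from each row to `center`
    d2 <- stats::mahalanobis(x, center = x[i, ], cov = sig_inv, inverted = TRUE)
    d[i, ] <- sqrt(pmax(d2, 0))   # clamp tiny negative rounding errors
  }
  d
}

set.seed(2)
x_ok <- matrix(rnorm(60), nrow = 20, ncol = 3)
d_ok <- safe_mahalanobis(x_ok)

# A duplicated column makes the covariance matrix singular
x_bad <- cbind(x_ok, x_ok[, 1])
res <- tryCatch(safe_mahalanobis(x_bad), error = function(e) e)
```

A common alternative to stopping is to drop or combine the collinear variables, or to reduce dimensionality first.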
A distance matrix where element [i,j] represents the distance between:
observation i and j of x if validate_x is NULL
observation i of validate_x and observation j of x if validate_x is provided
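The orientation of the returned matrix when validation data are supplied can be illustrated with a small base R sketch. The helper cross_manhattan is hypothetical and is not part of the package; it only mirrors the row and column convention described above, with validation observations in rows and observations of x in columns.

```r
# Illustrative only: Manhattan distances from each validation row to each
# training row, giving an nrow(validate_x) x nrow(x) matrix.
cross_manhattan <- function(x, validate_x) {
  x <- as.matrix(x)
  validate_x <- as.matrix(validate_x)
  stopifnot(ncol(x) == ncol(validate_x))   # same variables in both sets
  out <- matrix(0, nrow(validate_x), nrow(x))
  for (i in seq_len(nrow(validate_x))) {
    # sweep() subtracts validation row i from every row of x
    out[i, ] <- rowSums(abs(sweep(x, 2, validate_x[i, ])))
  }
  out
}

x_train <- matrix(c(0, 0,
                    1, 1,
                    2, 0), ncol = 2, byrow = TRUE)
x_valid <- matrix(c(0, 1,
                    2, 2), ncol = 2, byrow = TRUE)

d <- cross_manhattan(x_train, x_valid)
dim(d)   # 2 rows (validation) by 3 columns (training)
```

Here d[1, 3] is the distance between the first validation observation and the third training observation.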
van de Velden, M., Iodice D'Enza, A., Markos, A., Cavicchia, C. (2024). (Un)biased distances for mixed-type data. arXiv preprint. Retrieved from https://arxiv.org/abs/2411.00429.
mdist for mixed-type data distances, cdist for categorical data distances.
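library(manydist)  # assumption: ndist() is provided by the manydist package
library(palmerpenguins)
library(rsample)
# Keep the four continuous measurements and drop rows with missing values
penguins_cont <- palmerpenguins::penguins[, c("bill_length_mm",
  "bill_depth_mm", "flipper_length_mm", "body_mass_g")]
penguins_cont <- penguins_cont[complete.cases(penguins_cont), ]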
# Basic usage
dist_matrix <- ndist(penguins_cont)
# Commensurable distances with standardization
dist_matrix <- ndist(penguins_cont,
                     commensurable = TRUE,
                     scaling = "std")
# PCA-based dimensionality reduction
dist_matrix <- ndist(penguins_cont,
                     scaling = "pc_scores",
                     threshold = 0.95)
# Mahalanobis distance
dist_matrix <- ndist(penguins_cont,
                     method = "mahalanobis")
# Weighted Euclidean distance
dist_matrix <- ndist(penguins_cont,
                     method = "euclidean",
                     weights = c(1, 0.5, 2, 1))
# Training-test split example with validation data
set.seed(123)
# Create training-test split using rsample
penguins_split <- initial_split(penguins_cont, prop = 0.8)
tr_penguins <- training(penguins_split)
ts_penguins <- testing(penguins_split)
# Basic usage with training data only
dist_matrix <- ndist(tr_penguins)
# Computing distances between test and training sets
val_dist_matrix <- ndist(x = tr_penguins,
                         validate_x = ts_penguins,
                         method = "euclidean")
# Using validation data with standardization
val_dist_matrix_std <- ndist(x = tr_penguins,
                             validate_x = ts_penguins,
                             scaling = "std",
                             method = "manhattan")
# Validation with PCA and commensurability
val_dist_matrix_pca <- ndist(x = tr_penguins,
                             validate_x = ts_penguins,
                             scaling = "pc_scores",
                             ncomp = 2,
                             commensurable = TRUE)
# Validation with robust scaling and custom weights
val_dist_matrix_robust <- ndist(x = tr_penguins,
                                validate_x = ts_penguins,
                                scaling = "robust",
                                weights = c(1, 0.5, 2, 1))
# Mahalanobis distance with validation data
val_dist_matrix_mahal <- ndist(x = tr_penguins,
                               validate_x = ts_penguins,
                               method = "mahalanobis")