View source: R/calculate_distances.R
| calculate_distances | R Documentation |
Computes a distance or similarity matrix between rows of a data frame or matrix, supporting a wide variety of distance metrics.
calculate_distances(
x,
method = "gower",
output_format = "dist",
squared = FALSE,
p = NULL,
similarity_transform = "linear",
...
)
x |
A matrix or data.frame. Each row represents an observation. |
method |
A string specifying the distance/similarity method. Supported:
|
output_format |
Output format: "dist", "matrix", or "similarity". |
squared |
Logical; if |
p |
Numeric; the power parameter for the Minkowski distance (required if method = "minkowski"). |
similarity_transform |
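Character string; if output_format = "similarity", the transformation used to convert distances into similarities: "linear" or "sqrt". |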
... |
Additional arguments passed to underlying functions. |
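The role of the power parameter p can be illustrated with base R alone. The following is a minimal sketch that uses stats::dist rather than calculate_distances itself, on a made-up three-row matrix; it assumes the usual Minkowski definition, d(x, y) = (sum_k |x_k - y_k|^p)^(1/p), and is not taken from the package's internals.

```r
# Base-R sketch (stats::dist), not a call into calculate_distances()
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

# Minkowski distance with power p = 3
d_p3 <- dist(x, method = "minkowski", p = 3)

# p = 1 and p = 2 recover the Manhattan and Euclidean distances
d_p1 <- dist(x, method = "minkowski", p = 1)
d_p2 <- dist(x, method = "minkowski", p = 2)

all.equal(as.vector(d_p1), as.vector(dist(x, method = "manhattan")))
all.equal(as.vector(d_p2), as.vector(dist(x, method = "euclidean")))
```

Larger values of p give more weight to the coordinate with the largest absolute difference.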
When output_format = "similarity", the function transforms computed distances into similarity scores using one of the supported transformations.
The similarity transformation options are:
"linear": direct inversion of the distance, s_{ij} = 1 - \delta_{ij}.
"sqrt": inversion of the squared distance, s_{ij} = 1 - \delta_{ij}^2, which may better preserve Euclidean properties.
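The two formulas above can be applied by hand to any distance matrix. The sketch below is base R only and uses a made-up matrix; rescaling the distances to [0, 1] before the transformation is an assumption of this sketch, not something the documentation states about the function.

```r
# Toy data; distances rescaled to [0, 1] (an assumption of this sketch)
x <- matrix(c(1, 2,
              2, 4,
              5, 1), ncol = 2, byrow = TRUE)
delta <- as.matrix(dist(x))
delta <- delta / max(delta)

s_linear <- 1 - delta      # "linear": s_ij = 1 - delta_ij
s_sqrt   <- 1 - delta^2    # "sqrt":   s_ij = 1 - delta_ij^2

# Self-similarity is 1 in both cases, and the "sqrt" form is never
# smaller than the linear one when delta lies in [0, 1]
diag(s_linear)
all(s_sqrt >= s_linear)
```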
Depending on output_format, returns:
dist object (if output_format = "dist")
numeric matrix (if output_format = "matrix" or output_format = "similarity")
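The difference between the two return types can be seen with base R. This is a small sketch using stats::dist on made-up data, not output from calculate_distances: a dist object stores only the lower triangle, while the matrix form is the full symmetric table.

```r
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

d_obj <- dist(x)            # compact "dist" object (lower triangle only)
d_mat <- as.matrix(d_obj)   # full symmetric numeric matrix, zero diagonal

class(d_obj)
dim(d_mat)

# Converting back gives the same lower-triangle values
d_back <- as.dist(d_mat)
all.equal(as.vector(d_obj), as.vector(d_back))
```

The dist form is what functions such as hclust() expect; the matrix form is easier to index by row and column.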
dist for basic distance measures,
dist.binary for binary distances,
dist for advanced metrics like cosine or correlation
# Load example dataset
data("Data_HC_contamination", package = "dbrobust")
df <- Data_HC_contamination
# --- Quick Example ---
numeric_data <- df[1:10, 1:4] # subset for speed
d_euclid <- calculate_distances(
numeric_data,
method = "euclidean",
output_format = "matrix"
)
# --- Extended examples (first 20 rows of the dataset loaded above) ---
df <- Data_HC_contamination[1:20, ]
# Example 1: Euclidean distance (numeric variables only)
numeric_data <- df[, 1:4]
d_euclid <- calculate_distances(
numeric_data,
method = "euclidean",
output_format = "matrix"
)
# Example 2: Manhattan distance
d_manhattan <- calculate_distances(
numeric_data,
method = "manhattan",
output_format = "matrix"
)
# Example 3: Categorical distance using Matching Coefficient
categorical_data <- df[, 5:7]
d_match <- calculate_distances(
categorical_data,
method = "matching_coefficient",
output_format = "matrix"
)
# Example 4: Mixed data distance using Gower (automatic type detection, asymmetric binary)
d_gower_asym <- calculate_distances(
df,
method = "gower",
output_format = "dist",
binary_asym = TRUE
)
# Example 5: Minkowski distance with p = 3
d_minkowski <- calculate_distances(
numeric_data,
method = "minkowski",
p = 3,
output_format = "matrix"
)
# Example 6: Jaccard distance for binary variables
binary_data <- df[, 8:9]
d_jaccard <- calculate_distances(
binary_data,
method = "jaccard",
output_format = "matrix"
)
# Example 7: Mahalanobis distance
d_mahal <- calculate_distances(
numeric_data,
method = "mahalanobis",
output_format = "matrix"
)
# Example 8: Manual selection of variables for Gower distance
continuous_vars <- 1:4
binary_vars <- 8:9
categorical_vars <- 5:7
d_gower_manual <- calculate_distances(
df,
method = "gower",
output_format = "dist",
continuous_cols = continuous_vars,
binary_cols = binary_vars,
categorical_cols = categorical_vars
)
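To make the idea behind method = "gower" concrete, the following base-R sketch computes a textbook Gower dissimilarity for single pairs of rows. It is illustrative only: the toy data frame and the helper gower_pair are hypothetical, variables are equally weighted, missing values and asymmetric binary handling are ignored, and nothing here is claimed to match the package's own implementation.

```r
# Hypothetical toy data, not the package's dataset
toy <- data.frame(
  height = c(150, 160, 180),
  colour = factor(c("red", "blue", "red"))
)

# Hypothetical helper: Gower dissimilarity between rows i and j
gower_pair <- function(df, i, j) {
  parts <- vapply(names(df), function(v) {
    col <- df[[v]]
    if (is.numeric(col)) {
      # range-normalised absolute difference for numeric variables
      abs(col[i] - col[j]) / diff(range(col))
    } else {
      # simple mismatch (0 or 1) for categorical variables
      as.numeric(col[i] != col[j])
    }
  }, numeric(1))
  mean(parts)  # equal weight for every variable
}

gower_pair(toy, 1, 2)
gower_pair(toy, 1, 3)
```

Each variable contributes a value in [0, 1], so the averaged dissimilarity also lies in [0, 1], which is what allows numeric and categorical columns to be combined.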