View source: R/matching_distance_cache.R
compute_distances (R Documentation)
Precomputes a distance matrix between left and right datasets, allowing it to be reused across multiple matching operations with different constraints. This is particularly useful when exploring different matching parameters (max_distance, calipers, methods) without recomputing distances.
compute_distances(
  left,
  right,
  vars,
  distance = "euclidean",
  weights = NULL,
  scale = FALSE,
  auto_scale = FALSE,
  left_id = "id",
  right_id = "id",
  block_id = NULL
)
left: Left dataset (data frame)
right: Right dataset (data frame)
vars: Character vector of variable names to use for distance computation
distance: Distance metric (default: "euclidean")
weights: Optional numeric vector of variable weights (default: NULL)
scale: Scaling method; one of FALSE, "standardize", "range", or "robust" (default: FALSE)
auto_scale: Apply automatic preprocessing (default: FALSE)
left_id: Name of the ID column in left (default: "id")
right_id: Name of the ID column in right (default: "id")
block_id: Optional name of a block ID column, for blocked matching (default: NULL)
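To make vars, weights, scale, and block_id concrete, here is a minimal base-R sketch of one way a weighted, optionally scaled Euclidean cost matrix can be built. It is not the package's implementation: scale_column() and sketch_cost_matrix() are hypothetical helpers, the scaling formulas are the conventional z-score, min-max, and median/MAD definitions, pooling left and right rows before scaling is an assumption, and only the Euclidean metric is covered.

```r
# Hypothetical helper: conventional scaling formulas (assumed, not from the package).
scale_column <- function(x, method) {
  switch(method,
    standardize = (x - mean(x)) / sd(x),
    range       = (x - min(x)) / (max(x) - min(x)),
    robust      = (x - median(x)) / mad(x),
    stop("unknown scaling method: ", method)
  )
}

# Hypothetical sketch of a weighted Euclidean cost matrix (rows = left, cols = right).
sketch_cost_matrix <- function(left, right, vars, weights = NULL,
                               scale = FALSE, block_id = NULL) {
  if (is.null(weights)) weights <- rep(1, length(vars))
  # Assumption: scale on the pooled left + right values so both sides
  # share one reference distribution.
  pooled <- rbind(left[, vars, drop = FALSE], right[, vars, drop = FALSE])
  if (!isFALSE(scale)) {
    pooled[] <- lapply(pooled, scale_column, method = scale)
  }
  n_left <- nrow(left)
  L <- as.matrix(pooled[seq_len(n_left), , drop = FALSE])
  R <- as.matrix(pooled[-seq_len(n_left), , drop = FALSE])
  # Accumulate weighted squared differences for every (left, right) pair.
  cost <- matrix(0, nrow(L), nrow(R))
  for (k in seq_along(vars)) {
    cost <- cost + weights[k] * outer(L[, k], R[, k], "-")^2
  }
  cost <- sqrt(cost)
  # Assumed blocking rule: pairs from different blocks are forbidden (Inf).
  if (!is.null(block_id)) {
    cost[outer(left[[block_id]], right[[block_id]], "!=")] <- Inf
  }
  cost
}

l1 <- data.frame(id = 1:2, x = c(0, 3))
r1 <- data.frame(id = 3:4, x = c(0, 4))
sketch_cost_matrix(l1, r1, vars = "x", scale = "standardize")
```

Standardizing divides every difference by the pooled standard deviation, so variables measured in large units (such as income in the examples) no longer dominate variables measured in small units (such as age).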
This function computes distances once and stores them in a reusable object.
The resulting distance_object can be passed to match_couples() or
greedy_couples() instead of providing datasets and variables.
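A minimal sketch of why reusing the stored matrix pays off, assuming a constraint such as max_distance works by forbidding pairs: the expensive distance computation happens once, and each new threshold only rebuilds a cheap mask. apply_max_distance() is a hypothetical helper, and encoding forbidden pairs as Inf is an assumption rather than documented behaviour.

```r
# Computed once; values are illustrative (2 left units x 3 right units).
cost <- matrix(c(0.2, 0.9, 1.4,
                 0.7, 0.3, 0.6), nrow = 2, byrow = TRUE)

# Hypothetical helper: forbid pairs above the threshold by setting them to Inf.
# Returns a modified copy; the cached matrix itself is left untouched.
apply_max_distance <- function(cost, max_distance) {
  cost[cost > max_distance] <- Inf
  cost
}

# Each exploration step reuses `cost`; no distances are recomputed.
for (md in c(0.5, 1.0)) {
  constrained <- apply_max_distance(cost, md)
  cat("max_distance =", md, ":", sum(is.finite(constrained)), "allowed pairs\n")
}
```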
Benefits:
Performance: Avoid recomputing distances when trying different constraints
Exploration: Quickly test max_distance, calipers, or methods
Consistency: Ensures same distances used across comparisons
Memory efficient: Can use sparse matrices when many pairs are forbidden
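When most pairs are forbidden, only the allowed ones need to be stored. The sketch below uses plain (left, right, distance) triplets, the coordinate layout that sparse matrix classes such as those in the Matrix package build on; to_triplets() is hypothetical and this is not necessarily how the package stores sparse costs.

```r
cost <- matrix(c(0.2, 0.9, 1.4,
                 0.7, 0.3, 0.6), nrow = 2, byrow = TRUE)

# Hypothetical helper: keep only allowed pairs as (left, right, distance) rows.
to_triplets <- function(cost, max_distance) {
  idx <- which(cost <= max_distance, arr.ind = TRUE)
  data.frame(left = idx[, "row"], right = idx[, "col"],
             distance = cost[idx])
}

allowed <- to_triplets(cost, max_distance = 0.5)
# 2 stored entries instead of the 6 cells of the dense matrix
nrow(allowed)
```

Explicit triplets keep a genuine zero distance distinguishable from "not stored"; in a compressed sparse matrix, unstored cells read as 0, so a matcher using one has to treat them as forbidden rather than as free pairs.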
The distance_object stores the original datasets, allowing downstream
functions like join_matched() to work seamlessly.
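Keeping the original datasets is what makes a later join possible. As a hedged illustration of the idea (not of join_matched() itself), the sketch below attaches covariates from both sides to a hypothetical table of matched pairs with base merge(); the column names left_id and right_id are assumptions.

```r
left  <- data.frame(id = 1:3, age = c(25, 30, 35))
right <- data.frame(id = 6:8, age = c(24, 29, 36))

# Hypothetical matching result: one row per matched pair.
pairs <- data.frame(left_id = c(1, 3), right_id = c(6, 8))

# Attach left covariates, then right covariates; shared column names get suffixes.
joined <- merge(pairs, left, by.x = "left_id", by.y = "id")
joined <- merge(joined, right, by.x = "right_id", by.y = "id",
                suffixes = c("_left", "_right"))
joined
```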
An S3 object of class "distance_object" containing:
cost_matrix: Numeric matrix of distances
left_ids: Character vector of left IDs
right_ids: Character vector of right IDs
block_id: Block ID column name (if specified)
metadata: List with computation details (vars, distance, scale, etc.)
original_left: Original left dataset (for later joining)
original_right: Original right dataset (for later joining)
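For orientation, a hypothetical constructor that assembles a list with the fields listed above and tags it with the "distance_object" class. new_distance_object_sketch() is not part of the package, and the exact metadata fields it records are an assumption.

```r
# Hypothetical constructor mirroring the documented fields (not the package's code).
new_distance_object_sketch <- function(cost_matrix, left, right, vars,
                                       distance = "euclidean", scale = FALSE,
                                       left_id = "id", right_id = "id",
                                       block_id = NULL) {
  # One row per left unit, one column per right unit.
  stopifnot(nrow(cost_matrix) == nrow(left), ncol(cost_matrix) == nrow(right))
  structure(
    list(
      cost_matrix    = cost_matrix,
      left_ids       = as.character(left[[left_id]]),
      right_ids      = as.character(right[[right_id]]),
      block_id       = block_id,
      metadata       = list(vars = vars, distance = distance, scale = scale),
      original_left  = left,
      original_right = right
    ),
    class = "distance_object"
  )
}

left  <- data.frame(id = 1:2, age = c(25, 30))
right <- data.frame(id = 6:7, age = c(24, 31))
cost  <- abs(outer(left$age, right$age, "-"))
obj   <- new_distance_object_sketch(cost, left, right, vars = "age")
str(obj, max.level = 1)
```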
# Compute distances once
left <- data.frame(
  id = 1:5,
  age = c(25, 30, 35, 40, 45),
  income = c(45, 52, 48, 61, 55) * 1000
)
right <- data.frame(
  id = 6:10,
  age = c(24, 29, 36, 41, 44),
  income = c(46, 51, 47, 60, 54) * 1000
)
dist_obj <- compute_distances(
  left, right,
  vars = c("age", "income"),
  scale = "standardize"
)
# Reuse for different matching strategies
result1 <- match_couples(dist_obj, max_distance = 0.5)
result2 <- match_couples(dist_obj, max_distance = 1.0)
result3 <- greedy_couples(dist_obj, strategy = "sorted")
# All use the same precomputed distances
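The example above does not define strategy = "sorted". One common greedy scheme, assumed here rather than taken from the package, visits all allowed pairs in increasing distance and accepts a pair only when both units are still unmatched. greedy_sorted_sketch() is hypothetical; the usage shows why such a greedy result can cost more than an optimal one-to-one assignment.

```r
# Hypothetical sorted-greedy matcher over a precomputed cost matrix.
greedy_sorted_sketch <- function(cost, max_distance = Inf) {
  # Candidate pairs within the threshold, cheapest first.
  idx <- which(cost <= max_distance, arr.ind = TRUE)
  idx <- idx[order(cost[idx]), , drop = FALSE]
  used_left  <- logical(nrow(cost))
  used_right <- logical(ncol(cost))
  left  <- integer(0)
  right <- integer(0)
  for (k in seq_len(nrow(idx))) {
    i <- idx[k, 1]
    j <- idx[k, 2]
    # Accept the pair only if neither unit has been matched yet.
    if (!used_left[i] && !used_right[j]) {
      used_left[i]  <- TRUE
      used_right[j] <- TRUE
      left  <- c(left, i)
      right <- c(right, j)
    }
  }
  cbind(left = unname(left), right = unname(right))
}

cost <- matrix(c(1, 2, 2, 100), nrow = 2)
res <- greedy_sorted_sketch(cost)
sum(cost[res])
```

Greedy takes the cheapest pair (1, 1) first and is then forced into (2, 2) for a total of 101, whereas pairing (1, 2) with (2, 1) costs 4.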