compute_distances: Compute and Cache Distance Matrix for Reuse
In couplr: Optimal Pairing and Matching via Linear Assignment

Description

Precomputes a distance matrix between left and right datasets, allowing it to be reused across multiple matching operations with different constraints. This is particularly useful when exploring different matching parameters (max_distance, calipers, methods) without recomputing distances.

Usage

compute_distances(
  left,
  right,
  vars,
  distance = "euclidean",
  weights = NULL,
  scale = FALSE,
  auto_scale = FALSE,
  left_id = "id",
  right_id = "id",
  block_id = NULL
)

Arguments

`left`	Left dataset (data frame)
`right`	Right dataset (data frame)
`vars`	Character vector of variable names to use for distance computation
`distance`	Distance metric (default: "euclidean")
`weights`	Optional numeric vector of variable weights
`scale`	Scaling method: FALSE (no scaling, the default), "standardize", "range", or "robust"
`auto_scale`	Apply automatic preprocessing (default: FALSE)
`left_id`	Name of ID column in left (default: "id")
`right_id`	Name of ID column in right (default: "id")
`block_id`	Optional block ID column name for blocked matching
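
To make the `distance` and `scale` arguments concrete, here is a minimal base-R sketch of a standardized Euclidean distance matrix between two small data frames. It is an illustration only, not couplr's implementation: it assumes that "standardize" means z-scoring each variable on the pooled left and right rows, which the package may do differently, and the object names (`pooled`, `z_left`, `z_right`, `cost`) are made up for this example.

```r
# Illustrative data (three rows per side)
left  <- data.frame(id = 1:3, age = c(25, 30, 35), income = c(45, 52, 48) * 1000)
right <- data.frame(id = 4:6, age = c(24, 29, 36), income = c(46, 51, 47) * 1000)
vars  <- c("age", "income")

# Assumption: standardize each variable using the pooled mean and SD
pooled  <- rbind(left[vars], right[vars])
centers <- colMeans(pooled)
sds     <- apply(pooled, 2, sd)

z_left  <- scale(as.matrix(left[vars]),  center = centers, scale = sds)
z_right <- scale(as.matrix(right[vars]), center = centers, scale = sds)

# Pairwise Euclidean distances: rows index left units, columns index right units
cost <- outer(
  seq_len(nrow(z_left)), seq_len(nrow(z_right)),
  Vectorize(function(i, j) sqrt(sum((z_left[i, ] - z_right[j, ])^2)))
)
dimnames(cost) <- list(as.character(left$id), as.character(right$id))
```

Without scaling, income (in the tens of thousands) would dominate age in the distance; putting both variables on a common scale is what lets them contribute comparably.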

Details

This function computes distances once and stores them in a reusable object. The resulting distance_object can be passed to match_couples() or greedy_couples() instead of providing datasets and variables.

Benefits:

Performance: Avoids recomputing distances when trying different constraints
Exploration: Lets you quickly test different max_distance values, calipers, or methods
Consistency: Ensures the same distances are used across comparisons
Memory efficiency: Can use sparse matrices when many pairs are forbidden

The distance_object stores the original datasets, allowing downstream functions like join_matched() to work seamlessly.
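
The idea of computing once and constraining many times can be sketched in base R. The helper `apply_max_distance()` below is hypothetical and not part of couplr, and marking forbidden pairs with `Inf` is an assumption made for illustration; the point is only that changing a threshold is a cheap mask over a cached matrix rather than a fresh distance computation.

```r
# A small cached cost matrix (rows = left units, columns = right units)
cost <- matrix(
  c(0.2, 1.4, 0.9,
    1.1, 0.3, 0.7,
    0.8, 1.2, 0.1),
  nrow = 3, byrow = TRUE
)

# Hypothetical helper: forbid pairs farther apart than max_distance
apply_max_distance <- function(cost, max_distance) {
  out <- cost
  out[out > max_distance] <- Inf  # assumption: Inf marks a forbidden pair
  out
}

# Two constraint settings derived from the same cached distances
tight <- apply_max_distance(cost, 0.5)
loose <- apply_max_distance(cost, 1.0)

n_allowed_tight <- sum(is.finite(tight))  # 3 pairs remain
n_allowed_loose <- sum(is.finite(loose))  # 6 pairs remain
```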

Value

An S3 object of class "distance_object" containing:

cost_matrix: Numeric matrix of distances
left_ids: Character vector of left IDs
right_ids: Character vector of right IDs
block_id: Block ID column name (if specified)
metadata: List with computation details (vars, distance, scale, etc.)
original_left: Original left dataset (for later joining)
original_right: Original right dataset (for later joining)
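
To show how these components might be accessed, the sketch below builds a mock object with the same field names by hand. It is not couplr's constructor, and the field contents are assumptions, including the use of IDs as the row and column names of `cost_matrix`; a real object returned by compute_distances() may be laid out differently.

```r
# Hypothetical mock with the documented field names (not built by couplr)
mock <- structure(
  list(
    cost_matrix = matrix(
      c(0.2, 1.4,
        1.1, 0.3),
      nrow = 2, byrow = TRUE,
      dimnames = list(c("1", "2"), c("6", "7"))  # assumption: IDs as dimnames
    ),
    left_ids = c("1", "2"),
    right_ids = c("6", "7"),
    block_id = NULL,
    metadata = list(
      vars = c("age", "income"),
      distance = "euclidean",
      scale = "standardize"
    ),
    original_left = data.frame(id = 1:2, age = c(25, 30)),
    original_right = data.frame(id = 6:7, age = c(24, 29))
  ),
  class = "distance_object"
)

# Components are ordinary list elements, reachable with `$`
is_dist_obj <- inherits(mock, "distance_object")
d_1_6 <- mock$cost_matrix["1", "6"]
used_vars <- mock$metadata$vars
```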

Examples

library(couplr)

# Build two small example datasets
left <- data.frame(
  id = 1:5,
  age = c(25, 30, 35, 40, 45),
  income = c(45, 52, 48, 61, 55) * 1000
)
right <- data.frame(
  id = 6:10,
  age = c(24, 29, 36, 41, 44),
  income = c(46, 51, 47, 60, 54) * 1000
)

# Compute distances once

dist_obj <- compute_distances(
  left, right,
  vars = c("age", "income"),
  scale = "standardize"
)

# Reuse for different matching strategies
result1 <- match_couples(dist_obj, max_distance = 0.5)
result2 <- match_couples(dist_obj, max_distance = 1.0)
result3 <- greedy_couples(dist_obj, strategy = "sorted")

# All use the same precomputed distances

couplr documentation built on March 3, 2026, 5:08 p.m.