join_matched: Join Matched Pairs with Original Data
In couplr: Optimal Pairing and Matching via Linear Assignment

Description

Creates an analysis-ready dataset by joining matched pairs with variables from the original left and right datasets. This eliminates the need for manual joins and provides a convenient format for downstream analysis.

Usage

join_matched(
  result,
  left,
  right,
  left_vars = NULL,
  right_vars = NULL,
  left_id = "id",
  right_id = "id",
  suffix = c("_left", "_right"),
  include_distance = TRUE,
  include_pair_id = TRUE,
  include_block_id = TRUE
)

Arguments

`result`	A matching_result object from `match_couples()` or `greedy_couples()`
`left`	The original left dataset
`right`	The original right dataset
`left_vars`	Character vector of variable names to include from left. If NULL (default), includes all variables except the ID column.
`right_vars`	Character vector of variable names to include from right. If NULL (default), includes all variables except the ID column.
`left_id`	Name of the ID column in left dataset (default: "id")
`right_id`	Name of the ID column in right dataset (default: "id")
`suffix`	Character vector of length 2 specifying suffixes for left and right variables (default: c("_left", "_right"))
`include_distance`	Include the matching distance in output (default: TRUE)
`include_pair_id`	Include pair_id column (default: TRUE)
`include_block_id`	Include block_id if blocking was used (default: TRUE)

Details

This function simplifies the common workflow of joining matched pairs with original data. Instead of manually merging result$pairs with left and right datasets, join_matched() handles the joins automatically and applies consistent naming conventions.
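For comparison, the manual workflow that this replaces can be sketched in base R. This is only an illustration: the `pairs` data frame below is built by hand as a stand-in for result$pairs, and its column names (left_id, right_id, distance) are assumptions taken from the Value section, not a confirmed description of the object's internals.

```r
# Hypothetical inputs; `pairs` stands in for result$pairs and its column
# names (left_id, right_id, distance) are assumed for illustration.
left  <- data.frame(id = 1:3, age = c(25, 30, 35))
right <- data.frame(id = 4:6, age = c(24, 29, 36))
pairs <- data.frame(left_id = 1:3, right_id = 4:6, distance = c(1, 1, 1))

# Rename the ID column (assumed to be first) and suffix the remaining
# columns so left and right variables stay distinct after merging.
left_s  <- setNames(left,  c("left_id",  paste0(names(left)[-1],  "_left")))
right_s <- setNames(right, c("right_id", paste0(names(right)[-1], "_right")))

# Two merges: pairs -> left attributes -> right attributes.
manual <- merge(pairs, left_s, by = "left_id")
manual <- merge(manual, right_s, by = "right_id")
manual <- manual[order(manual$left_id), ]
manual
```

Each merge needs its own key and its own renaming step, which is the bookkeeping that join_matched() is described as handling.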

When variables appear in both left and right datasets, suffixes are appended to distinguish them (e.g., "age_left" and "age_right"). This makes it easy to compute differences or use both values in models.
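As a rough sketch of how the suffixed columns might be used, the snippet below computes within-pair differences. The `matched` data frame is a hand-built stand-in for join_matched() output, with column names assumed to follow the default suffixes; it is not actual output from the package.

```r
# Hand-built stand-in for join_matched() output; column names follow the
# default suffixes and are assumed here for illustration.
matched <- data.frame(
  pair_id      = 1:3,
  age_left     = c(25, 30, 35),
  age_right    = c(24, 29, 36),
  income_left  = c(45000, 52000, 48000),
  income_right = c(46000, 51500, 47500)
)

# Within-pair differences (left minus right).
matched$age_diff    <- matched$age_left - matched$age_right
matched$income_diff <- matched$income_left - matched$income_right

# Mean within-pair difference per variable, as a quick balance check.
colMeans(matched[, c("age_diff", "income_diff")])
```

Because both values sit in the same row, the difference is a plain column operation with no further joins.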

Value

A tibble with one row per matched pair, containing:

pair_id: Sequential pair identifier (if include_pair_id = TRUE)
left_id: ID from left dataset
right_id: ID from right dataset
distance: Matching distance (if include_distance = TRUE)
block_id: Block identifier (if blocking was used and include_block_id = TRUE)
Variables from left dataset (with left suffix)
Variables from right dataset (with right suffix)

Examples

library(couplr)

# Basic usage
left <- data.frame(
  id = 1:5,
  treatment = 1,
  age = c(25, 30, 35, 40, 45),
  income = c(45000, 52000, 48000, 61000, 55000)
)

right <- data.frame(
  id = 6:10,
  treatment = 0,
  age = c(24, 29, 36, 41, 44),
  income = c(46000, 51500, 47500, 60000, 54000)
)

result <- match_couples(left, right, vars = c("age", "income"))
matched_data <- join_matched(result, left, right)
head(matched_data)

# Specify which variables to include
matched_data <- join_matched(
  result, left, right,
  left_vars = c("treatment", "age", "income"),
  right_vars = c("age", "income"),
  suffix = c("_treated", "_control")
)

# Without distance or pair_id
matched_data <- join_matched(
  result, left, right,
  include_distance = FALSE,
  include_pair_id = FALSE
)

couplr documentation built on March 3, 2026, 5:08 p.m.