matchindex: Find index of matched donor units
In mice: Multivariate Imputation by Chained Equations

matchindex

R Documentation

Find index of matched donor units

Description

For each target case, finds the index of a matched donor unit, drawn at random from the k donors closest to that target.

Usage

matchindex(d, t, k = 5L)

Arguments

`d`	Numeric vector with values from donor cases.
`t`	Numeric vector with values from target cases.
`k`	Integer, number of unique donors from which a random draw is made. For `k = 1` the function returns the index in `d` corresponding to the closest unit. For multiple imputation, the advice is to set values in the range of `k = 5` to `k = 10`.

Details

For each element in t, the method finds the k nearest neighbours in d, randomly draws one of these neighbours, and returns its position in vector d.
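The rule in the previous paragraph can be written down directly in base R. The following is a minimal reference sketch, not the package's implementation: the name match_naive is hypothetical, and ties are broken by position in d rather than by shuffling.

```r
# Naive reference sketch (hypothetical helper, not part of mice).
# For each target, rank all donors by absolute distance, keep the k
# closest, and draw one of them at random.
match_naive <- function(d, t, k = 5L) {
  k <- min(k, length(d))  # cannot draw from more donors than exist
  vapply(t, function(ti) {
    nearest <- order(abs(d - ti))[seq_len(k)]  # positions of the k closest donors
    nearest[sample.int(k, 1L)]                 # random draw among them
  }, integer(1))
}

d <- c(-5, 5, 0, 10, 12)
t <- c(-6, -4, 0, 2, 4, -2, 6)
match_naive(d, t, k = 1L)  # 1 1 3 3 2 3 2
```

This version sorts the distances to all donors once per target, so its cost grows with length(d) * length(t). It is meant only as a readable statement of what the function returns.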

Fast predictive mean matching algorithm in seven steps:

1. Shuffle records to remove effects of ties
2. Obtain sorting order on shuffled data
3. Calculate index on input data and sort it
4. Pre-sample vector h with values between 1 and k

For each of the n0 elements in t:

5. find the two adjacent neighbours
6. find the h_i'th nearest neighbour
7. store the index of that neighbour

Return vector of n0 positions in d.
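The steps above can be sketched in plain R as follows. This is an illustration under assumptions, not the compiled routine that mice uses: the name match_sorted is hypothetical, findInterval() stands in for the search for adjacent neighbours, and equal distances on both sides are resolved in favour of the left neighbour.

```r
# Sketch of the sorted-search idea (hypothetical helper, not part of mice).
match_sorted <- function(d, t, k = 5L) {
  n1 <- length(d)
  k <- min(k, n1)
  ishuf <- sample.int(n1)          # shuffle donor positions
  isort <- order(d[ishuf])         # sorting order on the shuffled data
  id <- ishuf[isort]               # original positions, in sorted order
  ds <- d[id]                      # sorted donor values
  h <- sample.int(k, length(t), replace = TRUE)  # pre-sampled ranks in 1..k
  out <- integer(length(t))
  for (i in seq_along(t)) {
    l <- findInterval(t[i], ds)    # last sorted donor <= t[i] (0 if none)
    r <- l + 1L                    # first sorted donor > t[i]
    for (step in seq_len(h[i])) {  # walk outward to the h[i]-th nearest donor
      take_left <- r > n1 || (l >= 1L && (t[i] - ds[l]) <= (ds[r] - t[i]))
      if (take_left) {
        pick <- l
        l <- l - 1L
      } else {
        pick <- r
        r <- r + 1L
      }
    }
    out[i] <- id[pick]             # store position in the original d
  }
  out
}

d <- c(-5, 5, 0, 10, 12)
t <- c(-6, -4, 0, 2, 4, -2, 6)
match_sorted(d, t, k = 1L)  # 1 1 3 3 2 3 2
```

Sorting the donors once and searching from the two adjacent neighbours means each target inspects at most k donors, instead of ranking the distances to all of them.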

The function can be used to perform predictive mean matching under a given predictive model. To do so, specify both d and t as predictions from the same model. If y contains the observed outcomes of the donor cases (in the same sequence as d), then y[matchindex(d, t)] returns one matched outcome for every target case.

This function is a replacement for the matcher() function, which has been the default in mice since version 2.22 (June 2014). See https://github.com/amices/mice/issues/236.

Value

An integer vector with length(t) elements. Each element is an index in the array d.

Author(s)

Stef van Buuren, Nasinski Maciej, Alexander Robitzsch

Examples

set.seed(1)

# Inputs need not be sorted
d <- c(-5, 5, 0, 10, 12)
t <- c(-6, -4, 0, 2, 4, -2, 6)

# Index (in vector d) of closest match
idx <- matchindex(d, t, 1)
idx

# To check: show values of closest match
d[idx]

# Random draw among indices of the 5 closest donors (default k = 5)
matchindex(d, t)

# Example: predictive mean matching under a linear model
train <- mtcars[1:20, ]
test <- mtcars[21:32, ]
fit <- lm(mpg ~ disp + cyl, data = train)
d <- fitted.values(fit)
t <- predict(fit, newdata = test)  # note: not using mpg
idx <- matchindex(d, t)

# Borrow values from train to produce 12 synthetic values for mpg in test.
# Synthetic values are plausible values that could have been observed if
# they had been measured.
train$mpg[idx]

# Exercise: Create a distribution of 1000 plausible values for each of the
# twelve mpg entries in test, and count how many times the true value
# (which we know here) is located within the inter-quartile range of each
# distribution. Is your count anywhere close to 500? Why? Why not?

mice documentation built on June 8, 2025, 11:31 a.m.