A function with which to produce matching distances, for instance Mahalanobis distances, propensity score discrepancies or calipers, or combinations thereof, for pairmatch or fullmatch to subsequently “match on”. Conceptually, the result of a call to match_on is a treatment-by-control matrix of distances. Because these matrices can grow quite large, in practice match_on produces either an ordinary dense matrix or a special sparse matrix structure (that can make use of caliper and exact matching constraints to reduce storage requirements). Methods are supplied for these sparse structures, InfinitySparseMatrix objects, so that they can be manipulated and modified in much the same way as dense matrices.
match_on(x, within = NULL, caliper = NULL, data = NULL, ...)
## S3 method for class 'glm'
match_on(x, within = NULL, caliper = NULL, data = NULL,
standardization.scale = mad, ...)
## S3 method for class 'bigglm'
match_on(x, within = NULL, caliper = NULL,
data = NULL, standardization.scale = mad, ...)
## S3 method for class 'formula'
match_on(x, within = NULL, caliper = NULL,
data = NULL, subset = NULL, method = "mahalanobis", ...)
## S3 method for class 'function'
match_on(x, within = NULL, caliper = NULL,
data = NULL, z = NULL, ...)
## S3 method for class 'numeric'
match_on(x, within = NULL, caliper = NULL,
data = NULL, z, ...)
## S3 method for class 'InfinitySparseMatrix'
match_on(x, within = NULL,
caliper = NULL, data = NULL, ...)
## S3 method for class 'matrix'
match_on(x, within = NULL, caliper = NULL,
data = NULL, ...)

x 
An object defining how to create the distances. All methods require
some form of names (e.g. 
within 
A valid distance specification, such as the result of exactMatch or caliper.
caliper 
The width of a caliper to use to exclude treated-control pairs with values greater than the width. For some methods, there may be a speed advantage to passing a width rather than applying the caliper function afterward.
data 
An optional data frame. 
... 
Other arguments for methods. 
standardization.scale 
Function for rescaling of 
subset 
A subset of the data to use in creating the distance specification. 
method 
A string indicating which method to use in computing the distances from the data. The current possibilities are "mahalanobis", "euclidean", and "rank_mahalanobis".
z 
A logical or binary vector indicating treatment and control for each unit in the study. TRUE or 1 represents a treatment unit, FALSE or 0 represents a control unit. Any unit with NA treatment status will be excluded from the distance matrix.
match_on is generic. There are several supplied methods, all providing the same basic output: a matrix (or similar) object with treated units on the rows and control units on the columns. Each cell [i,j] then indicates the distance from treated unit i to control unit j. Entries that are Inf are said to be unmatchable: the corresponding treated and control units are guaranteed never to be placed in the same matched set. For problems with many Inf entries, so-called sparse matching problems, match_on uses a special data type that is more space efficient than a standard R matrix. When problems are not sparse (i.e. dense), match_on uses the standard matrix type.
match_on methods differ in the types of arguments they take, making the function a one-stop location for many different ways of specifying matches: using functions, formulas, models, and even simple scores. Many of the methods require additional arguments, detailed below. All methods take a within argument, a distance specification made using exactMatch or caliper (or some additive combination of these or other distance-creating functions). All match_on methods will use the finite entries in the within argument as a guide for producing the new distance. Any entry that is Inf in within will be Inf in the distance matrix returned by match_on. This argument can reduce the processing time needed to compute sparse distance matrices.
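
The way Inf entries in within carry over to the result can be illustrated in base R alone. The following is only a sketch of the idea, not the package's implementation: the toy data, the stratum labels, and the dense 0/Inf matrix standing in for within are all assumptions made for illustration.

```r
# Toy data (made up): three treated units (z = 1) and four controls (z = 0),
# each with a score and a stratum label.
score   <- c(a = 1.0, b = 2.5, c = 4.0, d = 0.5, e = 2.0, f = 3.0, g = 5.0)
z       <- c(a = 1, b = 1, c = 1, d = 0, e = 0, f = 0, g = 0)
stratum <- c(a = "x", b = "x", c = "y", d = "x", e = "y", f = "x", g = "y")

treated  <- names(z)[z == 1]
controls <- names(z)[z == 0]

# Treated units on the rows, control units on the columns.
dist <- abs(outer(score[treated], score[controls], "-"))
dimnames(dist) <- list(treated, controls)

# A dense stand-in for `within`: 0 where a pair shares a stratum, Inf otherwise.
within <- ifelse(outer(stratum[treated], stratum[controls], "=="), 0, Inf)
dimnames(within) <- list(treated, controls)

# Adding the two keeps the finite distances and propagates every Inf.
result <- dist + within
print(result)
```

Pairs that fall in different strata come out as Inf, so only the within-stratum distances remain available for matching.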
The match_on function is similar to the older mdist function. Future development will concentrate on match_on, but mdist is still supplied for users familiar with its interface. For the most part, the two functions can be used interchangeably.
The glm method assumes its first argument to be a fitted propensity model. From this it extracts distances on the linear propensity score: fitted values of the linear predictor, the link function applied to the estimated conditional probabilities, as opposed to the estimated conditional probabilities themselves (Rosenbaum & Rubin, 1985). For example, a logistic model (glm with family=binomial()) has the logit function as its link, so from such models match_on computes distances in terms of logits of the estimated conditional probabilities, i.e. the estimated log odds.
Optionally these distances are also rescaled. The default is to rescale, by the reciprocal of an outlier-resistant variant of the pooled s.d. of propensity scores. (Outlier resistance is obtained by the application of mad, as opposed to sd, to linear propensity scores in the treatment and control groups; this can be changed to the actual pooled s.d., or rescaling can be skipped entirely, by setting argument standardization.scale to sd or NULL, respectively.) The overall result records absolute differences between treated and control units on linear, possibly rescaled, propensity scores.
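
As a rough illustration of the steps just described, the sketch below computes linear propensity scores from a fitted logistic model and rescales their absolute differences by a pooled, mad-based spread. The simulated data and the helper pooled_scale are hypothetical, and the pooling formula is an assumption (the usual pooled-variance form with mad in place of sd); the package's own computation may differ in detail.

```r
set.seed(1)
n   <- 40
x1  <- rnorm(n)
x2  <- rnorm(n)
z   <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
dat <- data.frame(z, x1, x2, row.names = paste0("u", seq_len(n)))

fit <- glm(z ~ x1 + x2, family = binomial(), data = dat)

# Linear propensity score: the linear predictor (log odds), not the probability.
lps <- predict(fit, type = "link")

# Hypothetical helper: pools a per-group spread estimate across the two groups.
pooled_scale <- function(score, trt, scale_fun = mad) {
  n1 <- sum(trt)
  n0 <- sum(!trt)
  sqrt(((n1 - 1) * scale_fun(score[trt])^2 +
        (n0 - 1) * scale_fun(score[!trt])^2) / (n1 + n0 - 2))
}

trt <- dat$z == 1
s   <- pooled_scale(lps, trt)

# Treated-by-control matrix of rescaled absolute differences.
ps_dist <- abs(outer(lps[trt], lps[!trt], "-")) / s
dim(ps_dist)
```

Passing scale_fun = sd to the helper would give the ordinary pooled s.d. instead of the outlier-resistant variant.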
In addition, one can impose a caliper in terms of these distances by providing a scalar as a caliper argument, forbidding matches between treatment and control units differing in the calculated propensity score by more than the specified caliper. For example, Rosenbaum and Rubin's (1985) caliper of one-fifth of a pooled propensity score s.d. would be imposed by specifying caliper=.2, in tandem either with the default rescaling or, to follow their example even more closely, with the additional specification standardization.scale=sd. Propensity calipers are beneficial computationally as well as statistically, for reasons indicated in the discussion of the numeric method below.
One can also specify exact matching criteria by using strata(foo) inside the formula used to build the glm. For example, passing glm(y ~ x + strata(s)) to match_on is equivalent to passing within=exactMatch(y ~ strata(s)). Note that when combining with the caliper argument, the standard deviation used for the caliper will be computed across all strata, not within each stratum.
The bigglm method works analogously to the glm method, but with bigglm objects, created by the bigglm function from package ‘biglm’, which can handle bigger data sets than the ordinary glm function can.
The formula method produces, by default, a Mahalanobis distance specification based on the formula Z ~ X1 + X2 + ..., where Z is the treatment indicator. The Mahalanobis distance is calculated as the square root of d'Cd, where d is the vector of X-differences on a pair of observations and C is an inverse (generalized inverse) of the pooled covariance of Xes. (The pooling is of the covariance of X within the subset defined by Z==0 and within the complement of that subset. This is similar to a Euclidean distance calculated after re-expressing the Xes in standard units, such that the re-expressed variables all have pooled SDs of 1; except that it addresses redundancies among the variables by scaling down variables' contributions in proportion to their correlations with other included variables.)
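
The formula sqrt(d'Cd) can be worked through by hand in base R. This is a minimal sketch on simulated data, assuming the standard degrees-of-freedom weighting for the pooled covariance; the helper maha and all variable names are made up for illustration, and the package may pool or invert differently.

```r
set.seed(2)
n <- 30
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
rownames(X) <- paste0("u", seq_len(n))
z <- rep(c(1, 0), times = c(10, 20))   # first 10 units treated, the rest controls

# Pooled covariance: within-group covariances weighted by degrees of freedom.
n1 <- sum(z == 1)
n0 <- sum(z == 0)
S  <- ((n1 - 1) * cov(X[z == 1, ]) + (n0 - 1) * cov(X[z == 0, ])) / (n1 + n0 - 2)

# C is the inverse of S; a generalized inverse would be needed if S were singular.
C <- solve(S)

# Hypothetical helper: distance between units i and j as sqrt(d' C d).
maha <- function(i, j) {
  d <- X[i, ] - X[j, ]
  sqrt(drop(t(d) %*% C %*% d))
}

treated  <- rownames(X)[z == 1]
controls <- rownames(X)[z == 0]

# Treated units on the rows, control units on the columns.
mdist <- t(sapply(treated, function(i) sapply(controls, function(j) maha(i, j))))
dimnames(mdist) <- list(treated, controls)
round(mdist[1:3, 1:4], 2)
```

Each row can be cross-checked against stats::mahalanobis, which returns the squared distance for a given center and covariance.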
Euclidean distance is also available, via method="euclidean", and ranked Mahalanobis distance, via method="rank_mahalanobis".
The treatment indicator Z as noted above must either be numeric (1 representing treated units and 0 control units) or logical (TRUE for treated, FALSE for controls). (Earlier versions of the software accepted factor variables and other types of numeric variable; you may have to update existing scripts to get them to run.) A unit with NA treatment status is ignored and will not be included in the distance output.
As an alternative to specifying a within argument, when x is a formula, the strata command can be used inside the formula to specify exact matching. For example, rather than using within=exactMatch(y ~ z, data=data), you may update your formula as y ~ x + strata(z). Do not use both methods (within and strata) simultaneously. Note that when combining with the caliper argument, the standard deviation used for the caliper will be computed across all strata, not within each stratum.
The function method takes as its x argument a function of three arguments: index, data, and z. The data and z arguments will be the same as those passed directly to match_on. The index argument is a matrix of two columns, representing the pairs of treated and control units that are valid comparisons (given any within arguments). The first column is the row name or id of the treated unit in the data object. The second column is the id for the control unit, again in the data object. For each of these pairs, the function should return the distance between the treated unit and control unit. This may sound complicated, but is simple to use. For example, a function that returned the absolute difference between two units using a vector of data would be f <- function(index, data, z) { abs(apply(index, 1, function(pair) { data[pair[1]] - data[pair[2]] })) }. (Note: This simple case is precisely handled by the numeric method.)
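
To see what such a function receives and returns, it can be called by hand outside of match_on. In the sketch below the index matrix is built manually with expand.grid as a stand-in for the one match_on would supply; the scores and unit names are made up, and the exact form of the index passed by the package is assumed, not confirmed here.

```r
# The distance function from the text: absolute difference on a named vector.
f <- function(index, data, z) {
  abs(apply(index, 1, function(pair) {
    data[pair[1]] - data[pair[2]]
  }))
}

# Made-up scores: units a and b are treated, c and d are controls.
score <- c(a = 1.0, b = 2.5, c = 0.5, d = 2.0)
z     <- c(a = 1, b = 1, c = 0, d = 0)

# Hand-built stand-in for `index`: one row per (treated, control) pair.
index <- as.matrix(expand.grid(treated = names(z)[z == 1],
                               control = names(z)[z == 0],
                               stringsAsFactors = FALSE))

d <- f(index, score, z)
data.frame(index, distance = d)
```

The function returns one distance per row of index, in the same order as the rows.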
The numeric method returns absolute differences between treated and control units' values of x. If a caliper is specified, pairings with x differences greater than it are forbidden. Conceptually, those distances are set to Inf; computationally, if either of caliper and within has been specified then only information about permissible pairings will be stored, so the forbidden pairings are simply omitted. Providing a caliper argument here, as opposed to omitting it and afterward applying the caliper function, reduces storage requirements and may otherwise improve performance, particularly in larger problems. For the numeric method, x must have names.
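
The difference between the conceptual view (forbidden distances set to Inf) and the storage view (forbidden pairings omitted) can be sketched in base R. The data frame used for the sparse form is only an illustrative stand-in and does not reflect the internal layout of InfinitySparseMatrix; all values are made up.

```r
# Made-up named scores: units a and b are treated, the rest are controls.
x <- c(a = 1.0, b = 2.5, c = 0.5, d = 2.0, e = 4.0)
z <- c(a = 1, b = 1, c = 0, d = 0, e = 0)
width <- 1   # caliper width

# Absolute differences, treated on rows and controls on columns.
absdiff <- abs(outer(x[z == 1], x[z == 0], "-"))

# Conceptual view: pairings beyond the caliper get distance Inf.
dense <- ifelse(absdiff > width, Inf, absdiff)

# Storage view: keep only the permissible pairings and drop the rest.
keep   <- which(absdiff <= width, arr.ind = TRUE)
sparse <- data.frame(treated  = rownames(absdiff)[keep[, "row"]],
                     control  = colnames(absdiff)[keep[, "col"]],
                     distance = absdiff[keep])
print(dense)
print(sparse)
```

With a tight caliper most pairings are forbidden, so the sparse form stores far fewer entries than the full treated-by-control matrix.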
The matrix and InfinitySparseMatrix methods just return their arguments, as these objects are already valid distance specifications.
A distance specification (a matrix or similar object) which is suitable to be given as the distance argument to fullmatch or pairmatch.
P. R. Rosenbaum and D. B. Rubin (1985), ‘Constructing a control group using multivariate matched sampling methods that incorporate the propensity score’, The American Statistician, 39, 33–38.
fullmatch, pairmatch, exactMatch, caliper
library(optmatch)
data(nuclearplants)
match_on.examples <- list()
### Propensity score distances.
### Recommended approach:
(aGlm <- glm(pr~.-(pr+cost), family=binomial(), data=nuclearplants))
match_on.examples$ps1 <- match_on(aGlm)
### A second approach: first extract propensity scores, then separately
### create a distance from them. (Useful when importing propensity
### scores from an external program.)
plantsPS <- predict(aGlm)
match_on.examples$ps2 <- match_on(pr~plantsPS, data=nuclearplants)
### Full matching on the propensity score.
fm1 <- fullmatch(match_on.examples$ps1, data = nuclearplants)
fm2 <- fullmatch(match_on.examples$ps2, data = nuclearplants)
### Because match_on.glm uses robust estimates of spread,
### the results differ in detail -- but they are close enough
### to yield similar optimal matches.
all(fm1 == fm2) # The same
### Mahalanobis distance:
match_on.examples$mh1 <- match_on(pr ~ t1 + t2, data = nuclearplants)
### Absolute differences on a scalar:
tmp <- nuclearplants$t1
names(tmp) <- rownames(nuclearplants)
(absdist <- match_on(tmp, z = nuclearplants$pr,
                     within = exactMatch(pr ~ pt, nuclearplants)))
### Pair matching on the variable `t1`:
pairmatch(absdist, data = nuclearplants)
### Propensity score matching within subgroups:
match_on.examples$ps3 <- match_on(aGlm, exactMatch(pr ~ pt, nuclearplants))
fullmatch(match_on.examples$ps3, data = nuclearplants)
### Propensity score matching with a propensity score caliper:
match_on.examples$pscal <- match_on.examples$ps1 + caliper(match_on.examples$ps1, 1)
fullmatch(match_on.examples$pscal, data = nuclearplants) # Note that the caliper excludes some units
### A Mahalanobis distance for matching within subgroups:
match_on.examples$mh2 <- match_on(pr ~ t1 + t2 , data = nuclearplants,
                                  within = exactMatch(pr ~ pt, nuclearplants))
### Mahalanobis matching within subgroups, with a propensity score
### caliper:
fullmatch(match_on.examples$mh2 + caliper(match_on.examples$ps3, 1), data = nuclearplants)
### Alternative methods to matching within groups (exact matching)
m1 <- match_on(pr ~ t1 + t2, data=nuclearplants, within=exactMatch(pr ~ pt, nuclearplants))
m2 <- match_on(pr ~ t1 + t2 + strata(pt), data=nuclearplants)
# m1 and m2 are identical
m3 <- match_on(glm(pr ~ t1 + t2 + cost, data=nuclearplants,
                   family=binomial),
               data=nuclearplants,
               within=exactMatch(pr ~ pt, data=nuclearplants))
m4 <- match_on(glm(pr ~ t1 + t2 + cost + pt, data=nuclearplants,
                   family=binomial),
               data=nuclearplants,
               within=exactMatch(pr ~ pt, data=nuclearplants))
m5 <- match_on(glm(pr ~ t1 + t2 + cost + strata(pt), data=nuclearplants,
                   family=binomial), data=nuclearplants)
# Including `strata(foo)` inside a glm uses `foo` in the model as
# well, so here m4 and m5 are equivalent. m3 differs in that it does
# not include `pt` in the glm.
