Multivariate and Propensity Score Matching Estimator for Causal Inference
Description
Match implements a variety of algorithms for multivariate matching
including propensity score, Mahalanobis and inverse variance matching.
The function is intended to be used in conjunction with the
MatchBalance function, which determines the extent to which Match has
been able to achieve covariate balance. In order to do propensity score
matching, one should estimate the propensity model before calling
Match, and then send Match the propensity score to use. Match enables a
wide variety of matching options including matching with or without
replacement, bias adjustment, different methods for handling ties,
exact and caliper matching, and a method for the user to fine tune the
matches via a general restriction matrix. Variance estimators include
the usual Neyman standard errors, Abadie-Imbens standard errors, and
robust variances which do not assume a homogeneous causal effect. The
GenMatch function can be used to automatically find balance via a
genetic search algorithm which determines the optimal weight to give
each covariate.
Usage
Match(Y=NULL, Tr, X, Z = X, V = rep(1, length(Y)), estimand = "ATT", M = 1,
      BiasAdjust = FALSE, exact = NULL, caliper = NULL, replace=TRUE, ties=TRUE,
      CommonSupport=FALSE,Weight = 1, Weight.matrix = NULL, weights = NULL,
      Var.calc = 0, sample = FALSE, restrict=NULL, match.out = NULL,
      distance.tolerance = 1e-05, tolerance=sqrt(.Machine$double.eps),
      version="standard")
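
As a rough illustration of the signature above, the following sketch runs three common variants of the call on the lalonde data shipped with the package. The choice of matching covariates and the object names (m_att, m_ate, m_adj) are assumptions made for the example, not a recommended specification.

```r
library(Matching)
data(lalonde)

# Covariates to match on (this column choice is illustrative only).
X <- cbind(age = lalonde$age, educ = lalonde$educ,
           re74 = lalonde$re74, re75 = lalonde$re75)
Y <- lalonde$re78
Tr <- lalonde$treat

# Default call: ATT, one match per unit, matching with replacement.
m_att <- Match(Y = Y, Tr = Tr, X = X)

# Two matches per unit, targeting the ATE instead of the ATT.
m_ate <- Match(Y = Y, Tr = Tr, X = X, estimand = "ATE", M = 2)

# Request bias adjustment; Z is not passed, so it defaults to X.
m_adj <- Match(Y = Y, Tr = Tr, X = X, BiasAdjust = TRUE)

summary(m_att)

# Compare the adjusted estimate with the unadjusted one.
c(adjusted = m_adj$est, unadjusted = m_adj$est.noadj)
```

Leaving Y out of any of these calls should still produce the matches, but no effect estimate.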

Arguments
Y 
A vector containing the outcome of interest. Missing values are not allowed. An outcome vector is not required because the matches generated will be the same regardless of the outcomes. Of course, without any outcomes no causal effect estimates will be produced, only a matched dataset. 
Tr 
A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment. 
X 
A matrix containing the variables we wish to match on.
This matrix may contain the actual observed covariates or the
propensity score or a combination of both. All columns of this
matrix must have positive variance or Match will return an error.
Z 
A matrix containing the covariates for which we wish to make bias adjustments. 
V 
A matrix containing the covariates for which the variance
of the causal effect may vary. Also see the Var.calc option.
estimand 
A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect, and "ATC" is the sample average treatment effect for the controls. 
M 
A scalar for the number of matches which should be
found. The default is one-to-one matching. Also see the ties option.
BiasAdjust 
A logical scalar for whether regression adjustment
should be used. See the Z matrix.
exact 
A logical scalar or vector for whether exact matching
should be done. If a logical scalar is provided, that logical value is
applied to all covariates in X.

caliper 
A scalar or vector denoting the caliper(s) which
should be used when matching. A caliper is the distance which is
acceptable for any match. Observations which are outside of the
caliper are dropped. If a scalar caliper is provided, this caliper is
used for all covariates in X.
replace 
A logical flag for whether matching should be done with
replacement. Note that if FALSE, the order of matches generally matters.
ties 
A logical flag for whether ties should be handled deterministically. By
default ties=TRUE.
CommonSupport 
This logical flag implements the usual procedure
by which observations outside of the common support of a variable
(usually the propensity score) across treatment and control groups are
discarded. The caliper option is to be preferred to this option.
Weight 
A scalar for the type of weighting scheme the matching
algorithm should use when weighting each of the covariates in X.

Weight.matrix 
This matrix denotes the weights the matching
algorithm uses when weighting each of the covariates in X. For most
uses, this matrix has zeros in the off-diagonal
cells. This matrix can be used to weight some variables more than
cells. This matrix can be used to weight some variables more than
others. For
example, if 
weights 
A vector the same length as Y which provides observation specific weights.
Var.calc 
A scalar for the variance estimate
that should be used. By default Var.calc=0.
sample 
A logical flag for whether the population or sample variance is returned. 
distance.tolerance 
This is a scalar which is used to determine
if distances between two observations are different from zero. Values
less than distance.tolerance are deemed to be equal to zero.
tolerance 
This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular. 
restrict 
A matrix which restricts the possible matches. This
matrix has one row for each restriction and three
columns. The first two columns contain the two observation numbers
which are to be restricted (for example 4 and 20), and the third
column is the restriction imposed on the observation pair.
Negative numbers in the third column imply that the two observations
cannot be matched under any circumstances, and positive numbers are
passed on as the distance between the two observations for the
matching algorithm. The most commonly used positive restriction is
0, which implies that the two observations will always be matched.
Exclusion restrictions are even more common. For example, if we want
to exclude the observation pair 4 and 20 and
the pair 6 and 55 from being matched, the restrict matrix would be:
restrict=rbind(c(4,20,-1),c(6,55,-1))
match.out 
The return object from a previous call to Match.

version 
The version of the code to be used. The "fast" C/C++
version of the code does not calculate Abadie-Imbens standard errors.
Additional speed can be obtained by setting ties=FALSE or replace=FALSE.
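
The exact, caliper and restrict arguments described above can be combined with a propensity score in X. The sketch below is illustrative only: the logit specification, the caliper value of 0.25, and the observation numbers in the restriction matrix are assumptions chosen for the example. In particular, check the full package documentation for the units in which a caliper is interpreted before relying on a specific value.

```r
library(Matching)
data(lalonde)

# Propensity score from a simple logit (specification is illustrative).
ps <- glm(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
          family = binomial, data = lalonde)$fitted.values

X <- cbind(pscore = ps, black = lalonde$black)
Y <- lalonde$re78
Tr <- lalonde$treat

# Exact matching on 'black' only; ordinary matching on the propensity score.
# One logical value is supplied per column of X.
m_exact <- Match(Y = Y, Tr = Tr, X = X, exact = c(FALSE, TRUE))

# A scalar caliper, here applied to the propensity score alone.
m_cal <- Match(Y = Y, Tr = Tr, X = ps, caliper = 0.25)

# Forbid two specific pairs from being matched.
# The observation numbers below are hypothetical.
restrict <- rbind(c(1, 200, -1),
                  c(2, 300, -1))
m_res <- Match(Y = Y, Tr = Tr, X = X, restrict = restrict)

# Weighted number of observations dropped by each constraint.
c(exact = m_exact$ndrops, caliper = m_cal$ndrops)
```

Because dropping observations changes which units contribute to the estimate, it is worth inspecting ndrops and index.dropped after any call that uses exact or caliper.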
Details
This function is intended to be used in conjunction with the
MatchBalance function, which checks if the results of this function
have actually achieved balance. The results of this function can be
summarized by a call to the summary.Match function. If one wants to do
propensity score matching, one should estimate the propensity model
before calling Match, and then place the fitted values in the X matrix;
see the provided example.
The GenMatch function can be used to automatically find balance by the
use of a genetic search algorithm which determines the optimal weight
to give each covariate. The object returned by GenMatch can be supplied
to the Weight.matrix option of Match to obtain estimates.
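
A minimal sketch of that workflow follows. The covariate set is an assumption made for the example, and the search settings (pop.size, max.generations, wait.generations) are deliberately tiny so that the code runs quickly; they are far too small for real analyses. GenMatch relies on the rgenoud package being installed.

```r
library(Matching)
data(lalonde)

# Covariates to balance and match on (illustrative choice).
X <- cbind(age = lalonde$age, educ = lalonde$educ, black = lalonde$black,
           re74 = lalonde$re74, re75 = lalonde$re75)
Y <- lalonde$re78
Tr <- lalonde$treat

# Genetic search for covariate weights. Settings are kept small for speed.
genout <- GenMatch(Tr = Tr, X = X, estimand = "ATT", M = 1,
                   pop.size = 16, max.generations = 10,
                   wait.generations = 1, print.level = 0)

# Pass the GenMatch object to Match to estimate the effect
# using the weights found by the search.
mout <- Match(Y = Y, Tr = Tr, X = X, estimand = "ATT",
              Weight.matrix = genout)
summary(mout)
```

Keeping estimand and M identical in the GenMatch and Match calls means the estimate is computed under the same matching setup that the search optimized.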
Match is often much faster with large datasets if ties=FALSE or
replace=FALSE, i.e., if matching is done by randomly breaking ties or
without replacement. Also see the Matchby function. It provides a
wrapper for Match which is much faster for large datasets when it can
be used.
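
The two speed-oriented settings mentioned above can be sketched as follows. The covariates are an arbitrary choice for the example, and the call to set.seed assumes that random tie-breaking draws on R's random number generator; that assumption is not confirmed by this page.

```r
library(Matching)
data(lalonde)

X <- cbind(age = lalonde$age, educ = lalonde$educ, re74 = lalonde$re74)
Y <- lalonde$re78
Tr <- lalonde$treat

# Ties are broken at random below; the seed is set on the assumption
# that R's RNG is used for this.
set.seed(42)

# Default behaviour: ties = TRUE, matching with replacement.
m_ties <- Match(Y = Y, Tr = Tr, X = X)

# Break ties randomly rather than handling them deterministically.
m_noties <- Match(Y = Y, Tr = Tr, X = X, ties = FALSE)

# Match without replacement.
m_norepl <- Match(Y = Y, Tr = Tr, X = X, replace = FALSE)

# Size of each matched dataset.
c(ties = m_ties$nobs, no_ties = m_noties$nobs, no_replace = m_norepl$nobs)
```

On a dataset as small as lalonde the timing difference is negligible; the settings matter for large datasets with many ties.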
Three demos are included: GerberGreenImai, DehejiaWahba, and
AbadieImbens. These can be run by calling the demo function such as by
demo(DehejiaWahba).
Value
est 
The estimated average causal effect. 
se 
The Abadie-Imbens standard error. This standard error has
correct coverage if 
est.noadj 
The estimated average causal effect without any
BiasAdjust.
se.standard 
The usual standard error. This is the standard error
calculated on the matched data using the usual method of calculating
the difference of means (between treated and control) weighted by the
observation weights provided by weights.
se.cond 
The conditional standard error. The practitioner should not generally use this. 
mdata 
A list which contains the matched datasets produced by Match.

index.treated 
A vector containing the observation numbers from
the original dataset for the treated observations in the
matched dataset. This index in conjunction with index.control can be
used to recover the matched dataset produced by Match.
index.control 
A vector containing the observation numbers from
the original data for the control observations in the
matched data. This index in conjunction with index.treated can be
used to recover the matched dataset produced by Match.
index.dropped 
A vector containing the observation numbers from
the original data which were dropped (if any) in the matched dataset
because of various options such as caliper and exact.
weights 
A vector of weights. There is one weight for each matched pair in the matched dataset. If all of the observations had a weight of 1 on input, then each matched pair will have a weight of 1 on output if there are no ties. 
orig.nobs 
The original number of observations in the dataset. 
orig.wnobs 
The original number of weighted observations in the dataset. 
orig.treated.nobs 
The original number of treated observations (unweighted). 
nobs 
The number of observations in the matched dataset. 
wnobs 
The number of weighted observations in the matched dataset. 
caliper 
The caliper which was used.
ecaliper 
The size of the enforced caliper on the scale of the
X variables.
exact 
The value of the exact function argument.
ndrops 
The number of weighted observations which were dropped
either because of caliper or exact matching. This number, unlike

ndrops.matches 
The number of matches which were dropped either because of caliper or exact matching. 
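
The index and weight components listed above can be used to inspect the matched pairs directly. The sketch below assumes an arbitrary set of matching covariates; the names matched_pairs, X_matched and pair_diff are introduced only for this example.

```r
library(Matching)
data(lalonde)

X <- cbind(age = lalonde$age, educ = lalonde$educ, re74 = lalonde$re74)
Y <- lalonde$re78
Tr <- lalonde$treat

rr <- Match(Y = Y, Tr = Tr, X = X)

# One row per matched pair: treated unit, matched control, pair weight.
matched_pairs <- data.frame(treated = as.vector(rr$index.treated),
                            control = as.vector(rr$index.control),
                            weight  = as.vector(rr$weights))

# Rebuild the matched covariates by indexing the original matrix.
X_matched <- rbind(X[rr$index.treated, ], X[rr$index.control, ])

# Weighted mean of the within-pair outcome differences,
# which can be compared with rr$est.
pair_diff <- Y[matched_pairs$treated] - Y[matched_pairs$control]
weighted.mean(pair_diff, matched_pairs$weight)

head(matched_pairs)
```

The same indexing works on the original data frame, for example lalonde[rr$index.control, ], when a matched dataset is needed for further analysis.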
Author(s)
Jasjeet S. Sekhon, UC Berkeley, sekhon@berkeley.edu, http://sekhon.berkeley.edu/.
References
Sekhon, Jasjeet S. 2011. "Multivariate and Propensity Score Matching Software with Automated Balance Optimization." Journal of Statistical Software 42(7): 1-52. http://www.jstatsoft.org/v42/i07/
Diamond, Alexis and Jasjeet S. Sekhon. 2013. "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies." Review of Economics and Statistics 95(3): 932-945. http://sekhon.berkeley.edu/papers/GenMatch.pdf
Abadie, Alberto and Guido Imbens. 2006. "Large Sample Properties of Matching Estimators for Average Treatment Effects." Econometrica 74(1): 235-267.
Imbens, Guido. 2004. Matching Software for Matlab and Stata.
See Also
Also see summary.Match, GenMatch, MatchBalance, Matchby, balanceUV,
qqstats, ks.boot, GerberGreenImai, lalonde
Examples
library(Matching)

# Replication of Dehejia and Wahba psid3 model
#
# Dehejia, Rajeev and Sadek Wahba. 1999. "Causal Effects in
# Non-Experimental Studies: Re-Evaluating the Evaluation of Training
# Programs." Journal of the American Statistical Association 94 (448):
# 1053-1062.
data(lalonde)
#
# Estimate the propensity model
#
glm1 <- glm(treat~age + I(age^2) + educ + I(educ^2) + black +
            hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
            u74 + u75, family=binomial, data=lalonde)
#
# Save data objects
#
X <- glm1$fitted
Y <- lalonde$re78
Tr <- lalonde$treat
#
# One-to-one matching with replacement (the "M=1" option).
# Estimating the treatment effect on the treated (the "estimand" option defaults to ATT).
#
rr <- Match(Y=Y, Tr=Tr, X=X, M=1)
summary(rr)
# Let's check the covariate balance
# 'nboots' is set to small values in the interest of speed.
# Please increase to at least 500 each for publication quality p-values.
mb <- MatchBalance(treat~age + I(age^2) + educ + I(educ^2) + black +
                   hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
                   u74 + u75, data=lalonde, match.out=rr, nboots=10)
