Tests for Univariate and Multivariate Balance
Description
This function provides a variety of balance statistics useful for
determining whether balance exists in any unmatched dataset and in
matched datasets produced by the Match function. Matching is performed
by the Match function, and MatchBalance is used to determine whether
Match was successful in achieving balance on the observed covariates.
Usage
Arguments
formul 
This formula does not estimate any model. The formula is simply an efficient way to use the R modeling language to list the variables for which we wish to obtain univariate balance statistics. The dependent variable in the formula is usually the treatment indicator. One should include many functions of the observed covariates. Generally, one should request balance statistics on more higher-order terms and interactions than were used to conduct the matching itself.
data 
A data frame which contains all of the variables in the formula. If a data frame is not provided, the variables are obtained via lexical scoping. 
match.out 
The output object from the Match function.
ks 
A logical flag for whether the univariate bootstrap
Kolmogorov-Smirnov (KS) test should be calculated. If the ks option
is set to true, the univariate KS test is calculated for all
non-dichotomous variables. The bootstrap KS test is consistent even
for non-continuous variables. See ks.boot.
weights 
An optional vector of observation specific weights. 
nboots 
The number of bootstrap samples to be run. If zero, no
bootstraps are done. Bootstrapping is highly recommended because
the bootstrapped Kolmogorov-Smirnov test provides correct coverage
even when the distributions being compared are not continuous. At
least 500 bootstrap samples are recommended for publication-quality p-values.
digits 
The number of significant digits that should be displayed. 
paired 
A flag for whether the paired t-test should be used after matching.
print.level 
The amount of printing to be done. If zero, there is no printing. If one, the results are summarized. If two, details of the computations are printed. 
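Because formul only lists terms, it can help to see how such a formula expands into one column per term. The sketch below is a minimal base-R illustration on simulated data; the names df, balance.formula, covs and tr are hypothetical, and it assumes (this page does not say) that the balance statistics are computed term by term over a similar expansion.

```r
# Simulated data; all variable names are hypothetical, for illustration only.
set.seed(1)
df <- data.frame(
  treat = rbinom(100, 1, 0.5),
  age   = round(runif(100, 18, 60)),
  educ  = round(runif(100, 8, 16)),
  black = rbinom(100, 1, 0.3)
)

# A balance formula with higher-order terms and an interaction.
# Nothing is estimated: the formula only names the columns to examine.
balance.formula <- treat ~ age + I(age^2) + educ + I(educ^2) + black + age:black

# Expand the right-hand side into one column per term, dropping the intercept.
covs <- model.matrix(balance.formula, data = df)[, -1, drop = FALSE]

# The left-hand side (the treatment indicator) is pulled out separately.
tr <- model.response(model.frame(balance.formula, data = df))

colnames(covs)
```

Each column of covs (including I(age^2) and age:black) is then a separate variable on which treated and control units can be compared.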
Details
This function can be used to determine whether there is balance in the
pre- and/or post-matching datasets. Differences in means between the
treatment and control groups are provided, as well as a variety of
summary statistics for the empirical CDF (eCDF) and empirical-QQ (eQQ)
plot between the two groups. The eCDF results are the standardized
mean, median and maximum differences in the empirical CDF. The eQQ
results are summaries of the raw differences in the empirical-QQ plot.
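The eCDF and eQQ summaries described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own computation: ecdf_eqq_summary is a hypothetical helper, eCDF differences are evaluated at every pooled observation (they already lie on the [0, 1] probability scale, so the package's exact standardization is not reproduced), and eQQ differences compare quantiles at a common grid of probabilities so that groups of unequal size can be handled.

```r
# Hypothetical helper: summarize eCDF and eQQ differences for one covariate.
# x1 = values in the treated group, x0 = values in the control group.
ecdf_eqq_summary <- function(x1, x0) {
  # eCDF differences, evaluated at every distinct pooled value.
  grid   <- sort(unique(c(x1, x0)))
  d.ecdf <- abs(ecdf(x1)(grid) - ecdf(x0)(grid))

  # eQQ differences: raw differences between the two groups' quantiles
  # at a shared set of probabilities (an assumption of this sketch).
  n     <- max(length(x1), length(x0))
  probs <- (seq_len(n) - 0.5) / n
  d.eqq <- abs(quantile(x1, probs, names = FALSE) -
               quantile(x0, probs, names = FALSE))

  c(ecdf.mean = mean(d.ecdf), ecdf.med = median(d.ecdf), ecdf.max = max(d.ecdf),
    eqq.mean  = mean(d.eqq),  eqq.med  = median(d.eqq),  eqq.max  = max(d.eqq))
}

# Usage on simulated data: the treated group is shifted by a quarter unit.
set.seed(3)
ctrl <- rnorm(300)
trt  <- rnorm(300, mean = 0.25)
round(ecdf_eqq_summary(trt, ctrl), 3)
```

Identical groups give zeros everywhere; the eCDF summaries are unit-free, while the eQQ summaries are in the covariate's own units.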
Two univariate tests are also provided: the t-test and the bootstrap
Kolmogorov-Smirnov (KS) test. These tests should not be treated as
hypothesis tests in the usual fashion, because we wish to maximize
balance without limit rather than stop once a test fails to reject.
The bootstrap KS test is highly recommended (see the ks and nboots
options) because the bootstrap KS test is consistent even for
non-continuous distributions. Before matching, the two-sample t-test
is used; after matching, the paired t-test is used.
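A bootstrap KS p-value can be sketched in base R by resampling from the pooled sample, which imposes the null hypothesis that both groups share one distribution, and counting how often the resampled KS statistic reaches the observed one. boot_ks is a hypothetical helper; this follows the general bootstrap-KS construction and is not claimed to be ks.boot's exact code.

```r
# Hypothetical helper: bootstrap Kolmogorov-Smirnov test for two samples.
boot_ks <- function(x1, x0, nboots = 500) {
  # KS statistic only; warnings about ties are irrelevant to the statistic.
  ks.stat <- function(a, b) unname(suppressWarnings(ks.test(a, b))$statistic)

  obs    <- ks.stat(x1, x0)
  pooled <- c(x1, x0)
  n1     <- length(x1)
  n0     <- length(x0)

  # Under the null, draw both groups with replacement from the pooled data.
  boot.stats <- replicate(nboots, {
    s <- sample(pooled, n1 + n0, replace = TRUE)
    ks.stat(s[seq_len(n1)], s[n1 + seq_len(n0)])
  })

  # p-value: share of bootstrap statistics at least as large as observed.
  list(statistic = obs, p.value = mean(boot.stats >= obs))
}

set.seed(42)
x0  <- rnorm(200)            # control covariate values (simulated)
x1  <- rnorm(200, mean = 1)  # treated values, shifted by one unit
res <- boot_ks(x1, x0, nboots = 200)
res$p.value
```

Because resampling produces ties, the bootstrap distribution reflects discreteness in the data, which is why this test remains valid for non-continuous covariates. For the t-tests, base R's t.test(x1, x0) gives a two-sample test, and t.test(x1, x0, paired = TRUE) a paired one when x1[i] and x0[i] form matched pairs.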
Two multivariate tests are provided: the KS and Chi-Square null deviance tests. The KS test is to be preferred over the Chi-Square test because the Chi-Square test is not testing the relevant hypothesis. The null hypothesis for the KS test is equal balance in the estimated probabilities between treated and control. The null hypothesis for the Chi-Square test, however, is that all of the slope parameters are zero; it is a comparison of residual versus null deviance. If the covariates being considered are discrete, this KS test is asymptotically nonparametric as long as the logit model does not produce zero parameter estimates.
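The two multivariate tests can be sketched in base R: fit a logit of the treatment on the covariates, then (a) compare residual and null deviance with a Chi-Square test, and (b) compare the estimated probabilities of treated and control units with a KS test. The data and names below are simulated and hypothetical, and for brevity the sketch uses base R's asymptotic ks.test, whereas the text above recommends the bootstrap KS test.

```r
set.seed(7)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
# Treatment depends on x1, so the groups are deliberately imbalanced.
tr <- rbinom(n, 1, plogis(1.5 * x1))

# Logit model of treatment on the covariates.
fit <- glm(tr ~ x1 + x2, family = binomial)

# Chi-Square null deviance test: H0 is that all slope parameters are zero.
chisq.p <- pchisq(fit$null.deviance - fit$deviance,
                  df = fit$df.null - fit$df.residual,
                  lower.tail = FALSE)

# KS test: H0 is equal distributions of estimated probabilities
# in the treated and control groups.
ps   <- fitted(fit)
ks.p <- ks.test(ps[tr == 1], ps[tr == 0])$p.value

c(chisq.p = chisq.p, ks.p = ks.p)
```

The two p-values answer different questions: the deviance test asks whether any covariate predicts treatment, while the KS test asks whether the estimated probabilities are distributed alike across groups, which is the balance question of interest.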
NAs are handled by the na.action option. However, it is highly
recommended that NAs not simply be deleted; instead, one should
check that missingness is balanced between the treatment and
control groups.
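One way to check that missingness is balanced is to turn it into an explicit indicator and compare its rate across groups. The sketch below uses a small hypothetical data frame; such an indicator could also be listed as a term in formul so that it receives the same balance statistics as the other covariates (an assumption about usage, not a documented feature).

```r
# Hypothetical data with some missing income values.
df <- data.frame(
  treat  = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  income = c(10, NA, 12, NA, 9, 11, NA, 8, 10, 7)
)

# Record missingness explicitly instead of silently dropping rows.
df$income.missing <- as.integer(is.na(df$income))

# Share of missing values within each treatment group (names "0" and "1").
miss.rate <- tapply(df$income.missing, df$treat, mean)
miss.rate
```

Here half of the treated units but only one in six control units lack income, so deleting incomplete rows would change the two groups differently.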
Value
BeforeMatching 
A list containing the before-matching univariate
balance statistics, that is, the results of the univariate
balance tests for each of the covariates described in formul.
AfterMatching 
A list containing the after-matching univariate
balance statistics, that is, the results of the univariate
balance tests for each of the covariates described in formul.
BMsmallest.p.value 
The smallest p-value found across all of the before-matching balance tests (including t-tests and KS tests).
BMsmallestVarName 
The name of the variable with the BMsmallest.p.value.
BMsmallestVarNumber 
The number of the variable with the BMsmallest.p.value.
AMsmallest.p.value 
The smallest p-value found across all of the after-matching balance tests (including t-tests and KS tests).
AMsmallestVarName 
The name of the variable with the AMsmallest.p.value.
AMsmallestVarNumber 
The number of the variable with the AMsmallest.p.value.
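The smallest-p-value fields above amount to simple bookkeeping over the per-variable tests. The sketch below uses made-up p-values in a hypothetical matrix pvals (NA where no KS test was run, e.g. for a dichotomous variable); keeping every tied variable is a choice of this sketch, not something stated on this page.

```r
# Made-up p-values: one row per covariate, one column per test.
pvals <- rbind(
  age   = c(t = 0.40, ks = 0.12),
  educ  = c(t = 0.03, ks = 0.08),
  black = c(t = 0.03, ks = NA),   # dichotomous variable: no KS test
  re74  = c(t = 0.51, ks = 0.27)
)

# Smallest p-value for each variable across its tests.
per.var <- apply(pvals, 1, min, na.rm = TRUE)

# Smallest p-value overall, with the name(s) and position(s) of the variable(s).
smallest.p      <- min(per.var)
smallest.name   <- names(per.var)[per.var == smallest.p]  # ties are all kept
smallest.number <- which(per.var == smallest.p)

smallest.p
smallest.name
```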
Author(s)
Jasjeet S. Sekhon, UC Berkeley, sekhon@berkeley.edu, http://sekhon.berkeley.edu/.
References
Sekhon, Jasjeet S. 2011. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization.” Journal of Statistical Software 42(7): 1–52. http://www.jstatsoft.org/v42/i07/
Diamond, Alexis and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95(3): 932–945. http://sekhon.berkeley.edu/papers/GenMatch.pdf
Abadie, Alberto. 2002. “Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models.” Journal of the American Statistical Association 97(457): 284–292.
Hall, Peter. 1992. The Bootstrap and Edgeworth Expansion. New York: Springer-Verlag.
Wilcox, Rand R. 1997. Introduction to Robust Estimation. San Diego, CA: Academic Press.
Conover, William J. 1971. Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295–301 (one-sample “Kolmogorov” test), 309–314 (two-sample “Smirnov” test).
Shao, Jun and Dongsheng Tu. 1995. The Jackknife and Bootstrap. New York: Springer-Verlag.
See Also
Also see Match, GenMatch, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde.
Examples
#
# Replication of Dehejia and Wahba psid3 model
#
# Dehejia, Rajeev and Sadek Wahba. 1999. ``Causal Effects in
# Non-Experimental Studies: Re-Evaluating the Evaluation of Training
# Programs.'' Journal of the American Statistical Association 94 (448):
# 1053-1062.
library(Matching)
data(lalonde)
#
# Estimate the propensity model
#
glm1 <- glm(treat ~ age + I(age^2) + educ + I(educ^2) + black +
              hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
              u74 + u75, family = binomial, data = lalonde)
#
# Save data objects
#
X  <- glm1$fitted.values   # estimated propensity scores
Y  <- lalonde$re78         # outcome: 1978 earnings
Tr <- lalonde$treat        # treatment indicator
#
# One-to-one matching with replacement (the "M=1" option).
# Estimating the treatment effect on the treated (the "estimand" option, which defaults to "ATT").
#
rr <- Match(Y=Y, Tr=Tr, X=X, M=1)
# Let's summarize the output
summary(rr)
# Let's check the covariate balance.
# 'nboots' is set to a small value in the interest of speed.
# Please increase it to at least 500 for publication-quality p-values.
mb <- MatchBalance(treat ~ age + I(age^2) + educ + I(educ^2) + black +
                     hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
                     u74 + u75, data=lalonde, match.out=rr, nboots=10)
