bmatch  R Documentation 
Function for optimal bipartite matching in observational studies that directly balances the observed covariates. bmatch allows the user to enforce different forms of covariate balance in the matched samples such as moment balance (e.g., of means, variances and correlations), distributional balance (e.g., fine balance, near-fine balance, strength-k balancing) and exact matching. In observational studies where an instrumental variable is available, bmatch can be used to handle weak instruments and strengthen them by means of the far options (see Yang et al. 2014 for an example). bmatch can also be used in discontinuity designs by matching units in a neighborhood of the discontinuity (see Keele et al. 2015 for details). In any of these settings, bmatch either minimizes the total sum of covariate distances between matched units, maximizes the total number of matched units, or optimizes a combination of the two, subject to matching, covariate balancing, and representativeness constraints (see the examples below).
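The trade-off between total distance and number of matched units can be illustrated on a toy problem. The sketch below is not how bmatch solves the problem: it enumerates all 1:1 assignments for a hypothetical 2 x 3 distance matrix by brute force, and it assumes the combined objective has the form "sum of distances minus a weight times the number of pairs", which is an assumption made for illustration and is not stated in this page.

```r
# Hypothetical 2 x 3 distance matrix: rows = treated units, columns = controls
d <- matrix(c(1, 4, 6,
              5, 2, 9), nrow = 2, byrow = TRUE)

# Enumerate every assignment; 0 means "leave this treated unit unmatched"
cands <- expand.grid(c1 = 0:3, c2 = 0:3)
# A control may be used at most once
cands <- cands[cands$c1 == 0 | cands$c2 == 0 | cands$c1 != cands$c2, ]

# Assumed combined objective: total distance minus lambda times number of pairs
score <- function(c1, c2, lambda) {
  dist_sum <- (if (c1 > 0) d[1, c1] else 0) + (if (c2 > 0) d[2, c2] else 0)
  n_pairs <- (c1 > 0) + (c2 > 0)
  dist_sum - lambda * n_pairs
}

# Best assignment for a given weight (lambda plays the role of a subset weight)
best <- function(lambda) {
  s <- mapply(score, cands$c1, cands$c2, MoreArgs = list(lambda = lambda))
  cands[which.min(s), ]
}

# Number of matched pairs as the weight grows
n_matched <- sapply(c(0, 1.5, 10), function(l) sum(best(l) > 0))
```

Under this assumed objective, a weight of zero matches nobody, a moderate weight matches only the closest pair, and a large weight matches every treated unit.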
bmatch(t_ind, dist_mat = NULL, subset_weight = NULL, n_controls = 1,
total_groups = NULL, mom = NULL, ks = NULL, exact = NULL,
near_exact = NULL, fine = NULL, near_fine = NULL, near = NULL,
far = NULL, solver = NULL)
t_ind 
treatment indicator: a vector of zeros and ones indicating treatment (1 = treated; 0 = control). Please note that the data needs to be sorted in decreasing order according to this treatment indicator. 
dist_mat 
distance matrix: a matrix of positive distances between treated units (rows) and controls (columns). If 
subset_weight 
subset matching weight: a scalar that regulates the tradeoff between the total sum of distances between matched pairs and the total number of matched pairs. The larger If If 
n_controls 
number of controls: a scalar defining the number of controls to be matched with a fixed rate to each treated unit. The default is 1.
total_groups 
total number of matched pairs: a scalar specifying the number of matched pairs to be obtained. If 
mom 
moment balance parameters: a list with three arguments,

ks 
Kolmogorov-Smirnov balance parameters: a list with three objects,

exact 
Exact matching parameters: a list with one argument,
where 
near_exact 
Near-exact matching parameters: a list with two objects,

fine 
Fine balance parameters: a list with one argument,
where 
near_fine 
Near-fine balance parameters: a list with two objects,

near 
Near matching parameters: a list with three objects,

far 
Far matching parameters: a list with three objects,

solver 
Optimization solver parameters: a list with four objects,

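Two of the input requirements above, that the data be sorted in decreasing order of t_ind and that dist_mat have treated units in rows and controls in columns, can be sketched in base R. The data frame and its column names are hypothetical, and the absolute age difference is only a stand-in distance, not the package's own distance computation.

```r
# Hypothetical data; column names are illustrative only
dat <- data.frame(
  treatment = c(0, 1, 0, 1, 0),
  age       = c(34, 25, 41, 30, 28)
)

# Sort so that all treated units (1) come before all controls (0)
dat_sorted <- dat[order(dat$treatment, decreasing = TRUE), ]
t_ind <- dat_sorted$treatment

# Check: only 0s and 1s, and the indicator never increases
stopifnot(all(t_ind %in% c(0, 1)), !is.unsorted(rev(t_ind)))

# Stand-in distance matrix: rows = treated units, columns = controls
dist_mat <- outer(dat_sorted$age[t_ind == 1],
                  dat_sorted$age[t_ind == 0],
                  function(a, b) abs(a - b))
dim(dist_mat)  # number of treated units by number of controls
```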
A list containing the optimal solution, with the following objects:
obj_total 
value of the objective function at the optimum; 
obj_dist_mat 
value of the total sum of distances term of the objective function at the optimum; 
t_id 
indexes of the matched treated units at the optimum; 
c_id 
indexes of the matched controls at the optimum; 
group_id 
matched pairs or groups at the optimum; 
time 
time elapsed to find the optimal solution. 
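A minimal sketch of how the returned indices might be used follows. It assumes, as the balance checks in the examples suggest, that t_id and c_id index rows of the sorted data and that in 1:1 matching the k-th entry of t_id is paired with the k-th entry of c_id. The data and the index vectors are hypothetical stand-ins for out$t_id and out$c_id.

```r
# Hypothetical sorted data: 3 treated units (rows 1-3), then 4 controls (rows 4-7)
dat <- data.frame(
  t_ind = c(1, 1, 1, 0, 0, 0, 0),
  age   = c(30, 45, 52, 44, 31, 60, 50)
)

# Hypothetical indices standing in for out$t_id and out$c_id
t_id <- c(1, 2, 3)
c_id <- c(5, 4, 7)

# Matched sample: one row per matched unit, with an illustrative pair label
matched <- dat[c(t_id, c_id), ]
matched$pair <- c(seq_along(t_id), seq_along(c_id))

# Difference in mean age between groups, before and after matching
diff_before <- mean(dat$age[dat$t_ind == 1]) - mean(dat$age[dat$t_ind == 0])
diff_after  <- mean(dat$age[t_id]) - mean(dat$age[c_id])
```

In this toy sample the matched controls are closer in age to the treated units than the full control group is, so the mean difference shrinks after matching.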
Jose R. Zubizarreta <zubizarreta@hcp.med.harvard.edu>, Cinar Kilcioglu <ckilcioglu16@gsb.columbia.edu>.
Keele, L., Titiunik, R., and Zubizarreta, J. R. (2015), "Enhancing a Geographic Regression Discontinuity Design Through Matching to Estimate the Effect of Ballot Initiatives on Voter Turnout," Journal of the Royal Statistical Society: Series A, 178, 223-239.
Rosenbaum, P. R. (2010), Design of Observational Studies, Springer.
Rosenbaum, P. R. (2012), "Optimal Matching of an Optimally Chosen Subset in Observational Studies," Journal of Computational and Graphical Statistics, 21, 57-71.
Yang, D., Small, D., Silber, J. H., and Rosenbaum, P. R. (2012), "Optimal Matching With Minimal Deviation From Fine Balance in a Study of Obesity and Surgical Outcomes," Biometrics, 68, 628-636.
Yang, F., Zubizarreta, J. R., Small, D. S., Lorch, S. A., and Rosenbaum, P. R. (2014), "Dissonant Conclusions When Testing the Validity of an Instrumental Variable," The American Statistician, 68, 253-263.
Zubizarreta, J. R., Reinke, C. E., Kelz, R. R., Silber, J. H., and Rosenbaum, P. R. (2011), "Matching for Several Sparse Nominal Variables in a Case-Control Study of Readmission Following Surgery," The American Statistician, 65, 229-238.
Zubizarreta, J. R. (2012), "Using Mixed Integer Programming for Matching in an Observational Study of Kidney Failure after Surgery," Journal of the American Statistical Association, 107, 1360-1371.
Zubizarreta, J. R., Paredes, R. D., and Rosenbaum, P. R. (2014), "Matching for Balance, Pairing for Heterogeneity in an Observational Study of the Effectiveness of For-profit and Not-for-profit High Schools in Chile," Annals of Applied Statistics, 8, 204-231.
sensitivitymv, sensitivitymw.
## Uncomment the following examples
## Load, sort, and attach data
#data(lalonde)
#lalonde = lalonde[order(lalonde$treatment, decreasing = TRUE), ]
#attach(lalonde)
#################################
## Example 1: cardinality matching
#################################
## Cardinality matching finds the largest matched sample of pairs that meets balance
## requirements. Here the balance requirements are mean balance, fine balance and
## exact matching for different covariates. The solver used is glpk with the
## approximate option.
## Treatment indicator; note that the data needs to be sorted in decreasing order
## according to this treatment indicator
#t_ind = treatment
#t_ind
## Distance matrix
#dist_mat = NULL
## Subset matching weight
#subset_weight = 1
## Moment balance: constrain differences in means to be at most .05 standard deviations apart
#mom_covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
#mom_tols = round(absstddif(mom_covs, t_ind, .05), 2)
#mom = list(covs = mom_covs, tols = mom_tols)
## Fine balance
#fine_covs = cbind(black, hispanic, married, nodegree)
#fine = list(covs = fine_covs)
## Exact matching
#exact_covs = cbind(black)
#exact = list(covs = exact_covs)
## Solver options
#t_max = 60*5
#solver = "glpk"
#approximate = 1
#solver = list(name = solver, t_max = t_max, approximate = approximate,
#round_cplex = 0, trace = 0)
## Match
#out = bmatch(t_ind = t_ind, dist_mat = dist_mat, subset_weight = subset_weight,
#mom = mom, fine = fine, exact = exact, solver = solver)
## Indices of the treated units and matched controls
#t_id = out$t_id
#c_id = out$c_id
## Time
#out$time/60
## Matched group identifier (who is matched to whom)
#out$group_id
## Assess mean balance
#meantab(mom_covs, t_ind, t_id, c_id)
## Assess fine balance (note here we are getting an approximate solution)
#for (i in 1:ncol(fine_covs)) {
# print(finetab(fine_covs[, i], t_id, c_id))
#}
## Assess exact matching balance
#table(exact_covs[t_id]==exact_covs[c_id])
##################################
## Example 2: minimum distance matching
##################################
## The goal here is to minimize the total of distances between matched pairs. In
## this example there are no covariate balance requirements. Again, the solver
## used is glpk with the approximate option
## Treatment indicator
#t_ind = treatment
## Matrix of covariates
#X_mat = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
## Distance matrix
#dist_mat = distmat(t_ind, X_mat)
## Subset matching weight
#subset_weight = NULL
## Total pairs to be matched
#total_groups = sum(t_ind)
## Solver options
#t_max = 60*5
#solver = "glpk"
#approximate = 1
#solver = list(name = solver, t_max = t_max, approximate = approximate,
#round_cplex = 0, trace_cplex = 0)
## Match
#out = bmatch(t_ind = t_ind, dist_mat = dist_mat, total_groups = total_groups,
#solver = solver)
## Indices of the treated units and matched controls
#t_id = out$t_id
#c_id = out$c_id
## Total of distances between matched pairs
#out$obj_total
## Assess mean balance
#meantab(X_mat, t_ind, t_id, c_id)
##################################
## Example 3: optimal subset matching
##################################
## Optimal subset matching pursues two competing goals at
## the same time: to minimize the total sum of covariate distances
## while matching as many observations as possible. The tradeoff
## between these two goals is regulated by the parameter subset_weight
## (see Rosenbaum 2012 and Zubizarreta et al. 2013 for a discussion).
## Here the balance requirements are mean balance, near-fine balance
## and near-exact matching for different covariates.
## Again, the solver used is glpk with the approximate option.
## Treatment indicator
#t_ind = treatment
## Matrix of covariates
#X_mat = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
## Distance matrix
#dist_mat = distmat(t_ind, X_mat)
## Subset matching weight
#subset_weight = median(dist_mat)
## Moment balance: constrain differences in means to be at most .05 standard deviations apart
#mom_covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
#mom_tols = round(absstddif(mom_covs, t_ind, .05), 2)
#mom = list(covs = mom_covs, tols = mom_tols)
## Near-fine balance
#near_fine_covs = cbind(married, nodegree)
#near_fine_devs = rep(5, 2)
#near_fine = list(covs = near_fine_covs, devs = near_fine_devs)
## Near-exact matching
#near_exact_covs = cbind(black, hispanic)
#near_exact_devs = rep(5, 2)
#near_exact = list(covs = near_exact_covs, devs = near_exact_devs)
## Solver options
#t_max = 60*5
#solver = "glpk"
#approximate = 1
#solver = list(name = solver, t_max = t_max, approximate = approximate,
#round_cplex = 0, trace_cplex = 0)
## Match
#out = bmatch(t_ind = t_ind, dist_mat = dist_mat, subset_weight = subset_weight,
#mom = mom, near_fine = near_fine, near_exact = near_exact, solver = solver)
## Indices of the treated units and matched controls
#t_id = out$t_id
#c_id = out$c_id
## Time
#out$time/60
## Matched group identifier (who is matched to whom)
#out$group_id
## Assess mean balance (note here we are getting an approximate solution)
#meantab(X_mat, t_ind, t_id, c_id)
## Assess fine balance
#for (i in 1:ncol(near_fine_covs)) {
# print(finetab(near_fine_covs[, i], t_id, c_id))
#}
## Assess exact matching balance
#for (i in 1:ncol(near_exact_covs)) {
# print(table(near_exact_covs[t_id, i]==near_exact_covs[c_id, i]))
#}
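The distinction between fine balance and exact matching that the examples assess can be made concrete with a base-R cross-check. This is a sketch on hypothetical data, not the package's finetab function: the helper names fine_balanced and exactly_matched are invented here, and the k-th entries of t_id and c_id are assumed to form a pair.

```r
# Hypothetical binary covariate: 3 treated units (rows 1-3), 5 controls (rows 4-8)
black <- c(1, 0, 1, 1, 1, 0, 0, 1)
t_id <- c(1, 2, 3)

# Fine balance: the marginal counts of each category agree across groups
fine_balanced <- function(x, t_id, c_id) {
  lv <- sort(unique(x))
  identical(as.vector(table(factor(x[t_id], levels = lv))),
            as.vector(table(factor(x[c_id], levels = lv))))
}

# Exact matching: every matched pair agrees on the covariate
exactly_matched <- function(x, t_id, c_id) all(x[t_id] == x[c_id])

# Two hypothetical sets of matched controls
c_id_a <- c(4, 6, 5)  # pairs agree unit by unit
c_id_b <- c(6, 4, 5)  # same controls, paired differently

fine_a  <- fine_balanced(black, t_id, c_id_a)
exact_a <- exactly_matched(black, t_id, c_id_a)
fine_b  <- fine_balanced(black, t_id, c_id_b)
exact_b <- exactly_matched(black, t_id, c_id_b)
```

Both sets of controls are finely balanced, because the category counts match in aggregate, but only the first pairing is an exact match within every pair.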