Description Usage Arguments Value Author(s) References See Also Examples
Function for optimal bipartite matching in observational studies that directly balances the observed covariates. bmatch
allows the user to enforce different forms of covariate balance in the matched samples such as moment balance (e.g., of means, variances and correlations), distributional balance (e.g., fine balance, nearfine balance, strengthk balancing) and exact matching. In observational studies where an instrumental variable is available, bmatch
can be used to handle weak instruments and strengthen them by means of the far
options (see Yang et al. 2014 for an example). bmatch
can also be used in discontinuity designs by matching units in a neighborhood of the discontinuity (see Keele et al. 2015 for details). In any of these settings, bmatch
either minimizes the total sum of covariate distances between matched units, maximizes the total number of matched units, or optimizes a combination of the two, subject to matching, covariate balancing, and representativeness constraints (see the examples below).
1 2 3 4 
t_ind 
treatment indicator: a vector of zeros and ones indicating treatment (1 = treated; 0 = control). Please note that the data needs to be sorted in decreasing order according to this treatment indicator. 
dist_mat 
distance matrix: a matrix of positive distances between treated units (rows) and controls (columns). If 
subset_weight 
subset matching weight: a scalar that regulates the tradeoff between the total sum of distances between matched pairs and the total number of matched pairs. The larger If If 
n_controls 
number of controls: a scalar defining the number of controls to be matched with a fixed rate to each treated unit. The default is 
total_groups 
total number of matched pairs: a scalar specifying the number of matched pairs to be obtained. If 
mom 
moment balance parameters: a list with three arguments,

ks 
KolmogorovSmirnov balance parameters: a list with three objects,

exact 
Exact matching parameters: a list with one argument,
where 
near_exact 
Nearexact matching parameters: a list with two objects,

fine 
Fine balance parameters: a list with one argument,
where 
near_fine 
Nearfine balance parameters: a list with two objects,

near 
Near matching parameters: a list with three objects,

far 
Far matching parameters: a list with three objects,

solver 
Optimization solver parameters: a list with four objects,

A list containing the optimal solution, with the following objects:
obj_total 
value of the objective function at the optimum; 
obj_dist_mat 
value of the total sum of distances term of the objective function at the optimum; 
t_id 
indexes of the matched controls at the optimum; 
c_id 
indexes of the matched controls at the optimum; 
group_id 
matched pairs or groups at the optimum; 
time 
time elapsed to find the optimal solution. 
Jose R. Zubizarreta <zubizarreta@hcp.med.harvard.edu>, Cinar Kilcioglu <ckilcioglu16@gsb.columbia.edu>.
Keele, L., Titiunik, R., and Zubizarreta, J. R., (2015), "Enhancing a Geographic Regression Discontinuity Design Through Matching to Estimate the Effect of Ballot Initiatives on Voter Turnout," Journal of the Royal Statistical Society: Series A, 178, 223239.
Rosenbaum, P. R. (2010), Design of Observational Studies, Springer.
Rosenbaum, P. R. (2012), "Optimal Matching of an Optimally Chosen Subset in Observa tional studies," Journal of Computational and Graphical Statistics, 21, 5771.
Yang, D., Small, D., Silber, J. H., and Rosenbaum, P. R. (2012), "Optimal Matching With Minimal Deviation From Fine Balance in a Study of Obesity and Surgical Outcomes," Biometrics, 68, 62836.
Yang. F., Zubizarreta, J. R., Small, D. S., Lorch, S. A., and Rosenbaum, P. R. (2014), "Dissonant Conclusions When Testing the Validity of an Instrumental Variable," The American Statistician, 68, 253263.
Zubizarreta, J. R., Reinke, C. E., Kelz, R. R., Silber, J. H., and Rosenbaum, P. R. (2011), "Matching for Several Sparse Nominal Variables in a CaseControl Study of Readmission Following Surgery," The American Statistician, 65, 229238.
Zubizarreta, J. R. (2012), "Using Mixed Integer Programming for Matching in an Observational Study of Kidney Failure after Surgery," Journal of the American Statistical Association, 107, 13601371.
Zubizarreta, J. R., Paredes, R. D., and Rosenbaum, P. R. (2014), "Matching for Balance, Pairing for Heterogeneity in an Observational Study of the Effectiveness of Forprofit and Notforprofit High Schools in Chile," Annals of Applied Statistics, 8, 204231.
sensitivitymv, sensitivitymw.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 
# Load, sort, and attach data
data(lalonde)
lalonde = lalonde[order(lalonde$treatment, decreasing = TRUE), ]
attach(lalonde)
#################################
# Example 1: cardinality matching
#################################
# Cardinality matching finds the largest matched sample of pairs that meets balance
# requirements. Here the balance requirements are mean balance, fine balance and
# exact matching for different covariates. The solver used is glpk with the
# approximate option.
# Treatment indicator; note that the data needs to be sorted in decreasing order
# according to this treatment indicator
t_ind = treatment
t_ind
# Distance matrix
dist_mat = NULL
# Subset matching weight
subset_weight = 1
# Moment balance: constrain differences in means to be at most .05 standard deviations apart
mom_covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
mom_tols = round(absstddif(mom_covs, t_ind, .05), 2)
mom = list(covs = mom_covs, tols = mom_tols)
# Fine balance
fine_covs = cbind(black, hispanic, married, nodegree)
fine = list(covs = fine_covs)
# Exact matching
exact_covs = cbind(black)
exact = list(covs = exact_covs)
# Solver options
t_max = 60*5
solver = "glpk"
approximate = 1
solver = list(name = solver, t_max = t_max, approximate = approximate,
round_cplex = 0, trace = 0)
# Match
out = bmatch(t_ind = t_ind, dist_mat = dist_mat, subset_weight = subset_weight,
mom = mom, fine = fine, exact = exact, solver = solver)
# Indices of the treated units and matched controls
t_id = out$t_id
c_id = out$c_id
# Time
out$time/60
# Matched group identifier (who is matched to whom)
out$group_id
# Assess mean balance
meantab(mom_covs, t_ind, t_id, c_id)
# Assess fine balance (note here we are getting an approximate solution)
for (i in 1:ncol(fine_covs)) {
print(finetab(fine_covs[, i], t_id, c_id))
}
# Assess exact matching balance
table(exact_covs[t_id]==exact_covs[c_id])
## Uncomment the following examples
##################################
## Example 2: minimum distance matching
##################################
## The goal here is to minimize the total of distances between matched pairs. In
## this example there are no covariate balance requirements. Again, the solver
## used is glpk with the approximate option
## Treatment indicator
#t_ind = treatment
## Matrix of covariates
#X_mat = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
## Distance matrix
#dist_mat = distmat(t_ind, X_mat)
## Subset matching weight
#subset_weight = NULL
## Total pairs to be matched
#total_groups = sum(t_ind)
## Solver options
#t_max = 60*5
#solver = "glpk"
#approximate = 1
#solver = list(name = solver, t_max = t_max, approximate = approximate,
#round_cplex = 0, trace_cplex = 0)
## Match
#out = bmatch(t_ind = t_ind, dist_mat = dist_mat, total_groups = total_groups,
#solver = solver)
## Indices of the treated units and matched controls
#t_id = out$t_id
#c_id = out$c_id
## Total of distances between matched pairs
#out$obj_total
## Assess mean balance
#meantab(X_mat, t_ind, t_id, c_id)
##################################
## Example 3: optimal subset matching
##################################
## Optimal subset matching pursues two competing goals at
## the same time: to minimize the total sum of covariate distances
## while matching as many observations as possible. The tradeoff
## between these two goals is regulated by the parameter subset_weight
## (see Rosenbaum 2012 and Zubizarreta et al. 2013 for a discussion).
## Here the balance requirements are mean balance, nearfine balance
## and nearexact matching for different covariates.
## Again, the solver used is glpk with the approximate option.
## Treatment indicator
#t_ind = treatment
## Matrix of covariates
#X_mat = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
## Distance matrix
#dist_mat = distmat(t_ind, X_mat)
## Subset matching weight
#subset_weight = median(dist_mat)
## Moment balance: constrain differences in means to be at most .05 standard deviations apart
#mom_covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
#mom_tols = round(absstddif(mom_covs, t_ind, .05), 2)
#mom = list(covs = mom_covs, tols = mom_tols)
## Nearfine balance
#near_fine_covs = cbind(married, nodegree)
#near_fine_devs = rep(5, 2)
#near_fine = list(covs = near_fine_covs, devs = near_fine_devs)
## Nearexact matching
#near_exact_covs = cbind(black, hispanic)
#near_exact_devs = rep(5, 2)
#near_exact = list(covs = near_exact_covs, devs = near_exact_devs)
## Solver options
#t_max = 60*5
#solver = "glpk"
#approximate = 1
#solver = list(name = solver, t_max = t_max, approximate = approximate,
#round_cplex = 0, trace_cplex = 0)
## Match
#out = bmatch(t_ind = t_ind, dist_mat = dist_mat, subset_weight = subset_weight,
#mom = mom, near_fine = near_fine, near_exact = near_exact, solver = solver)
## Indices of the treated units and matched controls
#t_id = out$t_id
#c_id = out$c_id
## Time
#out$time/60
## Matched group identifier (who is matched to whom)
#out$group_id
## Assess mean balance (note here we are getting an approximate solution)
#meantab(X_mat, t_ind, t_id, c_id)
## Assess fine balance
#for (i in 1:ncol(near_fine_covs)) {
# print(finetab(near_fine_covs[, i], t_id, c_id))
#}
## Assess exact matching balance
#for (i in 1:ncol(near_exact_covs)) {
# print(table(near_exact_covs[t_id, i]==near_exact_covs[c_id, i]))
#}

Loading required package: lattice
Loading required package: MASS
Loading required package: slam
Loading required package: Rglpk
Using the GLPK callable library version 4.65
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[482] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[519] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Building the matching problem...
GLPK optimizer is open...
Finding the optimal matches...
Optimal matches found
elapsed
0.05715
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
[109] 109 110 111 112 113 114 115 116 1 2 3 4 5 6 7 8 9 10
[127] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[145] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
[163] 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
[181] 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
[199] 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
[217] 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116
Mis Min Max Mean T Mean C Std Dif Pval
age 0 16 55.00 26.51 26.37 0.02 0.91
education 0 1 18.00 10.41 10.36 0.02 0.90
black 0 0 1.00 0.75 0.75 0.00 1.00
hispanic 0 0 1.00 0.09 0.09 0.00 1.00
married 0 0 1.00 0.28 0.28 0.00 1.00
nodegree 0 0 1.00 0.62 0.62 0.00 1.00
re74 0 0 35040.07 3057.10 3294.02 0.04 0.74
re75 0 0 25142.24 1943.60 2039.24 0.03 0.83
Units
Cat T C
0 29 29
1 87 87
Units
Cat T C
0 105 105
1 11 11
Units
Cat T C
0 84 84
1 32 32
Units
Cat T C
0 44 44
1 72 72
TRUE
116
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.