Description Usage Arguments Value Author(s) References See Also Examples
Function for optimal cardinality matching in observational studies. cardmatch
finds the largest matched sample that is balanced and representative by design. The formulation of cardmatch
has been simplified to handle larger data than bmatch
or nmatch
. Similar to bmatch
or nmatch
, the performance of cardmatch
is greatly enhanced by using the solver
options cplex
or gurobi
.
1 
t_ind 
treatment indicator: a vector of zeros and ones indicating treatment (1 = treated; 0 = control). 
mom 
moment balance parameters: a list with three arguments,

fine 
Fine balance parameters: a list with one argument,
where 
solver 
Optimization solver parameters: a list with four objects,

A list containing the optimal solution, with the following objects:
obj_total 
value of the objective function at the optimum; 
obj_dist_mat 
value of the total sum of distances term of the objective function at the optimum; 
t_id 
indexes of the matched controls at the optimum; 
c_id 
indexes of the matched controls at the optimum; 
group_id 
matched pairs or groups at the optimum; 
time 
time elapsed to find the optimal solution. 
Jose R. Zubizarreta <[email protected]>, Cinar Kilcioglu <[email protected]>, Juan P. Vielma <[email protected]>.
Bennett, M., Vielma, J. P., and Zubizarreta, J. R. (2017), "Building a Representative Matched Sample with Treatment Doses," working paper.
Visconti, G., and Zubizarreta, J. R. (2017), "Finding the Largest Matched Sample that is Balanced by Design: A Case Study of the Effect of an Earthquake on Electoral Outcomes," working paper.
Zubizarreta, J. R. (2012), "Using Mixed Integer Programming for Matching in an Observational Study of Kidney Failure after Surgery," Journal of the American Statistical Association, 107, 13601371.
Zubizarreta, J. R., Paredes, R. D., and Rosenbaum, P. R. (2014), "Matching for Balance, Pairing for Heterogeneity in an Observational Study of the Effectiveness of Forprofit and Notforprofit High Schools in Chile," Annals of Applied Statistics, 8, 204231.
sensitivitymv, sensitivitymw.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 
# Load and attach data
data(lalonde)
attach(lalonde)
#################################
# Step 1: use cardinality matching to find the largest sample of matched pairs for which
# all the covariates are finely balanced.
#################################
# Discretize covariates
quantiles = function(covar, n_q) {
p_q = seq(0, 1, 1/n_q)
val_q = quantile(covar, probs = p_q, na.rm = TRUE)
covar_out = rep(NA, length(covar))
for (i in 1:n_q) {
if (i==1) {covar_out[covar<val_q[i+1]] = i}
if (i>1 & i<n_q) {covar_out[covar>=val_q[i] & covar<val_q[i+1]] = i}
if (i==n_q) {covar_out[covar>=val_q[i] & covar<=val_q[i+1]] = i}}
covar_out
}
age_5 = quantiles(age, 5)
education_5 = quantiles(education, 5)
re74_5 = quantiles(re74, 5)
re75_5 = quantiles(re75, 5)
# Treatment indicator
t_ind = treatment
# Fine balance
fine_covs = cbind(black, hispanic, married, nodegree, age_5, education_5, re74_5, re75_5)
fine = list(covs = fine_covs)
# Solver options
t_max = 60*5
solver = "glpk"
approximate = 0
solver = list(name = solver, t_max = t_max, approximate = approximate,
round_cplex = 0, trace = 0)
# Match
out_1 = cardmatch(t_ind, fine = fine, solver = solver)
# Indices of the treated units and matched controls
t_id_1 = out_1$t_id
c_id_1 = out_1$c_id
# Mean balance
covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
meantab(covs, t_ind, t_id_1, c_id_1)
# Fine balance (note here we are getting an approximate solution)
for (i in 1:ncol(fine_covs)) {
print(finetab(fine_covs[, i], t_id_1, c_id_1))
}
#################################
# Step 2: use optimal matching (minimum distance matching) to find the (re)pairing of
# treated and control that minimizes the total sum of covariate distances between matched
# pairs. For this, use the function 'distmatch' which is a wrapper for 'bmatch'.
#################################
# New treatment indicator
t_ind_2 = t_ind[c(t_id_1, c_id_1)]
table(t_ind_2)
# To build the distance matrix, the idea is to use strong predictors of the outcome
dist_mat_2 = abs(outer(re74[t_id_1], re74[c_id_1], ""))
dim(dist_mat_2)
# Match
out_2 = distmatch(t_ind_2, dist_mat_2, solver)
# Indices of the treated units and matched controls
t_id_2 = t_id_1[out_2$t_id]
c_id_2 = c_id_1[out_2$c_idlength(out_2$c_id)]
# Covariate balance is preserved...
meantab(covs, t_ind, t_id_2, c_id_2)
for (i in 1:ncol(fine_covs)) {
print(finetab(fine_covs[, i], t_id_2, c_id_2))
}
# ... but covariate distances are reduced
distances_step_1 = sum(diag(dist_mat_2))
distances_step_2 = sum(diag(dist_mat_2[out_2$t_id, out_2$c_idlength(out_2$c_id)]))
distances_step_1
distances_step_2
# The mean difference in outcomes is the same...
mean(re78[t_id_1]re78[c_id_1])
mean(re78[t_id_2]re78[c_id_2])
# ... but their standard deviation is reduced
sd(re78[t_id_1]re78[c_id_1])
sd(re78[t_id_2]re78[c_id_2])

Loading required package: lattice
Loading required package: MASS
Loading required package: slam
Loading required package: Rglpk
Using the GLPK callable library version 4.52
Building the matching problem...
GLPK optimizer is open...
Finding the optimal matches...
Optimal matches found
Mis Min Max Mean T Mean C Std Dif Pval
age 0 16 55.00 25.40 25.75 0.04 0.79
education 0 2 18.00 10.23 10.44 0.08 0.54
black 0 0 1.00 0.72 0.72 0.00 1.00
hispanic 0 0 1.00 0.11 0.11 0.00 1.00
married 0 0 1.00 0.25 0.25 0.00 1.00
nodegree 0 0 1.00 0.67 0.67 0.00 1.00
re74 0 0 35040.07 2970.12 3042.80 0.01 0.92
re75 0 0 25142.24 2180.45 1836.30 0.11 0.49
Units
Cat T C
0 29 29
1 74 74
Units
Cat T C
0 92 92
1 11 11
Units
Cat T C
0 77 77
1 26 26
Units
Cat T C
0 34 34
1 69 69
Units
Cat T C
1 25 25
2 21 21
3 23 23
4 18 18
5 16 16
Units
Cat T C
1 7 7
2 27 27
3 18 18
4 17 17
5 34 34
Units
Cat T C
2 57 57
3 17 17
4 17 17
5 12 12
Units
Cat T C
2 49 49
3 17 17
4 19 19
5 18 18
t_ind_2
0 1
103 103
[1] 103 103
Building the matching problem...
GLPK optimizer is open...
Finding the optimal matches...
Optimal matches found
Mis Min Max Mean T Mean C Std Dif Pval
age 0 16 55.00 25.40 25.75 0.04 0.79
education 0 2 18.00 10.23 10.44 0.08 0.54
black 0 0 1.00 0.72 0.72 0.00 1.00
hispanic 0 0 1.00 0.11 0.11 0.00 1.00
married 0 0 1.00 0.25 0.25 0.00 1.00
nodegree 0 0 1.00 0.67 0.67 0.00 1.00
re74 0 0 35040.07 2970.12 3042.80 0.01 0.92
re75 0 0 25142.24 2180.45 1836.30 0.11 0.49
Units
Cat T C
0 29 29
1 74 74
Units
Cat T C
0 92 92
1 11 11
Units
Cat T C
0 77 77
1 26 26
Units
Cat T C
0 34 34
1 69 69
Units
Cat T C
1 25 25
2 21 21
3 23 23
4 18 18
5 16 16
Units
Cat T C
1 7 7
2 27 27
3 18 18
4 17 17
5 34 34
Units
Cat T C
2 57 57
3 17 17
4 17 17
5 12 12
Units
Cat T C
2 49 49
3 17 17
4 19 19
5 18 18
[1] 592030.4
[1] 46748.62
[1] 570.3972
[1] 570.3972
[1] 8918.981
[1] 8648.961
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.