Description Usage Arguments Value Author(s) References See Also Examples
Function for optimal distance matching in observational studies. distmatch
minimizes the total sum of covariate distances between matches. distmatch
is a wrapper to bmatch
.
1 
t_ind 
treatment indicator: a vector of zeros and ones indicating treatment (1 = treated; 0 = control). 
dist_mat 
distance matrix: a matrix of positive distances between treated units (rows) and controls (columns). If 
solver 
Optimization solver parameters: a list with four objects,

A list containing the optimal solution, with the following objects:
obj_total 
value of the objective function at the optimum; 
obj_dist_mat 
value of the total sum of distances term of the objective function at the optimum; 
t_id 
indexes of the matched controls at the optimum; 
c_id 
indexes of the matched controls at the optimum; 
group_id 
matched pairs or groups at the optimum; 
time 
time elapsed to find the optimal solution. 
Jose R. Zubizarreta <[email protected]>, Cinar Kilcioglu <[email protected]>.
Rosenbaum, P. R. (2010), Design of Observational Studies, Springer.
sensitivitymv, sensitivitymw.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 
# Load and attach data
data(lalonde)
attach(lalonde)
#################################
# Step 1: use cardinality matching to find the largest sample of matched pairs for which
# all the covariates are finely balanced.
#################################
# Discretize covariates
quantiles = function(covar, n_q) {
p_q = seq(0, 1, 1/n_q)
val_q = quantile(covar, probs = p_q, na.rm = TRUE)
covar_out = rep(NA, length(covar))
for (i in 1:n_q) {
if (i==1) {covar_out[covar<val_q[i+1]] = i}
if (i>1 & i<n_q) {covar_out[covar>=val_q[i] & covar<val_q[i+1]] = i}
if (i==n_q) {covar_out[covar>=val_q[i] & covar<=val_q[i+1]] = i}}
covar_out
}
age_5 = quantiles(age, 5)
education_5 = quantiles(education, 5)
re74_5 = quantiles(re74, 5)
re75_5 = quantiles(re75, 5)
# Treatment indicator
t_ind = treatment
# Fine balance
fine_covs = cbind(black, hispanic, married, nodegree, age_5, education_5, re74_5, re75_5)
fine = list(covs = fine_covs)
# Solver options
t_max = 60*5
solver = "glpk"
approximate = 0
solver = list(name = solver, t_max = t_max, approximate = approximate,
round_cplex = 0, trace = 0)
# Match
out_1 = cardmatch(t_ind, fine = fine, solver = solver)
# Indices of the treated units and matched controls
t_id_1 = out_1$t_id
c_id_1 = out_1$c_id
# Mean balance
covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
meantab(covs, t_ind, t_id_1, c_id_1)
# Fine balance (note here we are getting an approximate solution)
for (i in 1:ncol(fine_covs)) {
print(finetab(fine_covs[, i], t_id_1, c_id_1))
}
#################################
# Step 2: use optimal matching (minimum distance matching) to find the (re)pairing of
# treated and control that minimizes the total sum of covariate distances between matched
# pairs. For this, use the function 'distmatch' which is a wrapper for 'bmatch'.
#################################
# New treatment indicator
t_ind_2 = t_ind[c(t_id_1, c_id_1)]
table(t_ind_2)
# To build the distance matrix, the idea is to use strong predictors of the outcome
dist_mat_2 = abs(outer(re74[t_id_1], re74[c_id_1], ""))
dim(dist_mat_2)
# Match
out_2 = distmatch(t_ind_2, dist_mat_2, solver)
# Indices of the treated units and matched controls
t_id_2 = t_id_1[out_2$t_id]
c_id_2 = c_id_1[out_2$c_idlength(out_2$c_id)]
# Covariate balance is preserved...
meantab(covs, t_ind, t_id_2, c_id_2)
for (i in 1:ncol(fine_covs)) {
print(finetab(fine_covs[, i], t_id_2, c_id_2))
}
# ... but covariate distances are reduced
distances_step_1 = sum(diag(dist_mat_2))
distances_step_2 = sum(diag(dist_mat_2[out_2$t_id, out_2$c_idlength(out_2$c_id)]))
distances_step_1
distances_step_2
# The mean difference in outcomes is the same...
mean(re78[t_id_1]re78[c_id_1])
mean(re78[t_id_2]re78[c_id_2])
# ... but their standard deviation is reduced
sd(re78[t_id_1]re78[c_id_1])
sd(re78[t_id_2]re78[c_id_2])

Loading required package: lattice
Loading required package: MASS
Loading required package: slam
Loading required package: Rglpk
Using the GLPK callable library version 4.52
Building the matching problem...
GLPK optimizer is open...
Finding the optimal matches...
Optimal matches found
Mis Min Max Mean T Mean C Std Dif Pval
age 0 16 55.00 25.40 25.75 0.04 0.79
education 0 2 18.00 10.23 10.44 0.08 0.54
black 0 0 1.00 0.72 0.72 0.00 1.00
hispanic 0 0 1.00 0.11 0.11 0.00 1.00
married 0 0 1.00 0.25 0.25 0.00 1.00
nodegree 0 0 1.00 0.67 0.67 0.00 1.00
re74 0 0 35040.07 2970.12 3042.80 0.01 0.92
re75 0 0 25142.24 2180.45 1836.30 0.11 0.49
Units
Cat T C
0 29 29
1 74 74
Units
Cat T C
0 92 92
1 11 11
Units
Cat T C
0 77 77
1 26 26
Units
Cat T C
0 34 34
1 69 69
Units
Cat T C
1 25 25
2 21 21
3 23 23
4 18 18
5 16 16
Units
Cat T C
1 7 7
2 27 27
3 18 18
4 17 17
5 34 34
Units
Cat T C
2 57 57
3 17 17
4 17 17
5 12 12
Units
Cat T C
2 49 49
3 17 17
4 19 19
5 18 18
t_ind_2
0 1
103 103
[1] 103 103
Building the matching problem...
GLPK optimizer is open...
Finding the optimal matches...
Optimal matches found
Mis Min Max Mean T Mean C Std Dif Pval
age 0 16 55.00 25.40 25.75 0.04 0.79
education 0 2 18.00 10.23 10.44 0.08 0.54
black 0 0 1.00 0.72 0.72 0.00 1.00
hispanic 0 0 1.00 0.11 0.11 0.00 1.00
married 0 0 1.00 0.25 0.25 0.00 1.00
nodegree 0 0 1.00 0.67 0.67 0.00 1.00
re74 0 0 35040.07 2970.12 3042.80 0.01 0.92
re75 0 0 25142.24 2180.45 1836.30 0.11 0.49
Units
Cat T C
0 29 29
1 74 74
Units
Cat T C
0 92 92
1 11 11
Units
Cat T C
0 77 77
1 26 26
Units
Cat T C
0 34 34
1 69 69
Units
Cat T C
1 25 25
2 21 21
3 23 23
4 18 18
5 16 16
Units
Cat T C
1 7 7
2 27 27
3 18 18
4 17 17
5 34 34
Units
Cat T C
2 57 57
3 17 17
4 17 17
5 12 12
Units
Cat T C
2 49 49
3 17 17
4 19 19
5 18 18
[1] 592030.4
[1] 46748.62
[1] 570.3972
[1] 570.3972
[1] 8918.981
[1] 8648.961
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.