cardmatch | R Documentation |
Function for optimal cardinality matching in observational studies. cardmatch
finds the largest matched sample that is balanced and representative by design. The formulation of cardmatch
has been simplified to handle larger data than bmatch
or nmatch
. Similar to bmatch
or nmatch
, the performance of cardmatch
is greatly enhanced by using the solver
options cplex
or gurobi
.
cardmatch(t_ind, mom = NULL, fine = NULL, solver = NULL)
t_ind |
treatment indicator: a vector of zeros and ones indicating treatment (1 = treated; 0 = control). Please note that the data needs to be sorted in decreasing order according to this treatment indicator. |
mom |
moment balance parameters: a list with three arguments,
|
fine |
Fine balance parameters: a list with one argument,
where |
solver |
Optimization solver parameters: a list with four objects,
|
A list containing the optimal solution, with the following objects:
obj_total |
value of the objective function at the optimum; |
obj_dist_mat |
value of the total sum of distances term of the objective function at the optimum; |
t_id |
indexes of the matched treated units at the optimum; |
c_id |
indexes of the matched controls at the optimum; |
group_id |
matched pairs or groups at the optimum; |
time |
time elapsed to find the optimal solution. |
Jose R. Zubizarreta <zubizarreta@hcp.med.harvard.edu>, Cinar Kilcioglu <ckilcioglu16@gsb.columbia.edu>, Juan P. Vielma <jvielma@mit.edu>.
Bennett, M., Vielma, J. P., and Zubizarreta, J. R. (2017), "Building a Representative Matched Sample with Treatment Doses," working paper.
Visconti, G., and Zubizarreta, J. R. (2017), "Finding the Largest Matched Sample that is Balanced by Design: A Case Study of the Effect of an Earthquake on Electoral Outcomes," working paper.
Zubizarreta, J. R. (2012), "Using Mixed Integer Programming for Matching in an Observational Study of Kidney Failure after Surgery," Journal of the American Statistical Association, 107, 1360-1371.
Zubizarreta, J. R., Paredes, R. D., and Rosenbaum, P. R. (2014), "Matching for Balance, Pairing for Heterogeneity in an Observational Study of the Effectiveness of For-profit and Not-for-profit High Schools in Chile," Annals of Applied Statistics, 8, 204-231.
sensitivitymv, sensitivitymw.
# Load, sort, and attach data
data(lalonde)
lalonde = lalonde[order(lalonde$treatment, decreasing = TRUE), ]
attach(lalonde)
#################################
# Step 1: use cardinality matching to find the largest sample of matched pairs for which
# all the covariates are finely balanced.
#################################
# Discretize covariates
quantiles = function(covar, n_q) {
p_q = seq(0, 1, 1/n_q)
val_q = quantile(covar, probs = p_q, na.rm = TRUE)
covar_out = rep(NA, length(covar))
for (i in 1:n_q) {
if (i==1) {covar_out[covar<val_q[i+1]] = i}
if (i>1 & i<n_q) {covar_out[covar>=val_q[i] & covar<val_q[i+1]] = i}
if (i==n_q) {covar_out[covar>=val_q[i] & covar<=val_q[i+1]] = i}}
covar_out
}
age_5 = quantiles(age, 5)
education_5 = quantiles(education, 5)
re74_5 = quantiles(re74, 5)
re75_5 = quantiles(re75, 5)
# Treatment indicator; note that the data needs to be sorted in decreasing order
# according to this treatment indicator
t_ind = treatment
t_ind
# Fine balance
fine_covs = cbind(black, hispanic, married, nodegree, age_5, education_5, re74_5, re75_5)
fine = list(covs = fine_covs)
# Solver options
t_max = 60*5
solver = "highs"
approximate = 0
solver = list(name = solver, t_max = t_max, approximate = approximate,
round_cplex = 0, trace = 0)
# Match
out_1 = cardmatch(t_ind, fine = fine, solver = solver)
# Indices of the treated units and matched controls
t_id_1 = out_1$t_id
c_id_1 = out_1$c_id
# Mean balance
covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
meantab(covs, t_ind, t_id_1, c_id_1)
# Fine balance (note here we are getting an approximate solution)
for (i in 1:ncol(fine_covs)) {
print(finetab(fine_covs[, i], t_id_1, c_id_1))
}
#################################
# Step 2: use optimal matching (minimum distance matching) to find the (re)pairing of
# treated and control that minimizes the total sum of covariate distances between matched
# pairs. For this, use the function 'distmatch' which is a wrapper for 'bmatch'.
#################################
# New treatment indicator
t_ind_2 = t_ind[c(t_id_1, c_id_1)]
table(t_ind_2)
# To build the distance matrix, the idea is to use strong predictors of the outcome
dist_mat_2 = abs(outer(re74[t_id_1], re74[c_id_1], "-"))
dim(dist_mat_2)
# Match
out_2 = distmatch(t_ind_2, dist_mat_2, solver)
# Indices of the treated units and matched controls
t_id_2 = t_id_1[out_2$t_id]
c_id_2 = c_id_1[out_2$c_id-length(out_2$c_id)]
# Covariate balance is preserved...
meantab(covs, t_ind, t_id_2, c_id_2)
for (i in 1:ncol(fine_covs)) {
print(finetab(fine_covs[, i], t_id_2, c_id_2))
}
# ... but covariate distances are reduced
distances_step_1 = sum(diag(dist_mat_2))
distances_step_2 = sum(diag(dist_mat_2[out_2$t_id, out_2$c_id-length(out_2$c_id)]))
distances_step_1
distances_step_2
# The mean difference in outcomes is the same...
mean(re78[t_id_1]-re78[c_id_1])
mean(re78[t_id_2]-re78[c_id_2])
# ... but their standard deviation is reduced
sd(re78[t_id_1]-re78[c_id_1])
sd(re78[t_id_2]-re78[c_id_2])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.