Implements the synthetic control method for micro-level data as outlined in
Robbins, Saunders, and Kilmer (2017).
microsynth is designed for use
in assessment of the effect of an intervention using longitudinal data.
However, it may also be used to calculate propensity score-type weights in
cross-sectional data. microsynth is a generalization of
Synth (see Abadie and Gardeazabal (2003) and Abadie, Diamond, and
Hainmueller (2010, 2011, 2015)) that is designed for data at a more granular
level (e.g., micro-level). For more details see the help vignette:
vignette('microsynth', package = 'microsynth').
microsynth develops a synthetic control group by searching for weights
that exactly match a treatment group to a synthetic control group across
a number of variables while also minimizing the discrepancy between the
synthetic control group and the treatment group across a second set of
variables. microsynth works in two primary steps: 1) calculation of
weights and 2) calculation of results. Time series plots of treatment
vs. synthetic control for pertinent outcomes may be produced using the
plot_microsynth() function.
The time range over which data are observed is segmented into pre- and
post-intervention periods. Treatment is matched to synthetic control
across the pre-intervention period, and the effect of the intervention
is assessed across the post-intervention (or evaluation) period. The input
end.pre (which gives the last pre-intervention time period) is used to
delineate between pre- and post-intervention. Note that if the intervention
is not believed to have an instantaneous effect,
end.pre should indicate
the time of the intervention.
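As a rough illustration of this segmentation, the following base-R sketch splits a toy panel into pre- and post-intervention periods. The data frame panel and its period column are hypothetical and exist only for this example; only the names start.pre, end.pre, and end.post mirror inputs of microsynth.

```r
# Hypothetical toy panel: 3 cases observed at 6 time points
panel <- expand.grid(ID = 1:3, time = 1:6)

# Inputs named after the microsynth arguments
start.pre <- 1
end.pre   <- 4   # last pre-intervention time point
end.post  <- 6   # last time point used for evaluation

# Matching uses start.pre..end.pre; evaluation uses (end.pre + 1)..end.post
pre_times  <- start.pre:end.pre
post_times <- (end.pre + 1):end.post

# Label each row of the panel by period
panel$period <- ifelse(panel$time <= end.pre, "pre", "post")
table(panel$period)
```
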
Variables are categorized as outcomes (which are time-variant) and covariates
(which are time-invariant). Using the respective inputs match.covar and
match.out, the user specifies across which covariates and outcomes
(and which pre-intervention time points of the outcomes) treatment is to be
exactly matched to synthetic control. The inputs match.covar.min and
match.out.min are similar but instead specify variables across which
treatment is to be matched to synthetic control as closely as possible. If
there are no variables specified in match.covar.min and
match.out.min, the function calibrate() from the survey
package is used to calculate weights. Otherwise, the function
LowRankQP() from the package of the same name is used. In the event
that the model specified by match.covar and match.out is not
feasible (i.e., weights do not exist that exactly match treatment and
synthetic control subject to the given constraints), a less restrictive
backup model is used.
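To make the idea of exact matching concrete, here is a minimal base-R sketch of linear calibration: it finds weights for control cases so that weighted control totals reproduce the treatment totals. This is not the package's implementation. It uses the closed-form solution for unbounded linear calibration, whereas microsynth relies on survey::calibrate() with weight bounds, so the weights below may be negative. All data and object names (X, target, d, lambda) are hypothetical.

```r
# Hypothetical control group: 5 cases, an intercept plus two matching variables
X <- cbind(Intercept = 1,
           x1 = c(2, 4, 6, 8, 10),
           x2 = c(1, 0, 1, 0, 1))

# Hypothetical treatment-group totals to be matched exactly
# (3 treated cases, so the weights must sum to 3)
target <- c(Intercept = 3, x1 = 17, x2 = 2)

# Starting weights: spread the treatment count evenly over controls
d <- rep(unname(target["Intercept"]) / nrow(X), nrow(X))

# Linear calibration: w = d * (1 + X %*% lambda), with lambda chosen so that
# the weighted control totals equal the targets
lambda <- solve(t(X) %*% (d * X), target - colSums(d * X))
w <- as.vector(d * (1 + X %*% lambda))

# Weighted control totals; these reproduce `target` up to rounding error
colSums(w * X)
```

When exact matching of this kind has no solution within the allowed weight bounds, the model is infeasible in the sense described above.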
microsynth has the capability to perform
statistical inference using Taylor series linearization, a jackknife and
permutation methods. Several sets of weights are calculated. A set of main
weights is calculated that is used to determine a point estimate of the
intervention effect. The main weights can also be used to perform inferences
on the point estimator via Taylor series linearization. If a jackknife is to
be used, one set of weights is calculated for each jackknife replication
group, and if permutation methods are to be used, one set of weights is
calculated for each permutation group. If treatment and synthetic control
are not easily matched based upon the model outlined in match.covar and
match.out (i.e., an exact solution is infeasible or nearly
infeasible), it is recommended that the jackknife not be used for inference.
The software provides the user the option to output overall findings in an Excel
file. For each outcome variable, the results list the estimated treatment
effect, as well as confidence intervals of the effect and p-values of a
hypothesis test that assesses whether the effect is zero. Such results are
produced as needed for each of the three methods of statistical inference.
microsynth can also apply an omnibus test that examines
the presence of a treatment effect jointly across several outcomes.
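The permutation approach can be sketched generically as follows: the estimate for the true treatment group is compared with estimates obtained for placebo (permutation) groups. This is a textbook permutation p-value and is not claimed to be the exact formula used by microsynth. The function perm_pvalue and the numbers are hypothetical, and the option name "upper" is an assumption ("twosided" and "lower" appear in the usage and examples of this page).

```r
# Generic permutation p-value: share of placebo effects at least as extreme
# as the observed effect (hypothetical helper, not part of microsynth)
perm_pvalue <- function(observed, placebo,
                        test = c("twosided", "lower", "upper")) {
  test <- match.arg(test)
  switch(test,
         twosided = mean(abs(placebo) >= abs(observed)),
         lower    = mean(placebo <= observed),
         upper    = mean(placebo >= observed))
}

# Hypothetical effect estimates from 10 permutation groups
placebo  <- c(-0.30, -0.10, 0.05, 0.12, 0.20, -0.02, 0.08, -0.15, 0.25, 0.01)
observed <- -0.28

p_lower <- perm_pvalue(observed, placebo, "lower")
p_two   <- perm_pvalue(observed, placebo, "twosided")
```
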
microsynth(
  data,
  idvar,
  intvar,
  timevar = NULL,
  start.pre = NULL,
  end.pre = NULL,
  end.post = NULL,
  match.out = TRUE,
  match.covar = TRUE,
  match.out.min = NULL,
  match.covar.min = NULL,
  result.var = TRUE,
  omnibus.var = result.var,
  period = 1,
  scale.var = "Intercept",
  confidence = 0.9,
  test = "twosided",
  perm = 0,
  jack = 0,
  use.survey = TRUE,
  cut.mse = Inf,
  check.feas = FALSE,
  use.backup = FALSE,
  w = NULL,
  max.mse = 0.01,
  maxit = 250,
  cal.epsilon = 1e-04,
  calfun = "linear",
  bounds = c(0, Inf),
  result.file = NULL,
  printFlag = TRUE,
  n.cores = TRUE
)
data: A data frame. If longitudinal, the data must be entered in tall format (e.g., at the case/time-level with one row for each time period for each case). Missingness is not allowed. All individuals must have non-NA values of all variables at all time points.
idvar: A character string that gives the variable in data that identifies cases.
intvar: A character string that gives the variable in data that indicates the intervention.
timevar: A character string that gives the variable in data that indicates time.
start.pre: An integer indicating the time point that corresponds to the
beginning of the pre-intervention period used for matching.
end.pre: An integer that gives the final time point of the
pre-intervention period. That is,
end.post: An integer that gives the maximum post-intervention time that
is taken into account when compiling results. That is, the treatment and synthetic
control groups are compared across the outcomes listed in
match.out: Either A) logical, B) a vector of variable names that
indicates across which time-varying variables treatment is to be exactly matched
to synthetic control pre-intervention, or C) a
list consisting of variable names and timespans over which variables should
be aggregated before matching. Note that outcome variables and time-varying
covariates should be included in match.out.
The following examples show the proper formatting of
match.covar: Either a logical or a vector of variable names that
indicates which time-invariant covariates
are to be used for weighting. Weights are
calculated so that treatment and synthetic control exactly match across
these variables. If
match.out.min: A vector or list of the same format as match.out.
match.covar.min: A vector of variable names that indicates supplemental time-invariant variables that are to be used for weighting, for which exact matches are not required. Weights are calculated so the distance is minimized between treatment and synthetic control across these variables.
result.var: A vector of variable names giving the outcome
variables for which results will be reported. Time-varying covariates
should be excluded from result.var.
omnibus.var: A vector of variable names that indicates the outcome
variables that are to be used within the calculation of the omnibus
statistic. Can also be a logical indicator. When
period: An integer that gives the granularity of the data that will be
used for plotting and compiling results. If
Note that plotting is performed with
scale.var: A variable name. When comparing the treatment group to all
cases, the latter is scaled to the size of the former with respect to the
variable indicated by scale.var.
confidence: The level of confidence for confidence intervals.
test: The type of hypothesis test (one-sided lower, one-sided upper, or
two-sided) that is used when calculating p-values. Entries of
perm: An integer giving the number of permutation groups that are used.
jack: An integer giving the number of replication groups that are used
for the jackknife.
cut.mse: The maximum error (given as mean-squared error) permissible for permutation groups. Permutation groups with a larger than permissible error are dropped when calculating results. The mean-squared error is only calculated over constraints that are to be exactly satisfied.
check.feas: A logical indicator of whether or not the feasibility of
the model specified by
use.backup: A logical variable that, when true, indicates whether a
backup model should be used whenever the model specified by
max.mse: The maximum error (given as mean-squared error) permissible
for constraints that are to be exactly satisfied. If
maxit: The maximum number of iterations used within the calibration
routine.
cal.epsilon: The tolerance used within the calibration routine.
calfun: The calibration function used within the calibration routine.
bounds: Bounds for calibration weighting (fed into the
result.file: A character string giving the name of a file that will be
created in the home directory containing results. If
n.cores: The number of CPU cores to use for parallelization. If
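The list form of match.out pairs each variable name with a vector of timespans, as in 'i_robbery' = rep(2, 6) in the examples on this page. The base-R sketch below shows one way such a vector could collapse 12 pre-intervention values into blocks. It is an illustration under assumptions: the helper aggregate_blocks is hypothetical, aggregation is done by summing, and blocks are counted from the first time point; the package may aggregate or order the blocks differently.

```r
# Hypothetical helper: collapse a series into consecutive blocks whose
# lengths are given by `spans`, summing the values within each block
aggregate_blocks <- function(y, spans) {
  stopifnot(sum(spans) == length(y))
  block <- rep(seq_along(spans), times = spans)
  as.vector(tapply(y, block, sum))
}

# Hypothetical outcome counts for 12 pre-intervention time points
y <- c(3, 1, 0, 2, 4, 1, 0, 0, 5, 2, 1, 3)

agg2 <- aggregate_blocks(y, rep(2, 6))  # six blocks of two periods
agg4 <- aggregate_blocks(y, rep(4, 3))  # three blocks of four periods
```

Coarser blocks impose fewer matching constraints per variable, which is why the examples describe aggregation as a way to boost model feasibility.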
microsynth requires specification of the following inputs: data, idvar,
and intvar. data is a longitudinal data frame;
idvar and intvar are character strings that specify
pertinent columns of
data. In longitudinal data, timevar
should be specified. Furthermore, specification of match.out and
match.covar is recommended.
microsynth can also be used to calculate propensity score-type weights
in cross sectional data (in which case
timevar does not need to be
specified) as proposed by Hainmueller (2012).
microsynth calculates weights using
survey::calibrate() from the
survey package in circumstances
where a feasible solution exists for all constraints, whereas
LowRankQP::LowRankQP() is used to assess feasibility and to
calculate weights in the event that a feasible solution to all constraints
does not exist. The
LowRankQP routine is memory-intensive and can
run quite slowly in data that have a large number of cases. To prevent
LowRankQP from being used, set
match.out.min = NULL,
check.feas = FALSE, and
use.backup = FALSE.
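microsynth returns a list with up to five elements: a) w, b) Results,
c) svyglm.stats, d) Plot.Stats, and e) info.
w is a list with six elements: a) Weights, b) Intervention, c) MSE,
d) Model, e) Summary, and f) keep.groups.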
Assume there are
C total sets of weights calculated, where C =
1 + jack + perm, and
there are N total cases across the treatment and control groups.
w$Weights is an N x C matrix, where each column provides a set of
weights.
w$Intervention is an N x C matrix made of logical
indicators that indicate whether or not the case in the respective row is
considered treated (at any point in time) for the respective column.
Entries of NA are to be dropped for the respective jackknife
replication group (NAs only appear in jackknife weights).
w$MSE is a 6 x C matrix that gives the MSEs for each set of weights.
MSEs are listed for the primary and secondary constraints for the first,
second, and third models. Note that the primary constraints differ for each
model (see Robbins and Davenport, 2021).
w$Model is a length-C vector that
indicates whether backup models were used in the calculation of each set of
weights.
w$keep.groups is a logical vector indicating which groups
are to be used in analysis (groups that are not used have pre-intervention
MSE greater than cut.mse).
w$Summary is a three-column matrix
that (for treatment,
synthetic control, and the full dataset), shows aggregate values
of the variables across which treatment and synthetic control are matched.
The summary, which is tabulated only for the primary weights, is also
printed by microsynth while weights are being calculated.
Results is a list where each element gives the final
results for each value of
end.post. Each element of Results
is itself a matrix with each row corresponding to an outcome variable (and
a row for the omnibus test, if used) and each column denotes estimates of
the intervention effects and p-values, upper, and lower bounds of
confidence intervals as found using Taylor series linearization (Linear),
jackknife (jack), and permutation (perm) methods where needed.
svyglm.stats is a list where each element is a
matrix that includes the output from the regression models run using the
svyglm() function to estimate the treatment effect. The list has one
element for each value of
end.post, and the matrices each have
one row per variable in result.var.
Plot.Stats contains the data that are displayed in the
plots which may be generated using plot_microsynth().
Plot.Stats is a list with four elements (Treatment, Control,
All, Difference). The first three elements are matrices with one row per
outcome variable and one column per time point. The last element (which
gives the treatment minus control values) is an array that contains data
for each permutation group in addition to the true treatment area.
Plot.Stats$Difference[,,1] contains the time series of
treatment minus control for the true intervention group;
Plot.Stats$Difference[,,i+1] contains the time series of treatment
minus control for the i^th permutation group.
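The indexing convention for Plot.Stats$Difference can be illustrated with a mock array. The sketch below does not call microsynth; the array Difference, its random contents, and its dimnames ("Main", "Perm1", and so on) are hypothetical stand-ins for an array with one row per outcome, one column per time point, and one slice per group.

```r
# Hypothetical dimensions: 2 outcomes, 4 time points, 3 permutation groups
outcomes <- c("i_felony", "any_crime")
times    <- 1:4
perm     <- 3

set.seed(1)
Difference <- array(
  rnorm(length(outcomes) * length(times) * (1 + perm)),
  dim = c(length(outcomes), length(times), 1 + perm),
  dimnames = list(outcomes, as.character(times),
                  c("Main", paste0("Perm", seq_len(perm))))
)

# Slice 1: treatment minus control for the true intervention group
true_series <- Difference[, , 1]

# Slice i + 1: treatment minus control for the i-th permutation group
i <- 2
perm_series <- Difference[, , i + 1]
```
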
info documents some input parameters for display by
print(). A summary of weighted matching variables and of results
can be viewed using summary().
Abadie A, Diamond A, Hainmueller J (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 493-505.
Abadie A, Diamond A, Hainmueller J (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software, 42(13), 1-17.
Abadie A, Diamond A, Hainmueller J (2015). Comparative politics and the synthetic control method. American Journal of Political Science, 59(2), 495-510.
Abadie A, Gardeazabal J (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, pp. 113-132.
Hainmueller J (2012). Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies. Political Analysis, 20, 25-46.
Robbins MW, Saunders J, Kilmer B (2017). A framework for synthetic control methods with high-dimensional, micro-level data: Evaluating a neighborhood-specific crime intervention. Journal of the American Statistical Association, 112(517), 109-126.
Robbins MW, Davenport S (2021). microsynth: Synthetic Control Methods for Disaggregated and Micro-Level Data in R. Journal of Statistical Software, 97(2), doi:10.18637/jss.v097.i02.
# Use seattledmi, block-level panel data, to evaluate a crime intervention.
library(microsynth)

# Declare time-variant (outcome) and time-invariant variables for matching
cov.var <- c('TotalPop', 'BLACK', 'HISPANIC', 'Males_1521', 'HOUSEHOLDS',
             'FAMILYHOUS', 'FEMALE_HOU', 'RENTER_HOU', 'VACANT_HOU')
match.out <- c('i_felony', 'i_misdemea', 'i_drugs', 'any_crime')

set.seed(99199) # for reproducibility

# Perform matching and estimation, without permutations or jackknife
# runtime: < 1 min

## Not run: 
sea1 <- microsynth(seattledmi,
                   idvar='ID', timevar='time', intvar='Intervention',
                   start.pre=1, end.pre=12, end.post=16,
                   match.out=match.out, match.covar=cov.var,
                   result.var=match.out, omnibus.var=match.out,
                   test='lower',
                   n.cores = min(parallel::detectCores(), 2))

# View results
summary(sea1)
plot_microsynth(sea1)

# Repeat matching and estimation, with permutations and jackknife
# Set permutations and jack-knife to very few groups (2) for
# quick demonstration only.
# runtime: ~30 min
sea2 <- microsynth(seattledmi,
                   idvar='ID', timevar='time', intvar='Intervention',
                   start.pre=1, end.pre=12, end.post=c(14, 16),
                   match.out=match.out, match.covar=cov.var,
                   result.var=match.out, omnibus.var=match.out,
                   test='lower',
                   perm=250, jack=TRUE,
                   result.file=file.path(tempdir(), 'ExResults2.xlsx'),
                   n.cores = min(parallel::detectCores(), 2))

# View results
summary(sea2)
plot_microsynth(sea2)

# Specify additional outcome variables for matching, which makes
# matching harder.
match.out <- c('i_robbery','i_aggassau','i_burglary','i_larceny',
               'i_felony','i_misdemea','i_drugsale','i_drugposs','any_crime')

# Perform matching, setting check.feas = T and use.backup = T
# to ensure model feasibility
# runtime: ~40 minutes
sea3 <- microsynth(seattledmi,
                   idvar='ID', timevar='time', intvar='Intervention',
                   end.pre=12,
                   match.out=match.out, match.covar=cov.var,
                   result.var=match.out,
                   perm=250, jack=0,
                   test='lower', check.feas=TRUE, use.backup = TRUE,
                   result.file=file.path(tempdir(), 'ExResults3.xlsx'),
                   n.cores = min(parallel::detectCores(), 2))

# Aggregate outcome variables before matching, to boost model feasibility
match.out <- list(
  'i_robbery'=rep(2, 6), 'i_aggassau'=rep(2, 6),
  'i_burglary'=rep(1, 12), 'i_larceny'=rep(1, 12),
  'i_felony'=rep(2, 6), 'i_misdemea'=rep(2, 6),
  'i_drugsale'=rep(4, 3), 'i_drugposs'=rep(4, 3),
  'any_crime'=rep(1, 12))

# After aggregation, use.backup and check.feas no longer needed
# runtime: ~40 minutes
sea4 <- microsynth(seattledmi,
                   idvar='ID', timevar='time', intvar='Intervention',
                   match.out=match.out, match.covar=cov.var,
                   start.pre=1, end.pre=12, end.post=16,
                   result.var=names(match.out), omnibus.var=names(match.out),
                   perm=250, jack = TRUE,
                   test='lower',
                   result.file=file.path(tempdir(), 'ExResults4.xlsx'),
                   n.cores = min(parallel::detectCores(), 2))

# View results
summary(sea4)
plot_microsynth(sea4)

# Generate weights only (for four variables)
match.out <- c('i_felony', 'i_misdemea', 'i_drugs', 'any_crime')

# runtime: ~ 20 minutes
sea5 <- microsynth(seattledmi,
                   idvar='ID', timevar='time', intvar='Intervention',
                   match.out=match.out, match.covar=cov.var,
                   start.pre=1, end.pre=12, end.post=16,
                   result.var=FALSE, perm=250, jack=TRUE,
                   n.cores = min(parallel::detectCores(), 2))

# View weights
summary(sea5)

# Generate results only
sea6 <- microsynth(seattledmi,
                   idvar='ID', timevar='time', intvar='Intervention',
                   start.pre=1, end.pre=12, end.post=c(14, 16),
                   result.var=match.out, test='lower',
                   w=sea5,
                   result.file=file.path(tempdir(), 'ExResults6.xlsx'),
                   n.cores = min(parallel::detectCores(), 2))

# View results (including previously-found weights)
summary(sea6)

# Generate plots only
plot_microsynth(sea6, plot.var=match.out[1:2])

# Apply microsynth in the traditional setting of Synth
# Create macro-level (small n) data, with 1 treatment unit
set.seed(86879)
ids.t <- names(table(seattledmi$ID[seattledmi$Intervention==1]))
ids.c <- setdiff(names(table(seattledmi$ID)), ids.t)
ids.synth <- c(base::sample(ids.t, 1), base::sample(ids.c, 100))
seattledmi.one <- seattledmi[is.element(seattledmi$ID,
                                        as.numeric(ids.synth)), ]

# Apply microsynth to the new macro-level data
# runtime: < 5 minutes
sea8 <- microsynth(seattledmi.one,
                   idvar='ID', timevar='time', intvar='Intervention',
                   start.pre=1, end.pre=12, end.post=16,
                   match.out=match.out[4],
                   match.covar=cov.var, result.var=match.out[4],
                   test='lower', perm=250, jack=FALSE,
                   check.feas=TRUE, use.backup=TRUE,
                   n.cores = min(parallel::detectCores(), 2))

# View results
summary(sea8)
plot_microsynth(sea8)

# Use microsynth to calculate propensity score-type weights
# Prepare cross-sectional data at time of intervention
seattledmi.cross <- seattledmi[seattledmi$time==16,
                               colnames(seattledmi)!='time']

# Apply microsynth to find propensity score-type weights
# runtime: ~5 minutes
sea9 <- microsynth(seattledmi.cross,
                   idvar='ID', intvar='Intervention',
                   match.out=FALSE, match.covar=cov.var,
                   result.var=match.out,
                   test='lower',
                   perm=250, jack=TRUE,
                   n.cores = min(parallel::detectCores(), 2))

# View results
summary(sea9)

## End(Not run)