Implements the synthetic control method for micro-level data as outlined in
Robbins, Saunders, and Kilmer (2017). microsynth is designed for use
in assessing the effect of an intervention using longitudinal data.
However, it may also be used to calculate propensity score-type weights in
cross-sectional data. microsynth is a generalization of Synth
(see Abadie and Gardeazabal (2003) and Abadie, Diamond, and
Hainmueller (2010, 2011, 2015)) that is designed for data at a more granular
level (e.g., micro-level). For more details, see the help vignette:
vignette('microsynth', package = 'microsynth').
microsynth develops a synthetic control group by searching for weights
that exactly match a treatment group to a synthetic control group across
a number of variables while also minimizing the discrepancy between the
synthetic control group and the treatment group across a second set of
variables. microsynth works in two primary steps: 1) calculation of
weights and 2) calculation of results. Time series plots of treatment
vs. synthetic control for pertinent outcomes may be produced using the
function plot_microsynth().
The time range over which data are observed is segmented into pre- and
post-intervention periods. Treatment is matched to synthetic control
across the pre-intervention period, and the effect of the intervention
is assessed across the post-intervention (or evaluation) period. The input
end.pre (which gives the last pre-intervention time period) is used to
delineate between pre- and post-intervention. Note that if the intervention
is not believed to have an instantaneous effect, end.pre should indicate
the time of the intervention.
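The following minimal sketch shows how start.pre, end.pre, and end.post partition the observed time points. It assumes integer time points and assumes that the evaluation period runs from end.pre + 1 through end.post. split_periods() is a hypothetical helper for illustration, not a microsynth function.

```r
# Hypothetical helper (not part of microsynth): split observed time points
# into the matching (pre-intervention) window and the evaluation
# (post-intervention) window.
split_periods <- function(times, start.pre, end.pre, end.post) {
  stopifnot(start.pre <= end.pre, end.pre < end.post)
  list(
    pre  = times[times >= start.pre & times <= end.pre],
    post = times[times > end.pre & times <= end.post]
  )
}

# Settings used in the seattledmi examples: 16 time points, end.pre = 12
periods <- split_periods(times = 1:16, start.pre = 1, end.pre = 12, end.post = 16)
periods$pre   # time points 1 through 12 are used for matching
periods$post  # time points 13 through 16 are used for evaluation
```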
Variables are categorized as outcomes (which are time-variant) and covariates
(which are time-invariant). Using the respective inputs match.covar
and match.out, the user specifies across which covariates and outcomes
(and which pre-intervention time points of the outcomes) treatment is to be
exactly matched to synthetic control. The inputs match.covar.min and
match.out.min are similar but instead specify variables across which
treatment is to be matched to synthetic control as closely as possible. If
no variables are specified in match.covar.min and match.out.min, the
function calibrate() from the survey package is used to calculate weights.
Otherwise, the function LowRankQP() from the package of the same name is
used. In the event that the model specified by match.covar and match.out
is not feasible (i.e., no weights exist that exactly match treatment and
synthetic control subject to the given constraints), a less restrictive
backup model is used.
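To make "exactly match" concrete, the base-R sketch below implements linear calibration, the technique named by calfun = "linear". Starting from initial weights d, it finds weights w_i = d_i (1 + x_i' lambda) whose weighted control totals reproduce the treatment totals exactly. This is an illustration of the technique, not the survey package's code. The variable values and target totals are made up. The sketch also ignores the bounds argument, so unlike a bounded calibration it can return negative weights.

```r
# Minimal sketch of linear calibration (illustrative only).
# X: one row per control case, one column per matching variable.
calibrate_linear <- function(X, target, d = rep(1, nrow(X))) {
  M <- crossprod(X, d * X)                      # sum_i d_i x_i x_i'
  lambda <- solve(M, target - colSums(d * X))   # closes the gap to the targets
  as.vector(d * (1 + X %*% lambda))             # w_i = d_i * (1 + x_i' lambda)
}

set.seed(1)
n_control <- 50
X <- cbind(Intercept = 1,                       # matches the number of cases
           TotalPop  = rpois(n_control, 100),
           RENTER    = rpois(n_control, 20))
# Made-up treatment-group totals that the synthetic control must reproduce
target <- c(Intercept = 10, TotalPop = 1050, RENTER = 190)

w <- calibrate_linear(X, target)
colSums(w * X)   # reproduces target up to floating-point error
```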
microsynth can perform statistical inference using Taylor series
linearization, the jackknife, and permutation methods. Several sets of
weights are calculated. A set of main weights is calculated and used to
determine a point estimate of the intervention effect. The main weights can
also be used to perform inference on the point estimator via Taylor series
linearization. If the jackknife is used, one set of weights is calculated
for each jackknife replication group, and if permutation methods are used,
one set of weights is calculated for each permutation group. If treatment
and synthetic control are not easily matched based upon the model outlined
in match.covar and match.out (i.e., an exact solution is infeasible or nearly
infeasible), it is recommended that the jackknife not be used for inference.
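The jackknife recomputes the estimate once per replication group. The sketch below illustrates the general technique with a delete-a-group jackknife standard error for a simple estimator. It is an assumption-level illustration, not a description of microsynth's internals, which re-solve for weights in each replicate. jackknife_se() is a hypothetical helper.

```r
# Generic delete-a-group jackknife standard error (illustrative sketch).
# Cases are split into G replication groups; the estimator is recomputed
# with each group left out in turn.
jackknife_se <- function(x, groups, estimator = mean) {
  ids <- unique(groups)
  G <- length(ids)
  theta_g <- vapply(ids, function(g) estimator(x[groups != g]), numeric(1))
  sqrt((G - 1) / G * sum((theta_g - mean(theta_g))^2))
}

set.seed(2)
x <- rnorm(40)
# With one case per group, the jackknife SE of the mean equals sd(x)/sqrt(n)
jackknife_se(x, groups = seq_along(x))
sd(x) / sqrt(length(x))
# Coarser version with 10 replication groups of 4 cases each
jackknife_se(x, groups = rep(1:10, each = 4))
```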
microsynth can optionally write overall findings to an Excel file. For each
outcome variable, the results list the estimated treatment effect, as well
as confidence intervals of the effect and p-values of a hypothesis test that
assesses whether the effect is zero. Such results are produced as needed for
each of the three methods of statistical inference noted above. microsynth
can also apply an omnibus test that examines the presence of a treatment
effect jointly across several outcomes.
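The test argument selects which tail of the reference distribution is used. The sketch below shows one generic way to turn permutation-group estimates into p-values. perm_pvalue() is hypothetical, and the effect values are made up. The two-sided version assumes a reference distribution centered at zero. The exact formula microsynth uses may differ.

```r
# Generic permutation p-value (illustrative sketch). `observed` is the
# effect estimate for the true treatment group; `permuted` holds the
# estimates for the permutation groups.
perm_pvalue <- function(observed, permuted,
                        test = c("twosided", "lower", "upper")) {
  test <- match.arg(test)
  switch(test,
    lower    = mean(permuted <= observed),   # evidence of a decrease
    upper    = mean(permuted >= observed),   # evidence of an increase
    twosided = mean(abs(permuted) >= abs(observed))
  )
}

permuted <- c(-3, -1, 0, 1, 2, 4, -2, 5, 3, -4)  # 10 made-up permutation effects
perm_pvalue(-3.5, permuted, test = "lower")     # 0.1: only -4 is <= -3.5
perm_pvalue(-3.5, permuted, test = "twosided")  # 0.3: 4, 5, and -4 have |effect| >= 3.5
```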
microsynth(
data,
idvar,
intvar,
timevar = NULL,
start.pre = NULL,
end.pre = NULL,
end.post = NULL,
match.out = TRUE,
match.covar = TRUE,
match.out.min = NULL,
match.covar.min = NULL,
result.var = TRUE,
omnibus.var = result.var,
period = 1,
scale.var = "Intercept",
confidence = 0.9,
test = "twosided",
perm = 0,
jack = 0,
use.survey = TRUE,
cut.mse = Inf,
check.feas = FALSE,
use.backup = FALSE,
w = NULL,
max.mse = 0.01,
maxit = 250,
cal.epsilon = 1e-04,
calfun = "linear",
bounds = c(0, Inf),
result.file = NULL,
printFlag = TRUE,
n.cores = TRUE
)

data 
A data frame. If longitudinal, the data must be entered in tall format (i.e., at the case/time level, with one row for each time period for each case). Missingness is not allowed: all individuals must have non-NA values of all variables at all time points. 
idvar 
A character string that gives the variable in data that identifies cases (e.g., 'ID' in the examples).
intvar 
A character string that gives the variable in data that indicates the intervention (e.g., 'Intervention' in the examples).
timevar 
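A character string that gives the variable in data that indicates time (e.g., 'time' in the examples). It does not need to be specified for cross-sectional data.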
start.pre 
An integer indicating the time point that corresponds to the
beginning of the pre-intervention period used for
matching. When 
end.pre 
An integer that gives the final time point of the
pre-intervention period. That is, 
end.post 
An integer that gives the maximum post-intervention time that
is taken into account when compiling results. That is, the treatment and synthetic
control groups are compared across the outcomes listed in 
match.out 
Either A) logical, B) a vector of variable names that
indicates across which time-varying variables treatment is to be exactly matched
to synthetic control pre-intervention, or C) a
list consisting of variable names and timespans over which variables should
be aggregated before matching. Note that outcome variables and time-varying
covariates should be included in match.out.
If
The following examples show the proper formatting of 
match.covar 
Either a logical or a vector of variable names that
indicates which time-invariant covariates
are to be used for weighting. Weights are
calculated so that treatment and synthetic control exactly match across
these variables. If 
match.out.min 
A vector or list of the same format as match.out.
match.covar.min 
A vector of variable names that indicates supplemental time-invariant variables that are to be used for weighting, for which exact matches are not required. Weights are calculated so the distance is minimized between treatment and synthetic control across these variables. 
result.var 
A vector of variable names giving the outcome
variables for which results will be reported. Time-varying covariates
should be excluded from result.var.
omnibus.var 
A vector of variable names that indicates the outcome
variables that are to be used within the calculation of the omnibus
statistic. Can also be a logical indicator. When 
period 
An integer that gives the granularity of the data that will be
used for plotting and compiling results. If Note that plotting is performed with

scale.var 
A variable name. When comparing the treatment group to all
cases, the latter is scaled to the size of the former with respect to the
variable indicated by scale.var.
confidence 
The level of confidence for confidence intervals. 
test 
The type of hypothesis test (one-sided lower, one-sided upper, or
two-sided) that is used when calculating p-values. Entries of
perm 
An integer giving the number of permutation groups that are used.
If 
jack 
An integer giving the number of replication groups that are used
for the jackknife. 
use.survey 
If 
cut.mse 
The maximum error (given as mean-squared error) permissible for permutation groups. Permutation groups with a larger than permissible error are dropped when calculating results. The mean-squared error is only calculated over constraints that are to be exactly satisfied. 
check.feas 
A logical indicator of whether or not the feasibility of
the model specified by 
use.backup 
A logical variable that, when true, indicates whether a
backup model should be used whenever the model specified by

w 
A 
max.mse 
The maximum error (given as mean-squared error) permissible
for constraints that are to be exactly satisfied. If 
maxit 
The maximum number of iterations used within the calibration
routine (survey::calibrate()).
cal.epsilon 
The tolerance used within the calibration routine
(survey::calibrate()).
calfun 
The calibration function used within the calibration routine
(survey::calibrate()).
bounds 
Bounds for calibration weighting (fed into the survey::calibrate() routine).
result.file 
A character string giving the name of a file that will be
created in the home directory containing results. If 
printFlag 
If TRUE, 
n.cores 
The number of CPU cores to use for parallelization. If
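The list form of match.out (used in the sea4 example, e.g. 'i_robbery'=rep(2, 6)) aggregates an outcome over time spans before matching. The sketch below assumes each vector gives the lengths of consecutive windows that tile the 12 pre-intervention periods. aggregate_spans() is a hypothetical helper, and the counts are made up. For equal-length windows like those in the examples, the direction in which windows are laid out does not change the result.

```r
# Hypothetical helper: collapse a pre-intervention outcome series into
# consecutive windows whose lengths are given by `spans`.
aggregate_spans <- function(y, spans) {
  stopifnot(sum(spans) == length(y))
  window_id <- rep(seq_along(spans), times = spans)  # window label per time point
  as.vector(tapply(y, window_id, sum))
}

y_pre <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8)  # made-up counts, times 1-12
aggregate_spans(y_pre, rep(2, 6))    # 4 5 14 8 8 13 (six two-period sums)
aggregate_spans(y_pre, rep(4, 3))    # 9 22 21 (three four-period sums)
aggregate_spans(y_pre, rep(1, 12))   # unchanged
```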

microsynth requires specification of the following inputs: data, idvar,
and intvar. data is a longitudinal data frame; idvar and intvar are
character strings that specify pertinent columns of data. In longitudinal
data, timevar should be specified. Furthermore, specification of match.out
and match.covar is recommended.
microsynth can also be used to calculate propensity score-type weights
in cross-sectional data (in which case timevar does not need to be
specified) as proposed by Hainmueller (2012).
microsynth calculates weights using survey::calibrate() from the survey
package in circumstances where a feasible solution exists for all
constraints, whereas LowRankQP::LowRankQP() is used to assess feasibility
and to calculate weights in the event that a feasible solution to all
constraints does not exist. The LowRankQP routine is memory-intensive and
can run quite slowly on data that have a large number of cases. To prevent
LowRankQP from being used, set match.out.min = NULL, match.covar.min = NULL,
check.feas = FALSE, and use.backup = FALSE.
microsynth returns a list with up to five elements: a) w, b) Results,
c) svyglm.stats, d) Plot.Stats, and e) info.
w is a list with six elements: a) Weights, b) Intervention, c) MSE,
d) Model, e) Summary, and f) keep.groups.
Assume there are C total sets of weights calculated, where
C = 1 + jack + perm, and there are N total cases across the treatment and
control groups. w$Weights is an N x C matrix, where each column provides a
set of weights. w$Intervention is an N x C matrix of logical indicators
that indicate whether or not the case in the respective row is considered
treated (at any point in time) for the respective column. Entries of NA
are to be dropped for the respective jackknife replication group (NAs only
appear in jackknife weights).
w$MSE is a 6 x C matrix that gives the MSEs for each set of weights.
MSEs are listed for the primary and secondary constraints for the first,
second, and third models. Note that the primary constraints differ for each
model (see Robbins and Davenport, 2021). w$Model is a length-C vector that
indicates whether backup models were used in the calculation of each set of
weights. w$keep.groups is a logical vector indicating which groups are to
be used in analysis (groups that are not used have pre-intervention MSE
greater than cut.mse). w$Summary is a three-column matrix that (for
treatment, synthetic control, and the full dataset) shows aggregate values
of the variables across which treatment and synthetic control are matched.
The summary, which is tabulated only for the primary weights, is also
printed by microsynth while weights are being calculated.
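The sketch below roughly illustrates what the three columns of w$Summary report. For made-up data, it computes the treatment total, the weighted synthetic-control total, and the full-dataset total of each matching variable, with the last rescaled to the size of the treatment group. It rests on three assumptions: treated cases carry weight 1, the synthetic control is the weighted sum over control cases, and scale.var = "Intercept" means scaling by case counts. summarize_match() and the weights are hypothetical.

```r
# Illustrative sketch with made-up data (not microsynth's code).
summarize_match <- function(df, vars, treated, w) {
  X <- as.matrix(df[, vars, drop = FALSE])
  treatment  <- colSums(X[treated, , drop = FALSE])
  synthetic  <- colSums(w[!treated] * X[!treated, , drop = FALSE])
  all_scaled <- colSums(X) * sum(treated) / nrow(X)  # scaled to treatment size
  cbind(Treatment = treatment, Synthetic = synthetic, All.scaled = all_scaled)
}

df <- data.frame(TotalPop   = c(120, 80, 100, 90, 110, 100),
                 VACANT_HOU = c(5, 3, 4, 6, 2, 4))
treated <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
w <- c(1, 1, 0.5, 0.5, 0.5, 0.5)   # hypothetical weights; controls sum to 2
res <- summarize_match(df, c("TotalPop", "VACANT_HOU"), treated, w)
res   # every column equals 200 (TotalPop) and 8 (VACANT_HOU) here
```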
Further, Results is a list where each element gives the final results for
each value of end.post. Each element of Results is itself a matrix with
each row corresponding to an outcome variable (and a row for the omnibus
test, if used) and each column denoting estimates of the intervention
effects and p-values, upper, and lower bounds of confidence intervals as
found using Taylor series linearization (Linear), jackknife (jack), and
permutation (perm) methods where needed.
In addition, svyglm.stats is a list where each element is a matrix that
includes the output from the regression models run using the svyglm()
function to estimate the treatment effect. The list has one element for
each value of end.post, and the matrices each have one row per variable in
result.var.
Next, Plot.Stats contains the data that are displayed in the plots which
may be generated using plot_microsynth().
Plot.Stats is a list with four elements (Treatment, Control, All,
Difference). The first three elements are matrices with one row per
outcome variable and one column per time point. The last element (which
gives the treatment minus control values) is an array that contains data
for each permutation group in addition to the true treatment area.
Specifically, Plot.Stats$Difference[,,1] contains the time series of
treatment minus control for the true intervention group;
Plot.Stats$Difference[,,i+1] contains the time series of treatment minus
control for the i-th permutation group.
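The sketch below shows the indexing described above on a made-up array used in place of real microsynth output. It assumes the first two dimensions of Plot.Stats$Difference follow the outcome-by-time layout of the other three elements. The outcome names are borrowed from the examples, and the values are invented.

```r
# Toy stand-in for Plot.Stats$Difference: 2 outcomes, 16 time points,
# and 1 + 3 slices (true group followed by 3 permutation groups).
n_out <- 2; n_time <- 16; n_perm <- 3
diff_arr <- array(0, dim = c(n_out, n_time, 1 + n_perm),
                  dimnames = list(c("i_felony", "any_crime"), 1:n_time, NULL))
diff_arr[, 13:16, 1]   <- -2   # true group: drop after the intervention
diff_arr[, 13:16, 2:4] <-  1   # permutation groups: small increase

true_series <- diff_arr[, , 1]       # outcomes x time, true group
perm_2      <- diff_arr[, , 2 + 1]   # slice i + 1 for the 2nd permutation group
# Post-intervention (times 13-16) total difference per outcome and slice
post_totals <- apply(diff_arr[, 13:16, , drop = FALSE], c(1, 3), sum)
post_totals   # -8 for the true group, 4 for each permutation group
```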
Lastly, info documents some input parameters for display by print().
A summary of weighted matching variables and of results can be viewed
using summary().
Abadie A, Diamond A, Hainmueller J (2010). "Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program." Journal of the American Statistical Association, 105(490), 493-505.
Abadie A, Diamond A, Hainmueller J (2011). "Synth: An R Package for Synthetic Control Methods in Comparative Case Studies." Journal of Statistical Software, 42(13), 1-17.
Abadie A, Diamond A, Hainmueller J (2015). "Comparative politics and the synthetic control method." American Journal of Political Science, 59(2), 495-510.
Abadie A, Gardeazabal J (2003). "The economic costs of conflict: A case study of the Basque Country." American Economic Review, 93(1), 113-132.
Hainmueller J (2012). "Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies." Political Analysis, 20, 25-46.
Robbins MW, Saunders J, Kilmer B (2017). "A framework for synthetic control methods with high-dimensional, micro-level data: Evaluating a neighborhood-specific crime intervention." Journal of the American Statistical Association, 112(517), 109-126.
Robbins MW, Davenport S (2021). "microsynth: Synthetic Control Methods for Disaggregated and Micro-Level Data in R." Journal of Statistical Software, 97(2), doi:10.18637/jss.v097.i02.
library(microsynth)
# Use seattledmi, block-level panel data, to evaluate a crime intervention.
# Declare time-variant (outcome) and time-invariant variables for matching
cov.var <- c('TotalPop', 'BLACK', 'HISPANIC', 'Males_1521',
'HOUSEHOLDS', 'FAMILYHOUS', 'FEMALE_HOU', 'RENTER_HOU', 'VACANT_HOU')
match.out <- c('i_felony', 'i_misdemea', 'i_drugs', 'any_crime')
set.seed(99199) # for reproducibility
# Perform matching and estimation, without permutations or jackknife
# runtime: < 1 min
## Not run:
sea1 <- microsynth(seattledmi,
idvar='ID', timevar='time', intvar='Intervention',
start.pre=1, end.pre=12, end.post=16,
match.out=match.out, match.covar=cov.var,
result.var=match.out, omnibus.var=match.out,
test='lower',
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea1)
plot_microsynth(sea1)
# Repeat matching and estimation, with permutations and jackknife
# runtime: ~30 min
sea2 <- microsynth(seattledmi,
idvar='ID', timevar='time', intvar='Intervention',
start.pre=1, end.pre=12, end.post=c(14, 16),
match.out=match.out, match.covar=cov.var,
result.var=match.out, omnibus.var=match.out,
test='lower',
perm=250, jack=TRUE,
result.file=file.path(tempdir(), 'ExResults2.xlsx'),
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea2)
plot_microsynth(sea2)
# Specify additional outcome variables for matching, which makes
# matching harder.
match.out <- c('i_robbery','i_aggassau','i_burglary','i_larceny',
'i_felony','i_misdemea','i_drugsale','i_drugposs','any_crime')
# Perform matching, setting check.feas = TRUE and use.backup = TRUE
# to ensure model feasibility
# runtime: ~40 minutes
sea3 <- microsynth(seattledmi,
idvar='ID', timevar='time', intvar='Intervention',
end.pre=12,
match.out=match.out, match.covar=cov.var,
result.var=match.out, perm=250, jack=0,
test='lower', check.feas=TRUE, use.backup = TRUE,
result.file=file.path(tempdir(), 'ExResults3.xlsx'),
n.cores = min(parallel::detectCores(), 2))
# Aggregate outcome variables before matching, to boost model feasibility
match.out <- list('i_robbery'=rep(2, 6), 'i_aggassau'=rep(2, 6),
'i_burglary'=rep(1, 12), 'i_larceny'=rep(1, 12),
'i_felony'=rep(2, 6), 'i_misdemea'=rep(2, 6),
'i_drugsale'=rep(4, 3), 'i_drugposs'=rep(4, 3),
'any_crime'=rep(1, 12))
# After aggregation, use.backup and check.feas are no longer needed
# runtime: ~40 minutes
sea4 <- microsynth(seattledmi, idvar='ID', timevar='time',
intvar='Intervention', match.out=match.out, match.covar=cov.var,
start.pre=1, end.pre=12, end.post=16,
result.var=names(match.out), omnibus.var=names(match.out),
perm=250, jack = TRUE, test='lower',
result.file=file.path(tempdir(), 'ExResults4.xlsx'),
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea4)
plot_microsynth(sea4)
# Generate weights only (for four variables)
match.out <- c('i_felony', 'i_misdemea', 'i_drugs', 'any_crime')
# runtime: ~20 minutes
sea5 <- microsynth(seattledmi, idvar='ID', timevar='time',
intvar='Intervention', match.out=match.out, match.covar=cov.var,
start.pre=1, end.pre=12, end.post=16,
result.var=FALSE, perm=250, jack=TRUE,
n.cores = min(parallel::detectCores(), 2))
# View weights
summary(sea5)
# Generate results only
sea6 <- microsynth(seattledmi, idvar='ID', timevar='time',
intvar='Intervention',
start.pre=1, end.pre=12, end.post=c(14, 16),
result.var=match.out, test='lower',
w=sea5, result.file=file.path(tempdir(), 'ExResults6.xlsx'),
n.cores = min(parallel::detectCores(), 2))
# View results (including previously found weights)
summary(sea6)
# Generate plots only
plot_microsynth(sea6, plot.var=match.out[1:2])
# Apply microsynth in the traditional setting of Synth
# Create macro-level (small n) data, with 1 treatment unit
set.seed(86879)
ids.t <- names(table(seattledmi$ID[seattledmi$Intervention==1]))
ids.c <- setdiff(names(table(seattledmi$ID)), ids.t)
ids.synth <- c(base::sample(ids.t, 1), base::sample(ids.c, 100))
seattledmi.one <- seattledmi[is.element(seattledmi$ID,
as.numeric(ids.synth)), ]
# Apply microsynth to the new macro-level data
# runtime: < 5 minutes
sea8 <- microsynth(seattledmi.one, idvar='ID', timevar='time',
intvar='Intervention',
start.pre=1, end.pre=12, end.post=16,
match.out=match.out[4],
match.covar=cov.var, result.var=match.out[4],
test='lower', perm=250, jack=FALSE,
check.feas=TRUE, use.backup=TRUE,
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea8)
plot_microsynth(sea8)
# Use microsynth to calculate propensity score-type weights
# Prepare cross-sectional data at time of intervention
seattledmi.cross <- seattledmi[seattledmi$time==16, colnames(seattledmi)!='time']
# Apply microsynth to find propensity score-type weights
# runtime: ~5 minutes
sea9 <- microsynth(seattledmi.cross, idvar='ID', intvar='Intervention',
match.out=FALSE, match.covar=cov.var, result.var=match.out,
test='lower', perm=250, jack=TRUE,
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea9)
## End(Not run)
