generate_weights | R Documentation |
Generates weights from the aggregate-level predictors to construct the synthetic control. These weights determine which variables and which units from the donor pool are important in generating the synthetic control.
generate_weights(
data,
optimization_window = NULL,
custom_variable_weights = NULL,
include_fit = FALSE,
optimization_method = c("Nelder-Mead", "BFGS"),
genoud = FALSE,
quadopt = "ipop",
margin_ipop = 5e-04,
sigf_ipop = 5,
bound_ipop = 10,
verbose = FALSE,
...
)
data |
nested data of type |
optimization_window |
the temporal window of the pre-intervention outcome time series to be used in the optimization task. Default behavior uses the entire pre-intervention time period. |
custom_variable_weights |
a vector of weights that define each variable's importance in the optimization task. The weights are intended to reflect the user's prior regarding the relative significance of each variable. The vector must sum to one. Note that the method is significantly faster when custom variable weights are provided. By default, no weights are provided, so they must be learned from the data. |
include_fit |
Boolean flag, if TRUE, then the optimization output is
included in the outputted |
optimization_method |
string vector that specifies the optimization algorithms to be used. Permissible values are all optimization algorithms currently implemented in the optimx function (see that function for details). This list currently includes c('Nelder-Mead', 'BFGS', 'CG', 'L-BFGS-B', 'nlm', 'nlminb', 'spg', 'ucminf'). If multiple algorithms are specified, synth will run the optimization with all chosen algorithms and return the result of the best-performing method. Default is c('Nelder-Mead', 'BFGS'). Alternatively, the user can specify 'All', in which case synth runs the optimization over all algorithms in optimx. |
genoud |
Logical flag. If TRUE, synth embarks on a two-step optimization. In the first step, genoud, an optimization function that combines evolutionary algorithm methods with a derivative-based (quasi-Newton) method to solve difficult optimization problems, is used to obtain a solution. See genoud for details. In the second step, the genoud results are passed to the optimization algorithm(s) chosen in optimization_method for a local optimization within the neighborhood of the genoud solution. This two-step procedure requires much more computing time, but may yield lower loss in cases where the search space is highly irregular. |
quadopt |
string that specifies the routine for quadratic optimization over the w weights. Possible values are "ipop" and "LowRankQP" (see ipop and LowRankQP for details). Default is "ipop". |
margin_ipop |
setting for ipop optimization routine: how close we get to the constraints (see ipop for details) |
sigf_ipop |
setting for ipop optimization routine: Precision in significant figures (default here: 5; see ipop for details) |
bound_ipop |
setting for ipop optimization routine: Clipping bound for the variables (see ipop for details) |
verbose |
Logical flag. If TRUE then intermediate results will be shown. |
... |
Additional arguments to be passed to optimx and/or genoud to adjust optimization. |
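As a minimal sketch of preparing a vector for custom_variable_weights, the base R snippet below rescales a set of raw importance scores so that they sum to one. The predictor names are taken from the example on this page; the raw scores are invented for illustration, and the assumption that weights follow the order in which the predictors were generated is not confirmed by this page.

```r
# Predictor names as used in the smoking example on this page.
predictors <- c("lnincome", "retprice", "age15to24", "beer",
                "cigsale_1975", "cigsale_1980", "cigsale_1988")

# Hypothetical raw importance scores (illustrative values only).
raw_scores <- c(1, 2, 1, 1, 3, 3, 4)

# Rescale so the vector sums to one, as the argument requires.
v <- raw_scores / sum(raw_scores)
names(v) <- predictors  # names are for readability only (assumption)

# Basic sanity checks before handing the vector to generate_weights().
stopifnot(all(v >= 0), isTRUE(all.equal(sum(v), 1)))

# Intended use (not run here; requires the synthetic control pipeline):
# generate_weights(data, custom_variable_weights = unname(v))
```

Supplying the vector this way skips the search over variable weights, which is where the speed-up described for this argument comes from.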
Optimization
The method completes the following nested minimization task:
W^*(V) = \arg\min_{W} \sum^M_{m=1} v_m (X_{1m} - \sum^{J+1}_{j=2} w_j X_{jm})^2
where X_1 and X_0, the matrices of aggregate-level covariates for the treated
unit and the donor units respectively, are generated using the
generate_predictor() function. V denotes the variable weights, with M
reflecting the total number of predictor variables. Thus, the optimal unit
weights are a function of V.
The variable weights V are in turn chosen to minimize the pre-intervention
prediction error:
\sum^{T_0}_{t=1} (Y_{1t} - \sum^{J+1}_{j=2} w^*_j(V) Y_{jt})^2
where T_0 denotes the pre-intervention period (or a specific optimization
window supplied by the argument optimization_window); J denotes the number of
control units from the donor pool, where j=1 reflects the treated unit.
Thus, the weights are selected in a manner that produces a synthetic \hat{Y}
that approximates the observed Y as closely as possible.
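The nested structure described above can be sketched on simulated data in base R. This is an illustration under stated assumptions, not the function's implementation: the inner problem is solved here with a softmax reparameterization and optim() rather than the quadratic programming routine (ipop or LowRankQP) named in the arguments, and all data, object names, and helper functions are hypothetical.

```r
set.seed(1)
M  <- 3    # number of predictors
J  <- 4    # number of donor units
T0 <- 10   # pre-intervention periods

# Simulated donor data; the treated unit is an exact mix of two donors.
X0 <- matrix(rnorm(M * J), M, J)    # donor predictors (M x J)
Y0 <- matrix(rnorm(T0 * J), T0, J)  # donor outcomes (T0 x J)
w_true <- c(0.6, 0.4, 0, 0)
X1 <- X0 %*% w_true                 # treated predictors (M x 1)
Y1 <- Y0 %*% w_true                 # treated outcomes (T0 x 1)

# Softmax maps unconstrained values to non-negative weights summing to one.
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Inner problem: unit weights W*(V) minimizing the V-weighted predictor gap.
inner_w <- function(v) {
  loss <- function(z) sum(v * (X1 - X0 %*% softmax(z))^2)
  softmax(optim(rep(0, J), loss, method = "BFGS")$par)
}

# Outer problem: choose V so that W*(V) tracks the pre-intervention outcome.
outer_loss <- function(theta) {
  w <- inner_w(softmax(theta))
  sum((Y1 - Y0 %*% w)^2)
}

fit    <- optim(rep(0, M), outer_loss, method = "Nelder-Mead",
                control = list(maxit = 200))
v_star <- softmax(fit$par)   # variable weights
w_star <- inner_w(v_star)    # unit weights implied by v_star
```

Every evaluation of the outer loss re-solves the inner problem, which is why fixing V in advance through custom_variable_weights shortens the run.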
Variable Weights
As proposed in Abadie and Gardeazabal (2003) and Abadie, Diamond, and
Hainmueller (2010), the synth function routinely searches for the set of
weights that generate the best-fitting convex combination of the control
units. In other words, the predictor weight matrix V (custom_variable_weights)
is chosen among all positive definite diagonal matrices such that MSPE is
minimized for the pre-intervention period. Instead of using this data-driven
procedure to search for the best-fitting synthetic control group, the user may
supply their own weights using the custom_variable_weights argument. These
weights reflect the user's subjective assessment of the predictive power of
the variables generated by generate_predictor().
When generating weights for the placebo cases, the variable weights from the treated unit's optimization are reused. This ensures comparability between the placebo and treated fits. In addition, it greatly decreases processing time, as the variable weights do not need to be learned for every placebo entry.
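The reuse of variable weights across placebo fits can be illustrated with a small base R sketch. Everything here is hypothetical: the data are simulated, the vector v stands in for weights already learned for the treated unit, and the softmax-based solver is a simple stand-in for the quadratic programming step, not the routine the function uses.

```r
set.seed(42)
M <- 3         # number of predictors
n_units <- 5   # unit 1 is the treated unit, units 2..5 form the donor pool

X <- matrix(rnorm(M * n_units), M, n_units,
            dimnames = list(NULL, paste0("unit", seq_len(n_units))))

# Variable weights assumed to have been learned once, for the treated unit.
v <- c(0.5, 0.3, 0.2)

# Softmax keeps unit weights non-negative and summing to one.
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Fit unit weights for `target`, using all other units as donors and the
# same fixed v every time (no search over variable weights).
fit_unit_weights <- function(target, v) {
  donors <- X[, -target, drop = FALSE]
  loss <- function(z) sum(v * (X[, target] - donors %*% softmax(z))^2)
  w <- softmax(optim(rep(0, ncol(donors)), loss, method = "BFGS")$par)
  setNames(w, colnames(donors))
}

# Treat each donor in turn as a placebo "treated" unit.
placebo_weights <- lapply(2:n_units, fit_unit_weights, v = v)
names(placebo_weights) <- colnames(X)[2:n_units]
```

Each placebo fit solves only the inner problem for unit weights, so the cost per placebo is a single optimization rather than a nested one.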
tbl_df with nested fields containing the following:
.id: unit id for the intervention case (this will differ when the unit is a placebo unit).
.placebo: indicator field taking on the value of 1 if a unit is a placebo unit, 0 if it's the specified treated unit.
.type: type of the nested data construct: treated or controls. Keeps track of which data construct is located in the .outcome field.
.outcome: nested data construct containing the outcome variable configured for the synthetic control method. Data is configured into a wide format for the optimization task.
.predictors: nested data construct containing the covariate matrices for the treated and control (donor) units. Data is configured into a wide format for the optimization task.
.unit_weights: nested column of unit weights (i.e. how each unit from the donor pool contributes to the synthetic control). Weights should sum to 1.
.predictor_weights: nested column of predictor variable weights (i.e. the significance of each predictor in optimizing the weights that generate the synthetic control). Weights should sum to 1. If custom variable weights are supplied, those weights are returned here.
.original_data: original input data filtered by treated or control units. This allows for easy processing downstream when generating predictors.
.meta: stores information regarding the unit and time index, the treated unit and time, and the name of the outcome variable. Used downstream in subsequent functions.
.loss: the RMPE loss for both sets of weights.
# Smoking example data
data(smoking)
smoking_out <-
  smoking %>%
  # Initialize the synthetic control object
  synthetic_control(outcome = cigsale,
                    unit = state,
                    time = year,
                    i_unit = "California",
                    i_time = 1988,
                    generate_placebos = TRUE) %>%
  # Generate the aggregate predictors used to generate the weights
  generate_predictor(time_window = 1980:1988,
                     lnincome = mean(lnincome, na.rm = TRUE),
                     retprice = mean(retprice, na.rm = TRUE),
                     age15to24 = mean(age15to24, na.rm = TRUE)) %>%
  generate_predictor(time_window = 1984:1988,
                     beer = mean(beer, na.rm = TRUE)) %>%
  generate_predictor(time_window = 1975,
                     cigsale_1975 = cigsale) %>%
  generate_predictor(time_window = 1980,
                     cigsale_1980 = cigsale) %>%
  generate_predictor(time_window = 1988,
                     cigsale_1988 = cigsale) %>%
  # Generate the fitted weights for the synthetic control
  # (ipop settings use the argument names from the usage above)
  generate_weights(optimization_window = 1970:1988,
                   margin_ipop = .02, sigf_ipop = 7, bound_ipop = 6)
# Retrieve weights
smoking_out %>% grab_predictor_weights()
smoking_out %>% grab_unit_weights()
# Retrieve the placebo weights as well.
smoking_out %>% grab_predictor_weights(placebo= TRUE)
smoking_out %>% grab_unit_weights(placebo= TRUE)
# Plot the unit weights
smoking_out %>% plot_weights()