View source: R/grab-methods.R
grab_significance | R Documentation |
Generate inferential statistics comparing the rarity of the unit that actually received the intervention to the placebo units in the donor pool.
grab_significance(data, time_window = NULL)
data |
nested data of type |
time_window |
time window over which the significance values should be computed. |
Inferential statistics are generated by comparing the observed difference between the actual treated unit and its synthetic control to the difference between each placebo unit and its synthetic control. How rare the treated unit's difference is relative to the placebo differences is used to infer the likelihood of observing the effect.
Inference in this framework uses the ratio of the mean squared predictive error (MSPE) of the fit in the post-period to the MSPE of the fit in the pre-period.
\frac{MSPE_{Post}}{MSPE_{Pre}}
The ratio captures the difference between the pre-intervention fit and the post-intervention divergence of the trend (i.e. the causal quantity). A good fit in the pre-period indicates that the observed and synthetic cases tracked each other well. Divergence in the post-period captures the difference between the two trends brought about by the intervention. Thus, when the ratio is high, we observe more of a difference between the two trends. If, however, the pre-period fit is poor, or there is no substantial divergence in the post-period, the ratio will be smaller.
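As a minimal sketch of the ratio described above, the following base R snippet computes the pre- and post-period MSPE for a single unit. The toy series, the cutoff year, and the helper name `mspe()` are illustrative assumptions and are not part of the package.

```r
# Hypothetical toy series for one unit (values invented for illustration)
years     <- 1985:1992
observed  <- c(100, 98, 97, 95, 90, 84, 80, 75)
synthetic <- c(101, 98, 96, 95, 94, 93, 92, 91)
i_time    <- 1988  # assumption: treated here as the last pre-intervention year

# Hypothetical helper: mean squared predictive error over a set of periods
mspe <- function(obs, synth) mean((obs - synth)^2)

pre  <- years <= i_time
post <- years >  i_time

pre_mspe   <- mspe(observed[pre],  synthetic[pre])   # tight pre-period fit
post_mspe  <- mspe(observed[post], synthetic[post])  # post-period divergence
mspe_ratio <- post_mspe / pre_mspe
```

A close pre-period fit combined with a large post-period gap produces a large ratio, which is the pattern the paragraph above associates with a meaningful effect.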
The Fisher's Exact P-Value is generated by ranking the MSPE ratios of the treated and placebo units from largest to smallest. The P-Value is then calculated by dividing the rank of a case by the total number of cases (rank/total). The case with the highest ratio is the rarest given the distribution of cases generated by the placebos. A more detailed outline of inference within the synthetic control framework can be found in Abadie et al. 2010.
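The rank/total calculation can be sketched in base R as follows. The ratios and unit names are made up for illustration, and ranking in descending order (largest ratio gets rank 1) is an assumption consistent with the description above rather than a copy of the package's internals.

```r
# Hypothetical MSPE ratios: one treated unit and five placebo units
mspe_ratio <- c(treated = 248.5, placebo_a = 3.2, placebo_b = 1.1,
                placebo_c = 12.4, placebo_d = 0.8, placebo_e = 5.6)

# Rank in descending order so the largest ratio receives rank 1
rnk <- rank(-mspe_ratio)

# p-value for each unit: its rank divided by the total number of units
p_value <- rnk / length(mspe_ratio)
p_value["treated"]
```

With six units in total, the treated unit ranks first and its p-value is 1/6, which illustrates why a small donor pool cannot reach conventional thresholds.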
Note that conventional significance levels are not achievable if there is an insufficient number of control cases. One needs at least 20 control cases to use the conventional .05 level, because the smallest attainable p-value is one over the total number of cases. With fewer cases, significance levels need to be adjusted to accommodate the low total rank. This is a limitation of rank-based significance metrics.
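To make the donor-pool requirement concrete, here is a small base R sketch of the smallest p-value attainable for different pool sizes. It assumes the treated unit ranks first and that the total is the number of controls plus the treated unit; the variable names are illustrative.

```r
# Smallest achievable p-value when the treated unit ranks first among
# n_controls placebo units plus itself: 1 / (n_controls + 1)
n_controls <- c(5, 10, 19, 20, 38)
min_p      <- 1 / (n_controls + 1)

# Flag which pool sizes can fall strictly below the .05 level
achievable <- data.frame(n_controls, min_p, below_05 = min_p < 0.05)
achievable
```

Under these assumptions, 19 controls give a minimum of exactly .05, while 20 controls are the first pool size that can fall below it.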
In addition to the Fisher's Exact P-Value, a Z-score is also included, which is the standardized MSPE ratio for each case. The Z-score captures the degree to which a particular case's MSPE ratio deviates from the distribution of the placebo cases.
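The standardization can be sketched in base R as below. The ratios are invented for illustration, and the mean and standard deviation are taken over all units (treated and placebo together), which is an assumption based on the `z_score` formula listed among the returned fields.

```r
# Hypothetical MSPE ratios: one treated unit and three placebo units
mspe_ratio <- c(treated = 10, placebo_a = 2, placebo_b = 4, placebo_c = 4)

# Standardize each ratio against the mean and sd of all ratios
z_score <- (mspe_ratio - mean(mspe_ratio)) / sd(mspe_ratio)
z_score
```

A large positive value for the treated unit indicates that its ratio sits well above the bulk of the other units.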
tibble data frame containing the following fields:
unit_name
: name of the unit
type
: treated or donor unit (placebo)
pre_mspe
: pre-intervention period mean squared predictive error
post_mspe
: post-intervention period mean squared predictive error
mspe_ratio
: post_mspe/pre_mspe; captures the difference in fit in the
pre and post period. A good fit in the pre-period and a poor fit in the
post-period reflects a meaningful effect when comparing the difference
between the observed outcome and the synthetic control.
rank
: rank order of the mspe_ratio.
fishers_exact_pvalue
: rank/total, which yields the p-value. Conventional
levels aren't achievable if there isn't a sufficient number of controls to
generate a large enough ranking. At least 20 control units are needed to
use the conventional .05 level.
z_score
: (mspe_ratio-mean(mspe_ratio))/sd(mspe_ratio); captures the
degree to which the mspe_ratio of the treated unit deviates from the mean
of the placebo units, providing an alternative significance determination.
# Load the package that provides synthetic_control() and the smoking data
library(tidysynth)

# Smoking example data
data(smoking)

smoking_out <-
  smoking %>%
  # Initialize the synthetic control object; placebo units are generated
  # because the significance statistics compare the treated unit to them
  synthetic_control(outcome = cigsale,
                    unit = state,
                    time = year,
                    i_unit = "California",
                    i_time = 1988,
                    generate_placebos = TRUE) %>%
  # Generate the aggregate predictors used to generate the weights
  generate_predictor(time_window = 1980:1988,
                     lnincome = mean(lnincome, na.rm = TRUE),
                     retprice = mean(retprice, na.rm = TRUE),
                     age15to24 = mean(age15to24, na.rm = TRUE)) %>%
  generate_predictor(time_window = 1984:1988,
                     beer = mean(beer, na.rm = TRUE)) %>%
  generate_predictor(time_window = 1975,
                     cigsale_1975 = cigsale) %>%
  generate_predictor(time_window = 1980,
                     cigsale_1980 = cigsale) %>%
  generate_predictor(time_window = 1988,
                     cigsale_1988 = cigsale) %>%
  # Generate the fitted weights for the synthetic control
  generate_weights(optimization_window = 1970:1988,
                   Margin.ipop = .02, Sigf.ipop = 7, Bound.ipop = 6) %>%
  # Generate the synthetic control
  generate_control()

# Extract the significance statistics for the treated and placebo units
smoking_out %>% grab_significance(time_window = 1970:2000)