masc_by_phi: Report cross-validation error and Prediction Error, Moving...
In maxkllgg/masc: Matching and Synthetic Controls Estimator


View source: R/crossvalidation.R

Reports the treatment effect and cross-validation error for estimators of the form of the matching and synthetic control (masc) estimator of Kellogg, Mogstad, Pouliot, and Torgovitsky (2019). A masc-type estimator is defined by a synthetic control estimator, a matching estimator (indexed by m) and a weight (phi). For each such combination, this function returns the output of the masc estimator that places a weight of phi on the matching estimator and (1-phi) on the synthetic control estimator.
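The weighting described above can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: it assumes both estimators are expressed as weight vectors over the same donor units, and `combine_weights`, `w_sc` and `w_match` are hypothetical names with made-up numbers.

```r
# Hypothetical weights over 4 donor units (illustrative numbers only).
w_sc    <- c(0.50, 0.30, 0.20, 0.00)  # synthetic control weights, sum to 1
w_match <- c(0.50, 0.50, 0.00, 0.00)  # 2-nearest-neighbor weights, sum to 1

# Hypothetical helper: weight phi on matching, (1 - phi) on synthetic control.
combine_weights <- function(phi, w_match, w_sc) {
  stopifnot(phi >= 0, phi <= 1, length(w_match) == length(w_sc))
  phi * w_match + (1 - phi) * w_sc
}

w_masc <- combine_weights(phi = 0.25, w_match = w_match, w_sc = w_sc)
print(w_masc)
print(sum(w_masc))  # a convex combination of weights that sum to 1 also sums to 1
```

Setting `phi = 0` recovers the synthetic control weights and `phi = 1` the matching weights, which is why a grid of phi values traces a path between the two estimators.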

masc_by_phi(
  treated,
  donors,
  treated.covariates = NULL,
  donors.covariates = NULL,
  treatment = NULL,
  sc_est = sc_estimator,
  match_est = NearestNeighbors,
  tune_pars = list(min_preperiods = NULL, set_f = NULL, m = NULL, phis = seq(from = 0,
    to = 1, length.out = 100)),
  cv_pars = list(forecast.minlength = 1, forecast.maxlength = 1),
  treatinterval = NULL,
  ...
)

`treated`	A Tx1 matrix of outcomes for the treated unit.
`donors`	A TxN matrix of outcome paths for untreated units, each column being a control unit.
`treatment`	An integer. The period T' in which forecasting begins. If `NULL` or T'>T, then we assume all data is pre-treatment.
`sc_est`	A `function` which constructs weights associated with a synthetic control-type estimator. See sc_estimator for input and output if you'd prefer to substitute your own estimator.
`tune_pars`	A `list` of tuning parameters. The element `m` selects the candidate matching estimators. The elements `min_preperiods` and `set_f` describe the folds included in the cross-validation procedure, and at most one of the two may be specified. Each fold `f` is denoted by the last period it uses for estimation: fold `f` fits the estimators on data from period 1 through period `f` and forecasts into period `f+1`.
	m: a vector of integers. Denotes the set of nearest neighbor estimators from which we are allowed to pick. E.g., `tune_pars$m=c(1,3,5)` would allow us to pick from 1-NN, 3-NN, or 5-NN. Alternatively, `tune_pars$m` may be a logical vector. In this case, e.g., `tune_pars$m=c(FALSE,TRUE,TRUE)` would allow us to pick from 2-NN or 3-NN. If `NULL`, we default to allowing all possible nearest neighbor estimators.
	min_preperiods: an integer. The smallest number of estimation periods allowed in a fold used for cross-validation. We use all folds from fold `min_preperiods` up to the latest possible fold, `treatment-2`.
	set_f: a `list` containing a single element, a vector of integers. Identifies the set of folds used for cross-validation. As above, each integer identifies a fold by the last time period it uses in estimation. E.g., `set_f=c(7,8,9)` would implement cross-validation using fold 7, fold 8, and fold 9.
	If neither `min_preperiods` nor `set_f` is specified, then we set `min_preperiods` to `ceiling(treatment/2)`. In other words, we pick the first cross-validation fold so that it is estimated on the first half of the pre-period data.
`treatinterval`	A vector. Indicates the post-treatment periods used when reporting average treatment effects in the column `pred` of the output. E.g., `treatinterval=1:5` causes the `pred` column to report the average treatment effect over the first 5 treatment periods. If `NULL`, we default to averaging over all post-treatment periods.
`nogurobi`	A logical value. If true, uses LowRankQP to solve the synthetic control estimator, rather than `gurobi`.
`phivals`	A vector of real values between 0 and 1, supplied in the usage above as the `phis` element of `tune_pars`. Indexes a weighted average of the synthetic control estimator with a matching estimator, where each value `phival` indicates the weight on the matching estimator (`1-phival` being the weight on synthetic controls).
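The fold and `m` rules described under `tune_pars` can be made concrete with a small base R sketch. The helpers `cv_folds` and `m_candidates` below are hypothetical and written only to mirror the documented rules; they are not functions exported by the package, and the package's internal handling may differ.

```r
# Hypothetical helper: enumerate cross-validation folds from the documented rules.
# Each fold is identified by the last period it uses for estimation.
cv_folds <- function(treatment, min_preperiods = NULL, set_f = NULL) {
  stopifnot(is.null(min_preperiods) || is.null(set_f))  # at most one may be given
  if (!is.null(set_f)) return(sort(unique(unlist(set_f))))
  if (is.null(min_preperiods)) min_preperiods <- ceiling(treatment / 2)
  stopifnot(min_preperiods <= treatment - 2)
  seq(from = min_preperiods, to = treatment - 2)
}

# Hypothetical helper: turn `m` into the set of candidate k-NN estimators.
m_candidates <- function(m) {
  if (is.logical(m)) which(m) else m
}

print(cv_folds(treatment = 16, min_preperiods = 8))        # folds 8 through 14
print(cv_folds(treatment = 16))                            # default start is ceiling(16/2)
print(cv_folds(treatment = 16, set_f = list(c(7, 8, 9))))  # explicit folds
print(m_candidates(c(FALSE, TRUE, TRUE)))                  # 2-NN and 3-NN
```

With `treatment = 16`, the latest fold is 14: it fits on periods 1 through 14 and forecasts into period 15, the last pre-treatment period.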

Returns a `data.frame` in which each row is defined by a value of m and a value of phi, taken respectively from `tune_pars$m` and `tune_pars$phis`. The columns `cv.error` and `pred` report, respectively, the cross-validation error and a measure of prediction error (AKA treatment effect) associated with the masc estimator defined by that m and phi.
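As a sketch of how the returned table might be used, the snippet below picks the row with the smallest cross-validation error. The table is a hand-made stand-in with invented numbers, not output from the package, and the column names `m` and `phi` are assumptions (only `cv.error` and `pred` are named in this page).

```r
# Hypothetical stand-in for the data.frame returned by masc_by_phi().
# All numbers are invented for illustration; `m` and `phi` are assumed column names.
phi_table_toy <- data.frame(
  m        = 3,
  phi      = c(0, 0.25, 0.5, 0.75, 1),
  cv.error = c(0.040, 0.031, 0.027, 0.033, 0.052),
  pred     = c(-0.60, -0.64, -0.69, -0.73, -0.78)
)

# Row with the lowest cross-validation error.
best <- phi_table_toy[which.min(phi_table_toy$cv.error), ]
print(best)
```

Plotting `cv.error` and `pred` against `phi` from the same table is another natural way to see how the estimate moves between synthetic control and matching.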

Kellogg, M., M. Mogstad, G. Pouliot, and A. Torgovitsky. Combining Matching and Synthetic Control to Trade off Biases from Extrapolation and Interpolation. Working Paper, 2019.

Other masc functions: cv_masc(), masc(), sc_estimator(), solve_masc()

##Example: Terrorism in the Basque Region, from
##Abadie and Gardeazabal (2003).

#First, load the Synth package, which includes the dataset:

if (requireNamespace("Synth",quietly=TRUE) && requireNamespace("data.table",quietly=TRUE)){
library(Synth)
library(data.table)
data(basque)
basque<-as.data.table(basque)
basque <- basque[regionno!=1,]
basque[,regionname:= gsub(" (.*)","",regionname)]
#Grabbing region names:
names<- c(unique(basque[regionno==17,regionname]),unique(basque[regionno!=17,regionname]))
#Build a matrix of GDP per capita with one row per year:
#the treated unit (region 17) in column 1, donor regions in the remaining columns.
basque <- cbind(basque[regionno==17,gdpcap],
                t(reshape(basque[regionno!=17,.(regionno,year,gdpcap)],
                          idvar='regionno', timevar='year',direction='wide')[,-"regionno",with=FALSE]))

result <- masc(treated=basque[,1], donors=basque[,-1],treatment=16,
               tune_pars_list=list(m=1:10, min_preperiods=8))
names(result$weights)<-names[-1]

#weights on control units:
print(round(result$weights,3))

#Treatment effects of terrorism on GDP per capita
#in thousands of 1986 US dollars, over 1970-1975:
#(first 6 years of treatment)
print(result$pred.error[1:6,])

#Selected tuning parameters?
print(paste0("Selected matching estimator: ",result$m_hat))
print(paste0("Selected weight on matching: ",result$phi_hat))

#Now, examine the shape of A) the CV error (mean squared prediction error in the pre-period) and
# B) the average prediction error (AKA treatment effect) over the first 5 treatment years,
#both over values of phi, fixing the matching estimator (moving from synthetic controls to matching)
phis<-seq(0,1,length.out=100)
#treatinterval=1:5 restricts the average in the pred column to the first 5 treatment periods.
phi_table<-masc_by_phi(treated=basque[,1], donors=basque[,-1],treatment=16,
                       tune_pars=list(m=result$m_hat, min_preperiods=8, phis=phis),
                       treatinterval=1:5)
#Printing CV error and prediction error over values of phi. CV error is clearly lowest at intermediate values of phi,
#suggesting an estimator between matching and synthetic controls does best at forecasting. The average medium-run treatment
#effect is monotonically increasing as we move away from synthetic control and toward matching.
print(phi_table)
}