masc_by_phi: Report cross-validation error and Prediction Error, Moving...

View source: R/crossvalidation.R

Description

Reports treatment effect and cross-validation errors for estimators of the form of the matching and synthetic control (masc) estimator of Kellogg, Mogstad, Pouliot, and Torgovitsky (2019). For a set of masc-type estimators defined by a synthetic control estimator, a matching estimator (m) and a weight (phi), this function returns output associated with the masc estimator constructed by placing a weight of phi on the matching estimator and (1-phi) on the synthetic control estimator.
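
As a rough illustration of the weighting described above, the sketch below forms a masc-type set of donor weights as a convex combination of matching and synthetic control weights. It assumes both estimators can be written as weights on the donor units and that an m-nearest-neighbor estimator puts weight 1/m on each match. The vectors `w_sc` and `w_match` are made-up values, not output from this package.

```r
# Hypothetical donor weights for N = 4 control units (illustrative values only).
w_sc    <- c(0.5, 0.3, 0.2, 0.0)  # synthetic control weights, summing to 1
w_match <- c(0.5, 0.5, 0.0, 0.0)  # assumed 2-NN weights: 1/m on each of the m matches

phi <- 0.25                       # weight placed on the matching estimator

# Convex combination: phi on matching, (1 - phi) on synthetic control.
w_masc <- phi * w_match + (1 - phi) * w_sc
print(w_masc)
```

Because both inputs sum to one, the combined weights also sum to one for any phi between 0 and 1.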

Usage

masc_by_phi(
  treated,
  donors,
  treated.covariates = NULL,
  donors.covariates = NULL,
  treatment = NULL,
  sc_est = sc_estimator,
  match_est = NearestNeighbors,
  tune_pars = list(min_preperiods = NULL, set_f = NULL, m = NULL, phis = seq(from = 0,
    to = 1, length.out = 100)),
  cv_pars = list(forecast.minlength = 1, forecast.maxlength = 1),
  treatinterval = NULL,
  ...
)

Arguments

treated

A Tx1 matrix of outcomes for the treated unit.

donors

A TxN matrix of outcome paths for untreated units, each column being a control unit.

treatment

An integer. The period T' in which forecasting begins. If NULL or T'>T, then we assume all data is pre-treatment.

sc_est

A function which constructs weights associated with a synthetic control-type estimator. See sc_estimator for input and output if you'd prefer to substitute your own estimator.

tune_pars

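A list of tuning parameters. The three elements described below are m, min_preperiods, and set_f: you must specify the first (m), and you may specify at most one of the last two. (The usage above also shows a fourth element, phis, the grid of weights placed on the matching estimator.) The elements min_preperiods and set_f describe the folds included in the cross-validation procedure. Each fold f is denoted by the last period it uses for estimation. That is, fold f fits the estimators using data from period 1 through period f, and forecasts into period f+1.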

m:

a vector of integers. Denotes the set of nearest neighbor estimators from which we are allowed to pick. E.g., tune_pars$m=c(1,3,5) would allow us to pick from 1-NN, 3-NN, or 5-NN. Alternatively, tune_pars$m may be a logical vector. In this case, e.g., tune_pars$m=c(FALSE,TRUE,TRUE) would allow us to pick from 2-NN or 3-NN. If NULL, we default to allowing all possible nearest neighbor estimators.

min_preperiods:

an integer. The smallest number of estimation periods allowed in a fold used for cross-validation. We use all folds from fold min_preperiods up to the latest possible fold treatment-2.

set_f:

a list containing a single element, a vector of integers. Identifies the set of folds used for cross-validation. As above, each integer identifies a fold by the last time period it uses in estimation. E.g., set_f=c(7,8,9) would implement cross-validation using fold 7, fold 8, and fold 9.

If neither min_preperiods nor set_f is specified, then we set min_preperiods to ceiling(treatment/2). In other words, we pick the first cross-validation fold so that it is estimated on the first half of the pre-period data.
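
The fold rules above can be sketched in a few lines of base R. This is an illustration of the documented behavior, not the package's internal code; the variable names `folds` and `fold_plan` are hypothetical, and `set_f` is treated here as a plain integer vector for simplicity.

```r
treatment      <- 16    # first treated period, as in the Basque example
min_preperiods <- NULL  # not supplied by the user
set_f          <- NULL  # not supplied by the user

# Documented default: start at the midpoint of the pre-period.
if (is.null(min_preperiods) && is.null(set_f)) {
  min_preperiods <- ceiling(treatment / 2)
}

# Folds run from min_preperiods up to treatment - 2, unless set_f is given.
folds <- if (is.null(set_f)) seq(min_preperiods, treatment - 2) else set_f

# Each fold f estimates on periods 1..f and forecasts period f + 1.
fold_plan <- data.frame(fold = folds, fit_from = 1, fit_to = folds,
                        forecast = folds + 1)
print(fold_plan)

# A logical m selects estimators by position: here 2-NN and 3-NN.
m <- c(FALSE, TRUE, TRUE)
if (is.logical(m)) m <- which(m)
print(m)
```

With treatment = 16 the last fold is 14, so its forecast period (15) is still pre-treatment.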

treatinterval

A vector. Indicates the post-treatment periods used when reporting average treatment effects in the column pred of the output. E.g., treatinterval=1:5 causes the pred column to report the average treatment effect over the first 5 treatment periods. If NULL, we default to averaging over all post-treatment periods.
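
A minimal sketch of the averaging that treatinterval controls, assuming per-period treatment effects are already available as a numeric vector. The vector `effects` holds made-up numbers and `idx` is a hypothetical helper name.

```r
# Hypothetical per-period treatment effects for 8 post-treatment periods.
effects <- c(-0.1, -0.3, -0.4, -0.6, -0.7, -0.8, -0.8, -0.9)

treatinterval <- 1:5  # average over the first 5 treated periods

# NULL means "use every post-treatment period".
idx <- if (is.null(treatinterval)) seq_along(effects) else treatinterval
avg_effect <- mean(effects[idx])
print(avg_effect)
```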

nogurobi

A logical value. If TRUE, uses LowRankQP to solve the synthetic control estimator, rather than gurobi.

phivals

A vector of real values between 0 and 1. Indexes a weighted average of the synthetic control estimator with a matching estimator, where phival indicates the weight on the matching estimator (1-phival being the weight on synthetic controls).

Value

Returns a data.frame in which each row corresponds to a pair of values (m, phi), with m taken from tune_pars$m and phi from tune_pars$phis. The column cv.error reports the cross-validation error, and the column pred reports a measure of prediction error (AKA treatment effect), for the masc estimator defined by that m and phi.
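
One way to read such a table is to pick the row with the smallest cross-validation error. The sketch below uses a hand-built data.frame with made-up values; the column names m and phi are assumptions, while cv.error and pred follow the description above.

```r
# Hypothetical output table; all numbers are invented for illustration.
phi_table <- data.frame(
  m        = rep(3, 5),
  phi      = c(0, 0.25, 0.5, 0.75, 1),
  cv.error = c(0.040, 0.031, 0.027, 0.033, 0.052),
  pred     = c(-0.60, -0.64, -0.69, -0.73, -0.78)
)

# Row whose (m, phi) pair minimizes the cross-validation error.
best <- phi_table[which.min(phi_table$cv.error), ]
print(best)
```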

References

Kellogg, M., M. Mogstad, G. Pouliot, and A. Torgovitsky. Combining Matching and Synthetic Control to Trade off Biases from Extrapolation and Interpolation. Working Paper, 2019.

See Also

Other masc functions: cv_masc(), masc(), sc_estimator(), solve_masc()

Examples

##Example: Terrorism in the Basque Region, from
##Abadie and Gardeazabal (2003).

#First, load the Synth package, which includes the dataset:

if (requireNamespace("Synth",quietly=TRUE) && requireNamespace("data.table",quietly=TRUE)){
library(Synth)
library(data.table)
data(basque)
basque<-as.data.table(basque)
basque <- basque[regionno!=1,]
basque[,regionname:= gsub(" (.*)","",regionname)]
#Grabbing region names:
names<- c(unique(basque[regionno==17,regionname]),unique(basque[regionno!=17,regionname]))
basque <- cbind(basque[regionno==17,gdpcap],
                                            t(reshape(basque[regionno!=17,.(regionno,year,gdpcap)],
                                             idvar='regionno', timevar='year',direction='wide')[,-"regionno",with=FALSE]))


result <- masc(treated=basque[,1], donors=basque[,-1],treatment=16, tune_pars_list=list(m=1:10,
                                             min_preperiods=8))
names(result$weights)<-names[-1]

#weights on control units:
print(round(result$weights,3))

#Treatment effects of terrorism on GDP per capita
#in thousands of 1986 US dollars, over 1970-1975:
#(first 6 years of treatment)
print(result$pred.error[1:6,])

#Selected tuning parameters?
print(paste0("Selected matching estimator: ",result$m_hat))
print(paste0("Selected weight on matching: ",result$phi_hat))

#Now, examine the shape of A) the CV error (mean square prediction error in pre-period) and
# B) average prediction error (AKA treatment effect) over the first 5 treatment years,
#both over values of phi, fixing the matching estimator (moving from matching to synthetic controls)
phis<-seq(0,1,length.out=100)
phi_table<-masc_by_phi(treated=basque[,1], donors=basque[,-1],treatment=16, tune_pars=list(m=result$m_hat,
                                             min_preperiods=8,phis=phis))
#Printing CV error and prediction error over values of phi. CV error is clearly lowest at intermediary values of phi,
#suggesting an estimator between matching and synthetic controls does best at forecasting. The average medium-run treatment
#effect is monotonically increasing as we move away from synthetic control and toward matching.
print(phi_table)
}

maxkllgg/masc documentation built on Sept. 7, 2021, 8:44 a.m.