Description Usage Arguments Details Value References See Also Examples

View source: R/crossvalidation.R

Implements the matching and synthetic control (masc) estimator of Kellogg, Mogstad, Pouliot, and Torgovitsky (2021).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 |

`treated` |
A |

`donors` |
A #' Note that |

`treated.covariates` |
An optional argument; A matrix consisting of time series data on covariates, for the treated unit. |

`donors.covariates` |
An optional argument; A matrix consisting of time series data on covariates, for the control units. If specified, both - unit:
can be of any type, and may be omitted from `treated.covariates` .- time:
a numeric column, indicating when the covariates were realized relative to the rows of `treated` and`donors` . If`time = 2` , then the covariates in that row were realized at the same time as outcomes of row 2 in`treated` and`donors` (or were realized after row 1 but sometime before row 2). For time-invariant pre-determined characteristics, you can set`time = 0` .
The covariates are averaged over the pre-period, and the resulting averages are used to construct the estimators (during cross-validation, they will average over averages will be taken within each fold). |

`treatment` |
An integer. The period T' in which forecasting begins. If |

`sc_est` |
A |

`match_est` |
A |

`tune_pars_list` |
A - m:
a vector of integers. Denotes the set of nearest neighbor estimators from which we are allowed to pick. E.g., `tune_pars_list$m=c(1,3,5)` would allow us to pick from 1-NN, 3-NN, or 5-NN. Alternatively,`tune_pars_list$m` permits a logical vector. In this case, e.g.,`tune_pars_list$m=c(FALSE,TRUE,TRUE)` would allow us to pick from 2-NN or 3-NN. If`NULL` , we default to allowing all possible nearest neighbor estimators.- min_preperiods:
an integer. The smallest number of estimation periods allowed in a fold used for cross-validation. We use all folds from fold `min_preperiods` up to the latest possible fold`treatment-2` .- set_f:
a vector of integers. Identifies the set of folds used for cross-validation. As above, each integer identifies a fold by the last time period it uses in estimation. E.g., set_f=c(7,8,9) would implement cross-validation using fold 7, fold 8, and fold 9. - weights_f:
a `vector` of length`length(set_f)` or`length(min_preperiods:(treatment-2))` , containing weakly positive relative weight values for each of the cross-validation folds. The elements of`weights_f` are normalized to sum to 1- matchVfun:
a `function` that governs how unit characteristics are weighted together for matching. If the estimator is purely outcome-based, then the default behavior is raw matching on the outcome paths. If the estimator uses covariates, then the default behavior is to weigh outcomes by their standard deviations across units.
If neither |

`cv_pars` |
A |

`alloutput` |
A logical value. If true, output includes a list |

`phival` |
If specified, a numeric value describing how the matching and SC estimator should be combined (without cross-validation). A value of 1 represents using only the matching estimator, a value of 0 represents using only the SC estimator. |

`...` |
Other arguments to pass to |

The `masc`

estimator takes a convex combination of a
nearest neighbor estimator and a synthetic control estimator. That combination
is parametrized by a model averaging parameter which takes a value of 1 when
the estimator is equivalent to a pure matching estimator, and 0 when it is equivalent
to a pure synthetic control estimator.
This function selects the nearest neighbor estimator and model
averaging parameter by a rolling-origin cross-validation procedure. Computationally,
we minimize the cross-validation criterion in two steps. First, for each candidate
nearest neighbor estimator, we solve for the model averaging parameters using an analytic
expression (see Equation 15 of the working paper). Then, we select the candidate nearest
neighbor estimator which produces the smallest cross-validation criterion value.

This implementation by default may use the `gurobi`

interfaced with R to solve for the synthetic control estimator.
Gurobi and its associated R package are available on the gurobi website:
https://cran.r-project.org/web/packages/prioritizr/vignettes/gurobi_installation.html

It is recommended to use this implementation of the `masc`

estimator. However, the function may
alternatively use an implementation based on LowRankQP for convenience.

By default, returns a list containing five objects:

- phi_hat:
selected value for the model averaging parameter (1 is pure matching, 0 pure synthetic control).

- m_hat:
selected matching (nearest neighbor) estimator.

- weights:
The vector length N containing weights placed on each control unit.

- pred.error:
The vector of treatment effects implied by the masc counterfactual, for periods T' to T.

- cv.error:
The average (weighted by

`weights_f`

) of the cross-validation errors generated by each fold.- cv.error.byfold:
The cross-validation error generated by each fold.

Additionally, returns the following object if `alloutput=TRUE`

:

- all.results:
A list containing the five above output components for each candidate matching estimator. Values for each candidate are appended as columns.

Kellogg, M., M. Mogstad, G. Pouliot, and A. Torgovitsky. Combining Matching and Synthetic Control to Trade off Biases from Extrapolation and Interpolation. Working Paper, 2019.

Other masc functions:
`cv_masc()`

,
`masc_by_phi()`

,
`sc_estimator()`

,
`solve_masc()`

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 | ```
##First example: a small, arbitrary dataset.
data3 <- data.frame(t = 1:10, treated = 1:10,
control1 = c(15,12,14,22,21,28,29,30,29,31),
control2 = c(10,5,10,12,16,20,22,21,23,21),
control3 = c(-7,-11,-7,-3,-4,-9,-7,-3,-7,-11),
control4 = c(-15,-13,-14,-16,-15,-12,-13,-13,-14,-14))
#controls are above stored as differences from treated unit. Translating into levels:
data3$control1 <- data3$control1 + data3$treated
data3$control2 <- data3$control2 + data3$treated
data3$control3 <- data3$control3 + data3$treated
data3$control4 <- data3$control4 + data3$treated
#defining treatment period:
treatperiod <- 6
result<-masc(treated=data3$treated,donors=data3[,-c(1,2)],treatment=treatperiod,
tune_pars_list=list(m=1:3,min_preperiods=3))
#an equivalent specification:
result<-masc(treated=data3$treated,donors=data3[,-c(1,2)],treatment=treatperiod,
tune_pars_list=list(m=1:3,set_f=3:4))
#Weights selected, for controls 1 through 4 respectively:
print(round(result$weights,2))
##Second example: Terrorism in the Basque Region, from
##Abadie and Gardeazabal (2003).
##Fitting estimators purely on outcome paths.
#First, load the Synth package, which includes the dataset:
if (requireNamespace("Synth",quietly=TRUE) & requireNamespace("data.table",quietly=TRUE)){
library(Synth)
library(data.table)
data(basque)
basque<-as.data.table(basque)
basque <- basque[regionno!=1,]
basque[,regionname:= gsub(" (.*)","",regionname)]
#Grabbing region names:
names<- c(unique(basque[regionno==17,regionname]),unique(basque[regionno!=17,regionname]))
basque <- cbind(basque[regionno==17,gdpcap],
t(reshape(basque[regionno!=17,.(regionno,year,gdpcap)],
idvar='regionno', timevar='year',direction='wide')[,-"regionno",with=FALSE]))
result <- masc(treated=basque[,1], donors=basque[,-1],treatment=16, tune_pars_list=list(m=1:10,
min_preperiods=8))
names(result$weights)<-names[-1]
#weights on control units:
print(round(result$weights,3))
#Treatment effects of terrorism on GDP per capita
#in thousands of 1986 US dollars, over 1970-1975:
#(first 6 years of treatment)
print(result$pred.error[1:6,])
#Selected tuning parameters?
print(paste0("Selected matching estimator: ",result$m_hat))
print(paste0("Selected weight on matching: ",result$phi_hat))
#Now, examine the shape of A) the CV error (mean square prediciton error in pre-period) and
# B) average prediction error (AKA treatment effect) over the first 5 treatment years,
#both over values of phi, fixing the matching estimator (moving from matching to synthetic controls)
phis<-seq(0,1,length.out=100)
phi_table<-masc_by_phi(treated=basque[,1], donors=basque[,-1],treatment=16, tune_pars=list(m=result$m_hat,
min_preperiods=8,phis=phis))
#Printing CV error and prediction error over values of phi. CV error is clearly lowest at intermediary values of phi,
#suggesting an estimator between matching and synthetic controls does best at forecasting. The average medium-run treatment
#effect is monotonically increasing as we move away from synthetic control and toward matching.
print(phi_table)
##Third example: Terrorism in the Basque Region, from
##Abadie and Gardeazabal (2003).
##Fitting estimators on outcome paths and covariates.
##data setup:
data(basque)
basque<-as.data.table(basque)
basque<-basque[regionno!=1,]
basque[,regionname:= gsub(" (.*)","",regionname)]
basque[,school.high:=school.high+school.post.high]
basque[,school.post.high:=NULL]
#Grabbing region names:
names<- c(unique(basque[regionno==17,regionname]),unique(basque[regionno!=17,regionname]))
#Setting up outcomes and covariates:
outcomes <- cbind(basque[regionno==17,gdpcap],
t(reshape(basque[regionno!=17,.(regionno,year,gdpcap)],
idvar='regionno', timevar='year',direction='wide')[,-"regionno",with=FALSE]))
#Abadie and Gardeazabal do not use the first 5 years of data (1955-1959):
outcomes <- outcomes[-(1:5),]
covariates <- copy(basque)
covariates <- covariates[year>=1960,]
covariates[,time:=year-min(year)+1]
covariates[,year:=NULL]
covariates[,regionno:=NULL]
#treating population density as fixed, assigning a common value to all years
#for each region
covariates[,popdens:=mean(popdens,na.rm=TRUE),by=regionname]
for(u in 1:length(names)){
covariates[regionname==names[u],unit:=u]
}
covariates[,regionname:=NULL]
#Reordering covariates, to match Abadie and Gardeazabal
covariates <- covariates[,c(8:11,13,1:7,12,14,15),with=FALSE]
#Solving for the MASC estimator. Note that the matching and SC functions used
#are NOT the default functions.
#This is because some transformations of the education-related covariates must
#be done to convert average levels over a given time period to shares.
#NOTE: the paper checks m=1:10, but that can take a while to run.
result <- masc(treated=outcomes[,1], donors=outcomes[,-1],
treated.covariates = covariates[unit==1,],
donors.covariates = covariates[unit!=1,],
treatment=11,
tune_pars_list=list(m=1,
min_preperiods=5))
}
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.