scdataMulti    R Documentation

Data Preparation for scest or scpi for Point Estimation and Inference Procedures Using Synthetic Control Methods.

Description

The command prepares the data to be used by scest or scpi to implement estimation and inference procedures for Synthetic Control (SC) methods in the general case of multiple treated units and staggered adoption. It generalizes scdata, which prepares the data in the particular case of a single treated unit.

The names of the output matrices follow the terminology proposed in Cattaneo, Feng, Palomba and Titiunik (2022).

Companion Stata and Python packages are described in Cattaneo, Feng, Palomba, and Titiunik (2022).

Companion commands are: scdata for data preparation in the single treated unit case, scest for point estimation, scpi for inference procedures, scplot and scplotMulti for plots in the single and multiple treated unit(s) cases, respectively.

Related Stata, R, and Python packages useful for inference in SC designs are described on the following website:

https://nppackages.github.io/scpi/

For an introduction to synthetic control methods, see Abadie (2021) and references therein.

Usage

scdataMulti(
  df,
  id.var,
  time.var,
  outcome.var,
  treatment.var,
  features = NULL,
  cov.adj = NULL,
  cointegrated.data = FALSE,
  post.est = NULL,
  units.est = NULL,
  donors.est = NULL,
  anticipation = 0,
  effect = "unit-time",
  constant = FALSE,
  verbose = TRUE,
  sparse.matrices = FALSE
)

Arguments

df

a dataframe object.

id.var

a character with the name of the variable containing units' IDs. The ID variable can be numeric or character.

time.var

a character with the name of the time variable. The time variable has to be numeric, integer, or Date. If time.var is a Date, it should be the output of the as.Date() function. An integer or numeric time variable is suggested when working with yearly data, whereas a Date time variable is preferred for all other frequencies.
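As a minimal sketch of the Date case, the snippet below builds a quarterly time variable in base R. The data frame toy and its columns are made up for illustration and are not part of the package.

```r
# Quarterly dates generated directly as a Date vector
quarters <- seq(as.Date("2000-01-01"), by = "quarter", length.out = 4)

# The same first two dates obtained by converting character strings
from_text <- as.Date(c("2000-01-01", "2000-04-01"), format = "%Y-%m-%d")

# Hypothetical panel with a Date time variable, to be passed as time.var = "quarter"
toy <- data.frame(unit = "A", quarter = quarters, y = c(1.0, 1.2, 1.1, 1.4))
print(class(toy$quarter))
```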

outcome.var

a character with the name of the outcome variable. The outcome variable has to be numeric.

treatment.var

a character with the name of the variable containing the treatment assignment of each unit. The referenced variable has to take value 1 if the unit is treated in that period and value 0 otherwise. Note that, as is common in the SC literature, we presume that once a unit is treated it remains treated forever. If treatment.var does not comply with this requirement, the command will not work as expected.
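The requirement that treatment is never reversed can be checked before calling the command. The following is a sketch in base R on made-up data; the helper is_absorbing and the object names are hypothetical and are not part of scpi.

```r
# Toy long-format panel: 3 units observed yearly from 2000 to 2005 (made-up data)
toy <- expand.grid(unit = c("A", "B", "C"), year = 2000:2005,
                   stringsAsFactors = FALSE)

# Hypothetical adoption years; Inf marks a never-treated unit
adoption <- c(A = 2003, B = 2004, C = Inf)

# Indicator equal to 1 from the adoption year onwards and 0 before
toy$treated <- as.integer(toy$year >= adoption[toy$unit])

# Hypothetical helper: TRUE if, within every unit, the indicator never
# switches back from 1 to 0 once the rows are sorted by time
is_absorbing <- function(d, id, time, treat) {
  all(vapply(split(d, d[[id]]), function(g) {
    g <- g[order(g[[time]]), ]
    all(diff(g[[treat]]) >= 0)
  }, logical(1)))
}

absorbing <- is_absorbing(toy, "unit", "year", "treated")
print(absorbing)
```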

features

a list containing the names of the feature variables used for estimation. If this option is not specified the default is features = outcome.var.

cov.adj

a list specifying the names of the covariates to be used for adjustment for each feature. If outcome.var is not among the variables specified in features, we force cov.adj <- NULL. See the Details section for more.

cointegrated.data

a logical that indicates whether the data are believed to be cointegrated. The default value is FALSE.

post.est

a scalar specifying the number of post-treatment periods or a list specifying the periods for which treatment effects have to be computed for each treated unit.

units.est

a list specifying the treated units for which treatment effects have to be computed.

donors.est

a list specifying the donor units to be used. If the list has length 1, then all treated units share the same potential donors. Otherwise, if the user requires different donor pools for different treated units, the length of the list must equal the number of treated units and each element has to be named with one treated unit's name as specified in id.var.
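The two ways of specifying donor pools might look as follows. This is only a sketch: the unit names are illustrative, and the assumption that each list element is a character vector of donor IDs is ours rather than something stated above.

```r
# Illustrative IDs of the treated units, as they would appear in id.var
treated <- c("West Germany", "Italy")

# Length-1 list: every treated unit shares the same potential donors
shared_pool <- list(c("Austria", "Belgium", "Denmark"))

# One element per treated unit, each named with that unit's ID
pool_by_unit <- list(
  "West Germany" = c("Austria", "Belgium"),
  "Italy"        = c("Denmark", "France")
)

# Sanity checks before passing the list as donors.est
names_match <- setequal(names(pool_by_unit), treated)
no_treated_donor <- !any(unlist(pool_by_unit) %in% treated)
print(c(names_match, no_treated_donor))
```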

anticipation

a scalar that indicates the number of periods of potential anticipation effects. Default is 0.

effect

a string indicating the type of treatment effect to be computed. Options are: 'unit-time', which estimates treatment effects for each combination of treated unit and post-treatment period; 'unit', which estimates the treatment effect for each unit by averaging post-treatment features over time; 'time', which estimates the average treatment effect on the treated at various horizons. The default is 'unit-time'.

constant

a logical which controls the inclusion of a constant term across features. The default value is FALSE.

verbose

if TRUE prints additional information in the console.

sparse.matrices

if TRUE all block diagonal matrices (\mathbf{B}, \mathbf{C}, and \mathbf{P}) are sparse matrices. This is suggested if the dimension of the dataset is large, as it will likely reduce the execution time. The sparse matrices will be objects of class 'dgCMatrix' or 'lgCMatrix', so to visualize them they need to be converted to dense matrices, e.g. View(as.matrix(B)).
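To illustrate the conversion mentioned for sparse.matrices, the sketch below builds a small block diagonal sparse matrix with the Matrix package and converts it to a dense one. The blocks are made-up numbers and only mimic the structure of \mathbf{B}; they are not produced by scdataMulti.

```r
library(Matrix)

# Two made-up blocks, standing in for the donor matrices of two treated units
B1 <- matrix(c(1, 2, 3, 4), nrow = 2)
B2 <- matrix(c(5, 6, 7, 8), nrow = 2)

# Block diagonal sparse matrix: off-diagonal blocks are not stored
B <- bdiag(B1, B2)
print(class(B))

# Dense copy, suitable for inspection with View() or print()
B_dense <- as.matrix(B)
print(B_dense)
```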

Details

  • Covariate-adjustment. See the Details section in scdata for further information on how to specify covariate-adjustment feature-by-feature.

  • Cointegration. cointegrated.data allows the user to model the belief that \mathbf{A} and \mathbf{B} form a cointegrated system. In practice, this implies that when dealing with the pseudo-true residuals \mathbf{u}, the first differences of \mathbf{B} are used rather than the levels.

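  • Effect. effect allows the user to select the causal quantity of interest. The default option, effect = "unit-time", prepares the data for estimation of

    \tau_{ik},\quad i=1,\ldots,N_1,

    for each post-treatment period k, where i indexes the N_1 treated units. The option effect = "unit" prepares the data for estimation of the average of \tau_{ik} over the post-treatment periods of each treated unit i, whereas the option effect = "time" prepares the data for estimation of

    \tau_{\cdot k}=\frac{1}{N_1} \sum_{i=1}^{N_1} \tau_{i k},

    which is the average effect across the treated units at post-treatment horizon k.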
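The two kinds of averaging can be made concrete with a toy calculation. The matrix tau below contains made-up values of \tau_{ik}, with treated units in rows and post-treatment periods in columns; it only illustrates the arithmetic and is not how the package computes its estimates.

```r
# Made-up unit-time effects tau[i, k]: 2 treated units, 3 post-treatment periods
tau <- matrix(c(1, 2, 3,
                3, 4, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("unit1", "unit2"), c("k1", "k2", "k3")))

# Average over the treated units, period by period: (1 / N_1) * sum_i tau[i, k]
tau_by_period <- colMeans(tau)

# Average over the post-treatment periods, unit by unit
tau_by_unit <- rowMeans(tau)

print(tau_by_period)
print(tau_by_unit)
```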

Value

The command returns an object of class 'scdataMulti' containing the following objects:

A

a matrix containing pre-treatment features of the treated units.

B

a matrix containing pre-treatment features of the control units.

C

a matrix containing covariates for adjustment.

P

a matrix whose rows are the vectors used to predict the out-of-sample series for the synthetic units.

P.diff

for internal use only.

Y.df

a dataframe containing the outcome variable for all units.

Y.pre

a matrix containing the pre-treatment outcome of the treated units.

Y.post

a matrix containing the post-treatment outcome of the treated units.

Y.donors

a matrix containing the pre-treatment outcome of the control units.

specs

a list containing some specifics of the data:

  • J, a list containing the number of donors for each treated unit

  • K, a list containing the number of covariates used for adjustment for each feature for each treated unit

  • KM, a list containing the total number of covariates used for adjustment for each treated unit

  • M, a list containing the number of features used for each treated unit

  • I, number of treated units

  • KMI, overall number of covariates used for adjustment

  • period.pre, a list containing a numeric vector with the pre-treatment period for each treated unit

  • period.post, a list containing a numeric vector with the post-treatment period for each treated unit

  • T0.features, a list containing a numeric vector with the number of periods used in estimation for each feature for each treated unit

  • T1.outcome, a list containing the number of post-treatment periods for each treated unit

  • features.list, a list containing the name of the features for each treated unit

  • outcome.var, a character containing the name of the outcome variable

  • constant, for internal use only

  • effect, for internal use only

  • anticipation, number of periods of potential anticipation effects

  • out.in.features, for internal use only

  • sparse.matrices, for internal use only

  • treated.units, list containing the IDs of all treated units

  • donors.list, list containing the IDs of the donors of each treated unit

Author(s)

Matias Cattaneo, Princeton University. cattaneo@princeton.edu.

Yingjie Feng, Tsinghua University. fengyj@sem.tsinghua.edu.cn.

Filippo Palomba, Princeton University (maintainer). fpalomba@princeton.edu.

Rocio Titiunik, Princeton University. titiunik@princeton.edu.

References

See Also

scdata, scest, scpi, scplot, scplotMulti

Examples


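library(scpi)

# German reunification data shipped with the package
datager <- scpi_germany

# Treatment indicator: West Germany is treated from 1991 onwards
datager$tr_id <- 0
datager$tr_id[(datager$country == "West Germany" & datager$year > 1990)] <- 1
# Italy remains untreated (the line below leaves tr_id at 0)
datager$tr_id[(datager$country == "Italy" & datager$year > 1992)] <- 0

outcome.var <- "gdp"
id.var <- "country"
treatment.var <- "tr_id"
time.var <- "year"

# Prepare the data, using gdp and trade as features
df.unit <- scdataMulti(datager, id.var = id.var, outcome.var = outcome.var,
                       treatment.var = treatment.var,
                       time.var = time.var, features = list(c("gdp", "trade")),
                       cointegrated.data = TRUE, constant = TRUE)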
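A variation with two treated units and staggered adoption could look like the sketch below. Treating Italy from 1993 onwards is purely an assumption made for illustration, and the fields inspected at the end are those listed in the Value section. The code runs only if the scpi package is installed.

```r
if (requireNamespace("scpi", quietly = TRUE)) {
  library(scpi)

  dat <- scpi_germany
  dat$tr_id <- 0
  dat$tr_id[dat$country == "West Germany" & dat$year > 1990] <- 1
  # Assumption for illustration only: Italy is treated from 1993 onwards
  dat$tr_id[dat$country == "Italy" & dat$year > 1992] <- 1

  df.multi <- scdataMulti(dat, id.var = "country", time.var = "year",
                          outcome.var = "gdp", treatment.var = "tr_id",
                          features = list(c("gdp", "trade")),
                          cointegrated.data = TRUE, constant = TRUE)

  # Inspect the returned object
  print(class(df.multi))
  print(df.multi$specs$I)              # number of treated units
  print(df.multi$specs$treated.units)  # IDs of the treated units
  print(dim(df.multi$A))               # pre-treatment features of treated units
}
```

The resulting object can then be passed to scest or scpi, as described above.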


scpi documentation built on Nov. 2, 2023, 5:41 p.m.