scdataMulti: Data Preparation for 'scest' or 'scpi' for Point Estimation...
In scpi: Prediction Intervals for Synthetic Control Methods with Multiple Treated Units and Staggered Adoption

scdataMulti

R Documentation

Data Preparation for `scest` or `scpi` for Point Estimation and Inference Procedures Using Synthetic Control Methods.

Description

The command prepares the data to be used by scest or scpi to implement estimation and inference procedures for Synthetic Control (SC) methods in the general case of multiple treated units and staggered adoption. It generalizes scdata, which prepares the data in the particular case of a single treated unit.

The names of the output matrices follow the terminology proposed in Cattaneo, Feng, Palomba and Titiunik (2022).

Companion Stata and Python packages are described in Cattaneo, Feng, Palomba, and Titiunik (2022).

Companion commands are: scdata for data preparation in the single treated unit case, scest for point estimation, scpi for inference procedures, scplot and scplotMulti for plots in the single and multiple treated unit(s) cases, respectively.

Related Stata, R, and Python packages useful for inference in SC designs are described on the following website:

https://nppackages.github.io/scpi/

For an introduction to synthetic control methods, see Abadie (2021) and references therein.

Usage

scdataMulti(
  df,
  id.var,
  time.var,
  outcome.var,
  treatment.var,
  features = NULL,
  cov.adj = NULL,
  cointegrated.data = FALSE,
  post.est = NULL,
  units.est = NULL,
  donors.est = NULL,
  anticipation = 0,
  effect = "unit-time",
  constant = FALSE,
  verbose = TRUE,
  sparse.matrices = FALSE
)

Arguments

`df`	a dataframe object.
`id.var`	a character with the name of the variable containing units' IDs. The ID variable can be numeric or character.
`time.var`	a character with the name of the time variable. The time variable has to be numeric, integer, or Date. If `time.var` is of type Date, it should be the output of the `as.Date()` function. An integer or numeric time variable is suggested when working with yearly data, whereas for all other formats a Date type time variable is preferred.
`outcome.var`	a character with the name of the outcome variable. The outcome variable has to be numeric.
`treatment.var`	a character with the name of the variable containing the treatment assignment of each unit. The referenced variable has to take value 1 if the unit is treated in that period and value 0 otherwise. Note that, as is common in the SC literature, we presume that once a unit is treated it remains treated forever. If `treatment.var` does not comply with this requirement, the command will not work as expected.
`features`	a list containing the names of the feature variables used for estimation. If this option is not specified the default is `features = outcome.var`.
`cov.adj`	a list specifying the names of the covariates to be used for adjustment for each feature. If `outcome.var` is not in the variables specified in `features`, we force `cov.adj<-NULL`. See the Details section for more.
`cointegrated.data`	a logical that indicates if there is a belief that the data is cointegrated or not. The default value is `FALSE`.
`post.est`	a scalar specifying the number of post-treatment periods or a list specifying the periods for which treatment effects have to be computed for each treated unit. It is only effective when effect = "unit-time".
`units.est`	a list specifying the treated units for which treatment effects have to be computed.
`donors.est`	a list specifying the donor units to be used. If the list has length 1, then all treated units share the same potential donors. Otherwise, if the user requires different donor pools for different treated units, the list must have the same length as the number of treated units and each element has to be named with one treated unit's name as specified in `id.var`.
`anticipation`	a scalar that indicates the number of periods of potential anticipation effects. Default is 0.
`effect`	a string indicating the type of treatment effect to be computed. Options are: 'unit-time', which estimates treatment effects for each combination of treated unit and post-treatment period; 'unit', which estimates the treatment effect for each unit by averaging post-treatment features over time; 'time', which estimates the average treatment effect on the treated at various horizons.
`constant`	a logical which controls the inclusion of a constant term across features. The default value is `FALSE`.
`verbose`	if `TRUE` prints additional information in the console.
`sparse.matrices`	if `TRUE` all block diagonal matrices (`\mathbf{B}`, `\mathbf{C}`, and `\mathbf{P}`) are sparse matrices. This is suggested when the dataset is large, as it will likely reduce the execution time. The sparse matrices will be objects of class 'dgCMatrix' or 'lgCMatrix'; to visualize them, convert them to dense matrices first, e.g. `View(as.matrix(B))`.
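
As a minimal sketch of how the inputs described above might be prepared, the base R snippet below builds a toy panel, checks that the treatment indicator never switches back from 1 to 0 within a unit, and assembles a named `donors.est` list. The data, the column names (`unit`, `year`, `treat`), and the helper `is_absorbing()` are hypothetical and are not part of scpi.

```r
# Toy long-format panel (hypothetical data, for illustration only).
panel <- data.frame(
  unit  = rep(c("A", "B", "C", "D"), each = 5),
  year  = rep(2001:2005, times = 4),
  treat = 0
)
panel$treat[panel$unit == "A" & panel$year >= 2003] <- 1
panel$treat[panel$unit == "B" & panel$year >= 2004] <- 1

# Hypothetical helper: treatment must be absorbing, i.e. within each unit,
# sorted by time, the 0/1 indicator never drops from 1 back to 0.
is_absorbing <- function(df, id.var, time.var, treatment.var) {
  df <- df[order(df[[id.var]], df[[time.var]]), ]
  ok <- tapply(df[[treatment.var]], df[[id.var]],
               function(d) all(diff(d) >= 0))
  all(ok)
}
absorbing <- is_absorbing(panel, "unit", "year", "treat")

# Treated units are those with at least one treated period.
treated <- unique(panel$unit[panel$treat == 1])
never_treated <- setdiff(unique(panel$unit), treated)

# Unit-specific donor pools: one element per treated unit, each named
# with that unit's ID as it appears in the ID variable.
donors.est <- list(A = c("C", "D"), B = c("C"))
stopifnot(setequal(names(donors.est), treated))
```

Running such a check before calling scdataMulti is one way to catch a treatment indicator that violates the requirement stated for `treatment.var`.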

Details

Covariate-adjustment. See the Details section in scdata for further information on how to specify covariate-adjustment feature-by-feature.
Cointegration. cointegrated.data allows the user to model the belief that \mathbf{A} and \mathbf{B} form a cointegrated system. In practice, this implies that when dealing with the pseudo-true residuals \mathbf{u}, the first differences of \mathbf{B} are used rather than the levels.
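Effect. effect allows the user to select the causal quantity of interest. The default option, effect = "unit-time", prepares the data for estimation of

\tau_{ik},\quad k\geq 1,\quad i=1,\ldots,N_1,

that is, the effect on treated unit i in its k-th post-treatment period. The option effect = "unit" prepares the data for estimation of the effect on each treated unit averaged over its post-treatment periods, whereas the option effect = "time" prepares the data for estimation of

\tau_{\cdot k}=\frac{1}{N_1} \sum_{i=1}^{N_1} \tau_{i k},

which is the average effect across the N_1 treated units k periods after treatment.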
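
The averaging in the display above, together with the per-unit average over time described for the `effect` argument, can be illustrated with a small arithmetic sketch in base R. The matrix `tau` holds made-up numbers, and the example assumes every treated unit is observed for the same number of post-treatment periods, with k measured relative to each unit's own adoption date. It only illustrates the target quantities, not the estimation carried out by scest or scpi.

```r
# Hypothetical unit-time effects tau[i, k]: N1 = 3 treated units (rows),
# k = 1, 2 post-treatment periods (columns). Numbers are made up.
tau <- matrix(c(1.0, 2.0,
                0.5, 1.5,
                3.0, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(unit = c("u1", "u2", "u3"), k = c("1", "2")))

# Average across treated units at each horizon k:
# tau_{.k} = (1 / N1) * sum_i tau_{ik}
tau_dot_k <- colMeans(tau)   # k = 1: 1.5, k = 2: 1.5

# Average across post-treatment periods for each treated unit
tau_i_dot <- rowMeans(tau)   # u1: 1.5, u2: 1.0, u3: 2.0
```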

Value

The command returns an object of class 'scdataMulti' containing the following objects:

`A`	a matrix containing pre-treatment features of the treated units.
`B`	a matrix containing pre-treatment features of the control units.
`C`	a matrix containing covariates for adjustment.
`P`	a matrix whose rows are the vectors used to predict the out-of-sample series for the synthetic units.
`P.diff`	for internal use only.
`Y.df`	a dataframe containing the outcome variable for all units.
`Y.pre`	a matrix containing the pre-treatment outcome of the treated units.
`Y.post`	a matrix containing the post-treatment outcome of the treated units.
`Y.donors`	a matrix containing the pre-treatment outcome of the control units.
`specs`	a list containing some specifics of the data:
  `J`, a list containing the number of donors for each treated unit
  `K`, a list containing the number of covariates used for adjustment for each feature for each treated unit
  `KM`, a list containing the total number of covariates used for adjustment for each treated unit
  `M`, a list containing the number of features used for each treated unit
  `I`, number of treated units
  `KMI`, overall number of covariates used for adjustment
  `period.pre`, a list containing a numeric vector with the pre-treatment period for each treated unit
  `period.post`, a list containing a numeric vector with the post-treatment period for each treated unit
  `T0.features`, a list containing a numeric vector with the number of periods used in estimation for each feature for each treated unit
  `T1.outcome`, a list containing the number of post-treatment periods for each treated unit
  `features.list`, a list containing the names of the features for each treated unit
  `outcome.var`, a character containing the name of the outcome variable
  `constant`, for internal use only
  `effect`, for internal use only
  `anticipation`, number of periods of potential anticipation effects
  `out.in.features`, for internal use only
  `sparse.matrices`, for internal use only
  `treated.units`, list containing the IDs of all treated units
  `donors.list`, list containing the IDs of the donors of each treated unit

Author(s)

Matias Cattaneo, Princeton University. cattaneo@princeton.edu.

Yingjie Feng, Tsinghua University. fengyj@sem.tsinghua.edu.cn.

Filippo Palomba, Princeton University (maintainer). fpalomba@princeton.edu.

Rocio Titiunik, Princeton University. titiunik@princeton.edu.

References

Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2), 391-425.
Cattaneo, M. D., Feng, Y., and Titiunik, R. (2021). Prediction intervals for synthetic control methods. Journal of the American Statistical Association, 116(536), 1865-1880.
Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2022). scpi: Uncertainty Quantification for Synthetic Control Methods, arXiv:2202.05984.
Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2022). Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption, arXiv:2210.05026.

Examples


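library(scpi)

# Example dataset shipped with the package
datager <- scpi_germany

# Treatment indicator: 1 in treated periods, 0 otherwise
datager$tr_id <- 0
datager$tr_id[(datager$country == "West Germany" & datager$year > 1990)] <- 1
# Italy is left untreated here (tr_id is already 0); assigning 1 instead
# would make it a second treated unit with a later adoption date
datager$tr_id[(datager$country == "Italy" & datager$year > 1992)] <- 0

outcome.var <- "gdp"
id.var <- "country"
treatment.var <- "tr_id"
time.var <- "year"

# Prepare the data using GDP and trade as features
df.unit <- scdataMulti(datager, id.var = id.var, outcome.var = outcome.var,
                       treatment.var = treatment.var,
                       time.var = time.var, features = list(c("gdp", "trade")),
                       cointegrated.data = TRUE, constant = TRUE)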
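
A variant of the example with staggered adoption can be sketched as follows. The choice of Italy as a second treated unit from 1993 onward is an illustrative assumption, not a substantive claim about the data. The call uses only arguments listed in the Usage section, and the inspected fields are those documented under Value. The guard makes the code a no-op when scpi is not installed.

```r
# Sketch only: runs when the scpi package is installed.
if (requireNamespace("scpi", quietly = TRUE)) {
  dat <- scpi::scpi_germany

  # Staggered adoption (illustrative assumption): West Germany is treated
  # from 1991 and Italy from 1993; all other countries are never treated.
  dat$tr_id <- 0
  dat$tr_id[dat$country == "West Germany" & dat$year > 1990] <- 1
  dat$tr_id[dat$country == "Italy" & dat$year > 1992] <- 1

  df.stag <- scpi::scdataMulti(dat, id.var = "country", time.var = "year",
                               outcome.var = "gdp", treatment.var = "tr_id",
                               features = list(c("gdp", "trade")),
                               cointegrated.data = TRUE, constant = TRUE,
                               verbose = FALSE)

  print(class(df.stag))                # documented class: 'scdataMulti'
  print(df.stag$specs$treated.units)   # IDs of the treated units
  print(df.stag$specs$I)               # number of treated units
}
```

The resulting object can then be passed to scest or scpi in the same way as `df.unit` above.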

scpi documentation built on April 3, 2025, 8:54 p.m.
