generate_predictor: generate_predictor
In tidysynth: A Tidy Implementation of the Synthetic Control Method

generate_predictor

R Documentation

Description

Create one or more scalar variables summarizing covariate data across a specified time window. These predictor variables are used to fit the synthetic control.

Usage

generate_predictor(data, time_window = NULL, ...)

Arguments

`data`	nested data of type `tbl_df` generated by `synthetic_control()`. See the `synthetic_control()` documentation for more information.
`time_window`	the time window within the pre-intervention period across which the data should be aggregated to generate the specific predictor. The default is to use the entire pre-intervention period.
`...`	Name-value pairs of summary functions. The name will be the name of the variable in the result. The value should be an expression that returns a single value, like `min(x)`, `n()`, or `sum(is.na(y))`. Note that `na.rm = TRUE` should be specified for all summary functions, because missing values are common when aggregating across units.
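
To make the roles of `time_window` and `na.rm = TRUE` concrete, here is a minimal base R sketch of the kind of aggregation described above. It does not call tidysynth; the data frame `panel`, its columns, and the object names are hypothetical and exist only for illustration.

```r
# Toy long-format panel: two units observed over four years (hypothetical data).
panel <- data.frame(
  unit   = rep(c("A", "B"), each = 4),
  year   = rep(1985:1988, times = 2),
  income = c(10, NA, 12, 14, 20, 22, NA, 26)
)

# Restrict to the chosen years, mirroring the role of `time_window`.
window    <- 1986:1988
in_window <- panel[panel$year %in% window, ]

# Collapse each unit's series within the window to a single scalar predictor.
with_na_rm    <- tapply(in_window$income, in_window$unit, mean, na.rm = TRUE)
without_na_rm <- tapply(in_window$income, in_window$unit, mean)

print(with_na_rm)     # one finite value per unit
print(without_na_rm)  # NA for any unit with a missing value in the window
```

Without `na.rm = TRUE`, a single missing observation inside the window turns the whole predictor into `NA` for that unit, which is why the argument is recommended for every summary function.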

Details

`generate_predictor()` builds the matrices of aggregate-level covariates to be used in the following minimization task.

W^*(V) = \min_{W} \sum_{m=1}^{M} v_m \left( X_{1m} - \sum_{j=2}^{J+1} w_j X_{jm} \right)^2

The importance of the generated predictors is determined by the vector V, and the weights that determine unit-level importance are given by the vector W. The nested optimization task seeks the optimal values of V and W. Note also that V can be provided by the user. See `?generate_weights`.
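
The objective above can be written out directly in base R. This is only an illustrative sketch, not the package's internal optimizer: the function name `synth_loss` and all values are hypothetical, and the constraint that the donor weights are non-negative and sum to one is the usual synthetic control assumption rather than something stated on this page.

```r
# Hypothetical predictor values for M = 3 predictors.
X1 <- c(10, 5, 2)                      # treated unit (unit 1), length M
X0 <- cbind(c(8, 4, 1), c(12, 6, 3))   # M x J matrix for J = 2 donor units
v  <- c(0.5, 0.3, 0.2)                 # predictor importance weights (V)

# Weighted squared gap between the treated unit and the weighted donors.
synth_loss <- function(w, v, X1, X0) {
  gap <- X1 - as.vector(X0 %*% w)      # X_{1m} - sum_j w_j X_{jm}, for each m
  sum(v * gap^2)
}

# Equal donor weights versus putting all weight on the first donor.
loss_equal <- synth_loss(c(0.5, 0.5), v, X1, X0)
loss_first <- synth_loss(c(1, 0), v, X1, X0)
print(c(equal = loss_equal, first_only = loss_first))
```

In this toy case the equal-weight combination reproduces the treated unit's predictors more closely than the first donor alone, so it yields the smaller loss; the nested optimization searches over W (and, unless supplied, V) for the best such fit.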

Value

tbl_df with nested fields containing the following:

.id: unit id for the intervention case (this differs from the treated unit when the entry belongs to a placebo unit).
.placebo: indicator field taking the value 1 if a unit is a placebo unit, 0 if it is the specified treated unit.
.type: type of the nested data construct: treated or controls. Keeps track of which data construct is located in the .outcome field.
.outcome: nested data construct containing the outcome variable configured for the synthetic control method. Data is configured into a wide format for the optimization task.
.predictors: nested data construct containing the covariate matrices for the treated and control (donor) units. Data is configured into a wide format for the optimization task.
.original_data: original input data filtered by treated or control units. This allows for easy processing downstream when generating predictors.
.meta: stores information regarding the unit and time index, the treated unit and time, and the name of the outcome variable. Used downstream in subsequent functions.

Examples




# Load tidysynth, which provides the smoking data and the %>% pipe
library(tidysynth)

# Smoking example data
data(smoking)

smoking_out <-
  smoking %>%

  # Initialize the synthetic control object
  synthetic_control(outcome = cigsale,
                    unit = state,
                    time = year,
                    i_unit = "California",
                    i_time = 1988,
                    generate_placebos = FALSE) %>%

  # Generate the aggregate predictors used to generate the weights
  generate_predictor(time_window = 1980:1988,
                     lnincome = mean(lnincome, na.rm = TRUE),
                     retprice = mean(retprice, na.rm = TRUE),
                     age15to24 = mean(age15to24, na.rm = TRUE))

# Extract respective predictor matrices
smoking_out %>% grab_predictors(type = "treated")
smoking_out %>% grab_predictors(type = "controls")

tidysynth documentation built on April 3, 2025, 5:32 p.m.