tw_data: Generate One-way and Two-way Fixed Effects Panel Data

View source: R/twsim2.R

tw_data    R Documentation

Generate One-way and Two-way Fixed Effects Panel Data

Description

This function produces panel data in which variation can exist in the cross-section, over time, or in both dimensions simultaneously. Effect heterogeneity by case or by cross-section is also allowed, as are interactive effects such as difference-in-differences.

Usage

tw_data(
  N = 30,
  T = 30,
  case.int.mean = 0.5,
  case.int.sd = 0.5,
  cross.int.mean = -1,
  cross.int.sd = 0.5,
  cross.eff.mean = 0,
  did.eff.mean = 1,
  did.eff.sd = 0.25,
  wid.eff.mean = 0,
  wid.eff.sd = 0,
  cross.eff.sd = 0.5,
  case.eff.mean = 0.5,
  case.eff.sd = 0.5,
  noise.sd = 1,
  omm.x.case = 0,
  omm.x.cross = 0,
  omm.y.case = 0,
  omm.y.cross = 0,
  treat_effect = NULL,
  this_case = NULL,
  this_time = NULL,
  binary_outcome = FALSE,
  binary_x = FALSE,
  unbalance = FALSE,
  time.ac = 0,
  spatial.ac = 0,
  prior_true_vals = NULL
)

Arguments

N

The number of observations for each case/unit.

T

The number of time points per case/unit.

case.int.mean

The mean of the case/unit intercepts/fixed effects

case.int.sd

The SD of the case/unit intercepts/fixed effects

cross.int.mean

The mean of the cross-sectional intercepts/fixed effects

cross.int.sd

The SD of the cross-sectional intercepts/fixed effects

cross.eff.mean

The mean of the cross-sectional effect of X on Y

did.eff.mean

The mean of the difference-in-difference effect of X on Y

did.eff.sd

The SD of the difference-in-difference effect of X on Y

wid.eff.mean

The mean of the difference-in-cases effect of X on Y

wid.eff.sd

The SD of the difference-in-cases effect of X on Y

cross.eff.sd

The SD of the cross-sectional effect of X on Y

case.eff.mean

The mean of the case (over-time) effect of X on Y

case.eff.sd

The SD of the case (over-time) effect of X on Y

noise.sd

The standard deviation of the residual noise in the data

omm.x.case

The value of an omitted variable correlated with X that varies across cases/units

omm.x.cross

The value of an omitted variable correlated with X that varies cross-sectionally

omm.y.case

The value of an omitted variable correlated with Y that varies across cases/units

omm.y.cross

The value of an omitted variable correlated with Y that varies cross-sectionally

treat_effect

A vector of length 1 or N*T that equals 1 for assignment to treatment and 0 for assignment to control. If this argument is not NULL, 'tw_data' generates y while holding x fixed at these values (i.e., treatment is assigned/manipulated).

binary_outcome

Whether the Y (outcome) variable should be converted to 0/1. Note that this can lead to some measurement bias as Y is simulated as continuous.

binary_x

Whether X should be converted to 0/1. Note that doing so may lead to some bias in estimation as X is simulated as continuous.

unbalance

Whether to simulate varying numbers of observations by cases or time points.

time.ac

A value between 0 and 1 giving the over-time autocorrelation in the effect of X on Y

spatial.ac

A value between 0 and 1 giving the cross-sectional (spatial) autocorrelation in the effect of X on Y

prior_true_vals

A 'tw_data' object (the output of an earlier call) whose generated coefficients can be reused to keep the vectors of intercepts/effects fixed over repeated sampling
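The caveat under binary_x (converting a continuous X to 0/1 can bias estimation) can be seen with a small base-R sketch. It is only an illustration: the names below are hypothetical, and the threshold at 0 is an assumption, not necessarily the rule tw_data uses. Once X is dichotomized, the slope measures a difference in group means rather than the simulated per-unit effect.

```r
# Hypothetical sketch: dichotomize a continuous x at 0 (the threshold used
# by tw_data is an assumption here) and compare the estimated slopes.
set.seed(5)
n <- 2000
x <- rnorm(n)
y <- 1 * x + rnorm(n)              # true slope on continuous x is 1
x_bin <- as.integer(x > 0)         # 0/1 version of x

slope_cont <- coef(lm(y ~ x))[["x"]]
# With a 0/1 regressor the slope is mean(y | x > 0) - mean(y | x <= 0)
slope_bin <- coef(lm(y ~ x_bin))[["x_bin"]]
print(round(c(continuous = slope_cont, binary = slope_bin), 2))
```

Under this setup the binary slope lands well above 1, because it compares two groups whose underlying x values differ by more than one unit on average.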

Details

The tw_data function is the workhorse of the twowaysim package. It takes as input the dimensions of the panel/TSCS data to be generated, along with parameters that determine the extent of variance and heterogeneity in the cross-sectional effects, the over-time effects, or their interaction. The parameter N determines how many observations exist for each case or unit in the panel, while T determines how many time points exist per case or unit.

To create a model with a homogeneous (static) within-unit over-time (case) effect, set case.eff.mean to a non-zero number and case.eff.sd to zero. Similarly, setting cross.eff.mean to a non-zero number and cross.eff.sd to zero produces a panel dataset with a cross-sectional effect of X on Y that does not vary across countries (no effect heterogeneity). Increasing cross.eff.sd and case.eff.sd introduces more effect heterogeneity across countries and time points. If both case.eff.mean and cross.eff.mean are non-zero, Y varies in both dimensions. In that case, a one-way fixed effects model with intercepts on cases returns the case.eff.mean coefficient, a one-way model with intercepts on time points returns the cross.eff.mean estimate, and a two-way model (intercepts on both cases and time points) returns a weighted average of the two that is difficult to characterize.
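Why case intercepts recover the over-time effect while time intercepts recover the cross-sectional effect can be illustrated with a minimal base-R sketch. This is not the panelsim data-generating process. It assumes that x splits cleanly into a part u that differs only across cases and a part v that changes only over time, and it omits intercept heterogeneity to keep the arithmetic transparent. All variable names are hypothetical.

```r
# Hypothetical base-R sketch, NOT the panelsim data-generating process:
# x = u (differs across cases, fixed over time) + v (shared by all cases,
# changes over time), and each part has its own effect on y.
set.seed(42)
n_case <- 30
n_time <- 30
beta_case  <- -1  # over-time (within-case) effect, cf. case.eff.mean
beta_cross <-  3  # cross-sectional effect, cf. cross.eff.mean

case_id <- rep(seq_len(n_case), each = n_time)
time_id <- rep(seq_len(n_time), times = n_case)
u <- rnorm(n_case)
v <- rnorm(n_time)
x <- u[case_id] + v[time_id]
y <- beta_cross * u[case_id] + beta_case * v[time_id] +
  rnorm(n_case * n_time, sd = 0.1)
panel <- data.frame(y = y, x = x,
                    case = factor(case_id), time = factor(time_id))

# Case intercepts absorb u, so only over-time variation in x identifies the slope
fe_case <- coef(lm(y ~ x + case, data = panel))[["x"]]
# Time intercepts absorb v, so only cross-sectional variation identifies the slope
fe_time <- coef(lm(y ~ x + time, data = panel))[["x"]]
print(round(c(case_fe = fe_case, time_fe = fe_time), 2))
```

The case-intercept model lands near -1 and the time-intercept model near 3, mirroring the roles of case.eff.mean and cross.eff.mean described above.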

To generate a difference-in-differences (time interaction) effect, set did.eff.mean to a non-zero number, and set did.eff.sd to zero if the DiD effects are supposed to be homogeneous. Note that a standard "canonical" DiD setup requires T to be 2 (one pre-treatment and one post-treatment period). For more information about generating DiD specifications with 'tw_data', see the blog post https://www.robertkubinec.com/post/did_dnd/index.html.
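The canonical two-period DiD setup can be sketched in base R. This is an illustrative simulation under stated assumptions, not how tw_data builds its DiD term: half of the hypothetical cases are treated in period 2, the effect is homogeneous (analogous to did.eff.sd = 0), and the intercept and trend values are arbitrary. The sketch computes the DiD from the four cell means and from a two-way fixed effects regression.

```r
# Hypothetical base-R sketch of a canonical 2 x 2 difference-in-differences
# panel (two groups, T = 2); not the panelsim implementation.
set.seed(1)
n_case <- 40
did_effect <- 1  # homogeneous effect, in the spirit of did.eff.mean = 1, did.eff.sd = 0

panel <- expand.grid(case = seq_len(n_case), time = 1:2)
panel$treated <- as.integer(panel$case > n_case / 2)  # second half of cases treated
panel$post <- as.integer(panel$time == 2)
panel$d <- panel$treated * panel$post                  # treated AND post-period
case_int <- rnorm(n_case, mean = 0.5, sd = 0.5)        # case intercepts
panel$y <- case_int[panel$case] - 1 * panel$post +
  did_effect * panel$d + rnorm(nrow(panel), sd = 0.1)

# DiD from the four group-by-period cell means
m <- tapply(panel$y, list(panel$treated, panel$post), mean)
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

# Two-way fixed effects regression on the same data
did_twfe <- coef(lm(y ~ d + factor(case) + factor(time), data = panel))[["d"]]
print(round(c(means = did_means, twfe = did_twfe), 3))
```

In a balanced two-group, two-period panel the two-way fixed effects coefficient equals the difference in cell-mean changes exactly, which is why T = 2 is the canonical case.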

We refer you to Kropko and Kubinec (2020) for more information on the difference between these models: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0231349.

The parameters case.int.mean/case.int.sd and cross.int.mean/cross.int.sd govern the intercepts for the cases and time points, respectively. Changing these parameters increases or decreases the amount of unexplained variance (random noise) in the dataset.
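A minimal base-R sketch shows why larger intercept spread acts like extra noise for a model that does not include those intercepts. The function name pooled_r2 and all settings are hypothetical. The case intercepts are drawn with mean 0.5 and a varying SD, loosely mirroring case.int.mean and case.int.sd, and a pooled regression without case intercepts is fitted each time.

```r
# Hypothetical base-R sketch: case intercepts that a pooled model ignores
# end up in the residual, lowering R-squared as their SD grows.
set.seed(7)
pooled_r2 <- function(int_sd, n_case = 50, n_time = 20) {
  case_id <- rep(seq_len(n_case), each = n_time)
  x <- rnorm(n_case * n_time)
  case_int <- rnorm(n_case, mean = 0.5, sd = int_sd)  # cf. case.int.mean / case.int.sd
  y <- case_int[case_id] + 1 * x + rnorm(n_case * n_time)
  summary(lm(y ~ x))$r.squared
}
r2 <- sapply(c(0, 0.5, 2), pooled_r2)
print(round(r2, 2))
```

With an intercept SD of 0 the pooled R-squared sits near 0.5; with an SD of 2 most of the variance in y is unexplained by x.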

The remaining parameters let the user create unbalanced panels (varying numbers of observations per case or time point when unbalance=TRUE), autocorrelation in the effects, and omitted variables. Autocorrelation can exist in either the over-time dimension or the cross-sectional dimension. To increase time autocorrelation, set time.ac to a value between 0 and 1, where values closer to 1 signal higher autocorrelation. To increase spatial (cross-sectional) autocorrelation, set spatial.ac to a value between 0 and 1.
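One common way to build over-time autocorrelation into a sequence of effects is an AR(1) process, sketched below in base R. This is an assumption for illustration only: tw_data's actual mechanism for time.ac may differ, and the helper ar1_effects is hypothetical. The rho argument plays a role similar to time.ac, restricted here to the interval [0, 1).

```r
# Hypothetical sketch: AR(1) per-period effects; tw_data's exact mechanism
# for time.ac may differ.
set.seed(3)
ar1_effects <- function(n_time, rho, eff_mean = 0, eff_sd = 1) {
  stopifnot(rho >= 0, rho < 1)
  dev <- numeric(n_time)
  dev[1] <- rnorm(1, sd = eff_sd)          # start from the stationary distribution
  innov_sd <- eff_sd * sqrt(1 - rho^2)     # keeps every period's SD at eff_sd
  for (i in seq_len(n_time)[-1]) {
    dev[i] <- rho * dev[i - 1] + rnorm(1, sd = innov_sd)
  }
  eff_mean + dev
}

# Lag-1 sample autocorrelation of a series
lag1 <- function(e) acf(e, lag.max = 1, plot = FALSE)$acf[2]

high_ac <- ar1_effects(5000, rho = 0.8, eff_mean = 1)
no_ac   <- ar1_effects(5000, rho = 0,   eff_mean = 1)
print(round(c(rho_0.8 = lag1(high_ac), rho_0 = lag1(no_ac)), 2))
```

Scaling the innovations by sqrt(1 - rho^2) keeps the marginal spread of the effects constant, so only their persistence over time changes as rho moves toward 1.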

Finally, to include omitted variables, set one of the omm parameters to a non-zero value. Omitted variables that vary within cases (over time) can be included by setting an omm parameter suffixed with case to a non-zero value, and variables that vary in the cross-section can be included by setting an omm parameter suffixed with cross. The analyst can also decide whether the omitted variable is correlated with the independent variable of interest x or the dependent variable y by choosing the omm parameter with x or y in its name (e.g., omm.x.case versus omm.y.case).
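Why the dimension in which an omitted variable varies matters can be seen in a minimal base-R sketch. It is not the panelsim implementation, and the variable z and all coefficients are hypothetical. Here z is constant within each case and correlated with both x and y, so a pooled regression is biased, while case intercepts absorb z entirely.

```r
# Hypothetical base-R sketch (not the panelsim implementation): z is omitted
# from the model, constant within each case, and correlated with x and y.
set.seed(11)
n_case <- 50
n_time <- 20
case_id <- rep(seq_len(n_case), each = n_time)
z <- rnorm(n_case)                                    # omitted case-level variable
x <- z[case_id] + rnorm(n_case * n_time)              # x correlated with z
y <- 1 * x + 2 * z[case_id] + rnorm(n_case * n_time)  # true slope on x is 1

pooled  <- coef(lm(y ~ x))[["x"]]                     # picks up part of z's effect
case_fe <- coef(lm(y ~ x + factor(case_id)))[["x"]]   # case intercepts absorb z
print(round(c(pooled = pooled, case_fe = case_fe), 2))
```

An omitted variable that instead changed over time within cases would not be removed by case intercepts, which is the distinction the case and cross suffixes let the analyst explore.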

Value

The function returns a named list in which object$data is a data.frame and object$pars holds the original parameters used to generate the data. The generated coefficients are returned in a list as object$fixed_params. For repeated sampling, the object can be passed to the 'prior_true_vals' argument to hold the population parameters for the intercepts and vectors of effects fixed.

See Also

tw_model for running linear models on the data, and tw_sim for running Monte Carlo simulations on panel data.

Examples


# case (over-time) effect with no effect heterogeneity

case1 <- tw_data(case.eff.mean=-1,case.eff.sd=0)

# case (over-time) effect with substantial effect heterogeneity across countries

case2 <- tw_data(case.eff.mean=-1,case.eff.sd=1)

# cross-section effect with no effect heterogeneity

cross1 <- tw_data(cross.eff.mean=-1,cross.eff.sd=0)

# cross-section effect with substantial effect heterogeneity across countries

cross2 <- tw_data(cross.eff.mean=-1,cross.eff.sd=1)

# panel data with a cross-sectional effect of 3 and a case (over-time) effect of -1

both_case_cross <- tw_data(cross.eff.mean=3,
                             case.eff.mean=-1)


saudiwin/panelsim documentation built on July 5, 2025, 2:19 a.m.