chen_lee: Simulate data according to Chen and Lee (2017)

View source: R/chen_lee.R

Description

Given the number of observations and the number of endogenous variables, create an outcome variable defined by a location scale model in which the coefficients on the endogenous variables are intended to equal 1.

Usage

chen_lee(n = 500, p_D = 3, beta_D_errors = NULL)

Arguments

n

Number of observations; defaults to 500 (numeric)

p_D

Number of endogenous variables; defaults to 3 (numeric)

beta_D_errors

Coefficients on the error terms, one for each endogenous variable (vector of length p_D); if NULL, defaults to the values in Chen and Lee (2018)

Details

The error term in the location scale model that underpins this simulation design is defined in terms of the endogenous variables (which is why we call these variables "endogenous"). To properly estimate the coefficients on the endogenous variables, we require instruments that are uncorrelated with the errors, related to the endogenous variables, and related to the outcome variable only through their association with these endogenous variables.
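One way to write down a location scale model consistent with this description is sketched below. The symbols are assumptions made for illustration and do not come from the package: \beta_X is the coefficient on the constant column X, \gamma_0 is a baseline scale term, and \gamma_j plays the role of the j-th entry of beta_D_errors.

```latex
Y_i = \beta_X + \sum_{j=1}^{p_D} \beta_{D,j}\, D_{ij}
      + \Bigl(\gamma_0 + \sum_{j=1}^{p_D} \gamma_j\, D_{ij}\Bigr)\,\varepsilon_i,
\qquad \beta_{D,j} = 1 \ \text{for all } j .
```

Under this assumed form, and provided the scale term in parentheses is positive, the structural \tau-th quantile of Y_i given D_i has coefficient 1 + \gamma_j Q_\varepsilon(\tau) on D_{ij}, where Q_\varepsilon is the quantile function of \varepsilon_i. When \varepsilon_i has median 0, the coefficients at \tau = 0.5 are exactly 1.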

This function creates errors, endogenous variables, instruments, and an outcome variable such that the above conditions are satisfied. The errors are drawn independently of the instruments from a multivariate normal distribution. The instruments are drawn from a standard normal distribution. The endogenous variables are multiples of the cumulative distribution function of the shocked instruments. The error in the true model for the outcome variable is defined in terms of the endogenous variables.
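The steps in the paragraph above can be sketched in base R as follows. This is a minimal illustration, not the package source. The function name simulate_location_scale, the common covariance rho, the baseline scale scale_intercept, the default coefficient values, the intercept of 1, the use of the standard normal CDF, and the multiplier of 1 on that CDF are all assumptions.

```r
# Hedged sketch of the design described above; NOT the ivqr implementation.
# All default values below are illustrative assumptions.
simulate_location_scale <- function(n = 500, p_D = 3, rho = 0.3,
                                    beta_D_errors = rep(0.25, p_D),
                                    scale_intercept = 0.5) {
  stopifnot(length(beta_D_errors) == p_D)

  # V: unit variances; the first row/column holds the covariance between the
  # model error and each shock (a common value rho is assumed here).
  # chol() requires V to be positive definite, i.e. |rho| < 1 / sqrt(p_D).
  V <- diag(p_D + 1)
  V[1, -1] <- rho
  V[-1, 1] <- rho

  # Draw (error, shocks) ~ N(0, V) using the Cholesky factor of V.
  errors <- matrix(rnorm(n * (p_D + 1)), nrow = n) %*% chol(V)

  # Instruments: standard normal, drawn independently of the errors.
  Z <- matrix(rnorm(n * p_D), nrow = n)

  # Endogenous variables: CDF of the shocked instruments
  # (standard normal CDF and a multiplier of 1 are assumed).
  D <- pnorm(Z + errors[, -1, drop = FALSE])

  # Constant column.
  X <- matrix(1, nrow = n, ncol = 1)

  # Location scale outcome: coefficients on D equal 1, and the error is
  # scaled by a linear function of D.
  error_scale <- scale_intercept + D %*% beta_D_errors
  Y <- X + D %*% rep(1, p_D) + error_scale * errors[, 1]

  list(Y = Y, D = D, Z = Z, X = X, errors = errors, V = V,
       beta_D_errors = beta_D_errors)
}
```

Calling simulate_location_scale(n = 200, p_D = 2, beta_D_errors = c(0.1, 0.2)) returns a list with the same element names as those documented under Value, which makes it easy to compare against the output of chen_lee().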

The original Chen and Lee simulation design used 3 endogenous variables. This design allows for an arbitrary number of endogenous variables. To use fewer endogenous variables, say 2, we simply omit the third endogenous variable from the original Chen and Lee simulation before constructing the outcome variable.

The strength of identification is determined in two ways. The first is the covariance between the errors in the location scale model and the shocks to the instruments used to define D, which is given by the off-diagonal entries of V. The second is the set of coefficients on the interactions between each endogenous variable and the errors in the location scale model: the closer these coefficients are to 0, the less endogeneity there is and the stronger the identification. See the beta_D_errors argument.
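To illustrate the first channel, the sketch below shows how the off-diagonal entry of a 2 by 2 matrix V controls the sample correlation between the model error and a single endogenous variable. The helper endogeneity_by_rho, the unit variances, and the use of the standard normal CDF are assumptions for illustration only; they are not taken from the package.

```r
# Hedged sketch: sample correlation between the model error and one
# endogenous variable, as a function of the error/shock covariance rho.
# Requires -1 < rho < 1 so that V is positive definite.
endogeneity_by_rho <- function(rho, n = 20000) {
  V <- matrix(c(1, rho, rho, 1), nrow = 2)
  # Column 1: model error; column 2: shock to the instrument.
  draws <- matrix(rnorm(2 * n), nrow = n) %*% chol(V)
  Z <- rnorm(n)                  # instrument, independent of the draws
  D <- pnorm(Z + draws[, 2])     # endogenous variable (normal CDF assumed)
  cor(draws[, 1], D)
}

set.seed(42)
sapply(c(0, 0.3, 0.6), endogeneity_by_rho)
```

With rho = 0 the error and D are independent by construction, so the sample correlation should be close to 0; larger values of rho produce a larger correlation.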

Value

A named list:

  1. Y: outcome variable (n by 1 matrix)

  2. D: endogenous variables (n by p_D matrix)

  3. Z: instruments (n by p_D matrix)

  4. X: matrix of 1's (n by 1 matrix)

  5. errors: matrix of errors and shocks (n by (p_D + 1) matrix); first column is the vector of errors on the location scale model; all other columns are shocks to the instruments when defining D.

  6. V: variance-covariance matrix of the errors/shocks

  7. beta_D_errors: coefficients on the interaction between each endogenous variable and the errors on the location scale model
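A small helper such as the one below can confirm that a returned list has the shapes documented above. The name check_chen_lee_output is hypothetical and is not part of the package. The check that V is (p_D + 1) by (p_D + 1) is an inference from the dimensions of errors rather than something stated in this page.

```r
# Hedged sketch: verify the documented structure of the returned list.
# `sim` is expected to be the output of chen_lee(); `n` and `p_D` are the
# values that were passed to it.
check_chen_lee_output <- function(sim, n, p_D) {
  expected <- c("Y", "D", "Z", "X", "errors", "V", "beta_D_errors")
  stopifnot(
    is.list(sim),
    all(expected %in% names(sim)),
    all(dim(sim$Y) == c(n, 1)),
    all(dim(sim$D) == c(n, p_D)),
    all(dim(sim$Z) == c(n, p_D)),
    all(dim(sim$X) == c(n, 1)),
    all(sim$X == 1),
    all(dim(sim$errors) == c(n, p_D + 1)),
    # Assumed: V is square with one row per column of `errors`.
    all(dim(sim$V) == c(p_D + 1, p_D + 1)),
    length(sim$beta_D_errors) == p_D
  )
  invisible(TRUE)
}
```

Assuming the ivqr package is installed and loaded, a typical call would be check_chen_lee_output(chen_lee(n = 500, p_D = 3), n = 500, p_D = 3), which stops with an error if any documented shape does not match.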

See Also

true_chen_lee


omkarakatta/ivqr documentation built on Aug. 20, 2022, 11:04 p.m.