chen_lee | R Documentation |
Given the number of observations and the number of endogenous variables, create an outcome variable defined by a location-scale model where the coefficients on the endogenous variables are supposed to be 1.
chen_lee(n = 500, p_D = 3, beta_D_errors = NULL)
n | Number of observations; defaults to 500 (numeric) |
p_D | Number of endogenous variables; defaults to 3 (numeric) |
beta_D_errors | Coefficients on the error terms, one for each endogenous variable (vector of length p_D); if NULL, defaults to the values in Chen and Lee (2018) |
The error term in the location-scale model that underpins this simulation design is defined in terms of the endogenous variables (which is why we call these variables "endogenous"). To properly estimate the coefficients on the endogenous variables, we require instruments that are uncorrelated with the errors, related to the endogenous variables, and related to the outcome variable only through their association with these endogenous variables.
This function creates errors, endogenous variables, instruments, and an outcome variable such that the above conditions are satisfied. The errors are drawn independently of the instruments from a multivariate normal distribution. The instruments are drawn from a standard normal distribution. The endogenous variables are multiples of the cumulative distribution function of the shocked instruments. The error in the true model for the outcome variable is defined in terms of the endogenous variables.
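The construction described above can be sketched in base R. This is only an illustration of the steps as stated in this help page, not the source of chen_lee: the covariance values in V, the intercepts, the multiplier of 1 on the normal CDF, and the values in beta_D_errors are all assumptions chosen for the example and may differ from the defaults taken from Chen and Lee (2018).

```r
set.seed(1)
n <- 500   # number of observations
p_D <- 3   # number of endogenous variables

# Assumed covariance matrix of (model error, shocks): unit variances, with
# illustrative covariances between the model error and each shock.
V <- diag(p_D + 1)
V[1, -1] <- c(0.4, 0.6, -0.2)
V[-1, 1] <- V[1, -1]

# Draw the errors/shocks from N(0, V) via the Cholesky factor (t(R) %*% R = V).
errors <- matrix(rnorm(n * (p_D + 1)), nrow = n) %*% chol(V)

# Instruments: standard normal draws, independent of the errors.
Z <- matrix(rnorm(n * p_D), nrow = n)

# Endogenous variables: normal CDF of the shocked instruments
# (a multiplier of 1 is assumed here).
D <- pnorm(Z + errors[, -1, drop = FALSE])

# Location-scale outcome: coefficients on D equal 1, and the model error is
# scaled by a term that depends on D (intercepts 1 and 0.5 are assumptions).
beta_D_errors <- c(1, 0.25, 0.15)   # illustrative values only
X <- matrix(1, nrow = n, ncol = 1)
scale_term <- 0.5 + D %*% beta_D_errors
Y <- 1 + D %*% rep(1, p_D) + scale_term * errors[, 1]
```

Because the first column of errors is correlated with the shocks that enter D, the composite error in Y is correlated with D, while Z remains independent of it.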
The original Chen and Lee simulation design used 3 endogenous variables. This design allows for an arbitrary number of endogenous variables. To allow fewer endogenous variables, say 2, we simply omit the third endogenous variable from the original Chen and Lee simulation before constructing our outcome variable.
The strength of identification is determined in two ways.
First: the covariance between the errors on the location-scale model and the shocks to the instruments when defining D. This is given by the off-diagonal entries of V.
Second: the coefficients on the interaction between each endogenous variable and the errors on the location-scale model. The closer these coefficients are to 0, the less endogeneity we have and the stronger our identification is. See the beta_D_errors argument.
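The first lever can be illustrated with a small base R sketch that uses a single endogenous variable. The helper endogeneity_corr is hypothetical and not part of the package; it only mimics the construction described on this page, with an assumed 2 by 2 covariance matrix whose off-diagonal entry is rho.

```r
# Hypothetical helper: sample correlation between one endogenous variable
# and the model error, for a given covariance rho between error and shock.
endogeneity_corr <- function(rho, n = 5000) {
  V <- matrix(c(1, rho, rho, 1), nrow = 2)                  # assumed V
  errors <- matrix(rnorm(n * 2), nrow = n) %*% chol(V)      # draws from N(0, V)
  Z <- rnorm(n)                                             # instrument
  D <- pnorm(Z + errors[, 2])                               # shocked instrument
  cor(D, errors[, 1])
}

set.seed(2)
corr_zero_cov <- endogeneity_corr(rho = 0)     # no covariance with the shock
corr_high_cov <- endogeneity_corr(rho = 0.8)   # strong covariance with the shock
```

In this sketch, a larger off-diagonal entry of V produces a larger correlation between D and the model error, that is, more endogeneity.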
A named list:
Y: outcome variable (n by 1 matrix)
D: endogenous variables (n by p_D matrix)
Z: instruments (n by p_D matrix)
X: matrix of 1's (n by 1 matrix)
errors: matrix of errors and shocks (n by (p_D + 1) matrix); the first column is the vector of errors on the location-scale model; all other columns are shocks to the instruments when defining D.
V: variance-covariance matrix of the errors/shocks
beta_D_errors: coefficients on the interaction between each endogenous variable and the errors on the location-scale model
true_chen_lee