copulaEndo: Fitting Linear Models with Endogenous Regressors using Gaussian...


Description

Fits linear models with continuous or discrete endogenous regressors using Gaussian copulas, following the method presented in Park and Gupta (2012). This statistical technique addresses the endogeneity problem without requiring external instrumental variables. The key assumption of the model is that the endogenous variables must NOT be normally distributed.
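Because the method relies on the endogenous regressor being non-normal, it can be worth checking that assumption before fitting. The following is a minimal sketch using base R only; the simulated vector `P_sim` is a hypothetical stand-in for an endogenous regressor and is not part of the package.

```r
# Illustrative data only: a skewed (exponential) variable stands in for P.
set.seed(1)
P_sim <- rexp(500, rate = 1)

# Shapiro-Wilk test from base R: a small p-value is evidence against normality,
# which is what the copula approach needs for the endogenous regressor.
sw <- shapiro.test(P_sim)
sw$p.value
```

A p-value close to zero suggests the regressor is far from normal. A large p-value would be a warning sign that the copula correction may be poorly identified for that variable.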

Usage

copulaEndo(y, X, P, param = NULL, type = NULL, method = NULL,
  intercept = NULL)

Arguments

y

the vector or matrix containing the dependent variable.

X

the data frame or matrix containing the regressors of the model, both exogenous and endogenous. The last column(s) should contain the endogenous variable(s).

P

the matrix or vector containing the endogenous variables.

param

the vector of initial values for the parameters of the model to be supplied to the optimization algorithm. The parameters to be estimated are theta = {b,a,rho,sigma}, where b are the parameters of the exogenous variables, a is the parameter of the endogenous variable, rho is the parameter for the correlation between the error and the endogenous regressor, while sigma is the standard deviation of the structural error.

type

the type of the endogenous regressor/s. It can take two values, "continuous" or "discrete".

method

the method used for estimating the model. It can take two values, 1 or 2, where 1 is the maximum likelihood approach described in Park and Gupta (2012), while 2 is the equivalent OLS approach described in the same paper. Method 1 can be applied only when there is a single, continuous endogenous variable. When there is more than one continuous endogenous regressor, or the endogenous regressors are discrete, method 2 is applied by default.

intercept

optional parameter. By default the model is estimated with an intercept. If no intercept is desired, or the regressor matrix X already contains a column of ones, intercept should be given the value "no".
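The OLS variant selected with method = 2 is usually described as an augmented regression: the endogenous regressor is mapped through its empirical distribution function and the standard normal quantile function, and the resulting term is added as an extra regressor. The sketch below illustrates that idea on simulated data. It is not the package's code: the data-generating process, all variable names, and the rank / (n + 1) rescaling of the empirical CDF are assumptions made for illustration.

```r
set.seed(42)
n <- 1000

# Simulated data (illustrative only): x is exogenous, p is endogenous and non-normal.
x   <- rnorm(n)
z   <- rnorm(n)                               # latent normal driving p
eps <- 0.5 * z + sqrt(1 - 0.5^2) * rnorm(n)   # structural error, correlated with z
p   <- qexp(pnorm(z))                         # exponential margin, Gaussian dependence
y   <- 1 + 2 * x - 1 * p + eps                # true coefficient on p is -1

# Empirical CDF of p, rescaled so that it stays strictly inside (0, 1)
# (assumed rescaling; it keeps qnorm() finite for the largest observation).
H_hat  <- rank(p) / (n + 1)
p_star <- qnorm(H_hat)

# Naive OLS ignores the endogeneity; the augmented OLS adds p_star as a regressor.
ols_naive <- lm(y ~ x + p)
ols_aug   <- lm(y ~ x + p + p_star)

coef(ols_naive)["p"]
coef(ols_aug)["p"]
```

In this simulation the augmented regression should recover a coefficient on p much closer to the true value of -1 than the naive regression does.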

Details

Park and Gupta (2012) proposed a method that allows for the joint estimation of the endogenous regressor and the error term in the structural equation using copulas. As in the latent instrumental variables (LIV) approach, the model parameters are estimated using maximum likelihood. The underlying idea is that, using information contained in the observed data, one selects marginal distributions for the endogenous regressor and the structural error. The copula model then enables the construction of a flexible multivariate joint distribution that allows a wide range of correlations between the two marginals. For the error, epsilon_t, the marginal distribution is assumed to be normal, while the marginal distribution of the endogenous regressor, P_t, is obtained using the Epanechnikov kernel density estimator with the bandwidth equal to

b = 0.9 * T^(-1/5) * min(s,IQR/1.34)

where T is the number of observations, IQR is the interquartile range and s is the sample standard deviation of the data. Following Sklar's theorem (Sklar, 1959), if H(p) and G(epsilon) are the marginal distribution functions of the endogenous regressor and of the structural error, respectively, there exists a copula function C such that, for all p and epsilon, the joint distribution function is:

F(p, epsilon) = C(H(p), G(epsilon)) = C(U_p, U_epsilon)

where U_p = H(p), U_epsilon = G(epsilon) are uniform (0,1) random variables. Then the joint density function is given by:

f(p, epsilon) = c(U_p, U_epsilon) * h(p) * g(epsilon)

where

c(U_p, U_epsilon) = d^2 C(U_p, U_epsilon) / (d(U_p) d(U_epsilon))

Using the Gaussian copula, the joint density function of P_t and epsilon_t is:

f(p_t, epsilon_t) = 1/sqrt(1 - rho^2) * exp[-rho^2 * (Phi^(-1)(U_{p,t})^2 + Phi^(-1)(U_{epsilon,t})^2) / (2 * (1 - rho^2)) + rho * Phi^(-1)(U_{p,t}) * Phi^(-1)(U_{epsilon,t}) / (1 - rho^2)] * h(p_t) * g(epsilon_t)

Having the joint density function, the model's parameters Theta = {alpha, beta, sigma_epsilon, rho} are obtained by maximising the log-likelihood function. The maximum likelihood estimation is performed with the "BFGS" algorithm. When there are two endogenous regressors, no initial parameter values are needed, since the method applied by default is the augmented OLS, which can also be requested explicitly with method = 2.
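The two building blocks of the likelihood described in this section, the kernel bandwidth and the Gaussian copula density, can be written down directly in base R. The following is a sketch under stated assumptions, not the package's internal code: the function names `copula_bandwidth` and `gauss_copula_density` and the simulated vector `p_demo` are hypothetical and chosen only for illustration.

```r
# Bandwidth from the formula in this section:
# b = 0.9 * T^(-1/5) * min(s, IQR / 1.34)
copula_bandwidth <- function(p) {
  T_obs <- length(p)
  0.9 * T_obs^(-1 / 5) * min(sd(p), IQR(p) / 1.34)
}

# Gaussian copula density c(U_p, U_epsilon) with correlation parameter rho.
gauss_copula_density <- function(u_p, u_eps, rho) {
  q_p   <- qnorm(u_p)     # Phi^(-1)(U_p)
  q_eps <- qnorm(u_eps)   # Phi^(-1)(U_epsilon)
  exponent <- -rho^2 * (q_p^2 + q_eps^2) / (2 * (1 - rho^2)) +
    rho * q_p * q_eps / (1 - rho^2)
  exp(exponent) / sqrt(1 - rho^2)
}

# Illustrative use on simulated, non-normal data.
set.seed(7)
p_demo <- rexp(300)
copula_bandwidth(p_demo)
gauss_copula_density(0.3, 0.7, rho = 0.5)
```

With rho = 0 the copula density equals 1 everywhere, which corresponds to independence between the regressor and the error. The bandwidth formula coincides with the rule of thumb implemented in `stats::bw.nrd0()`.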

Value

Depending on the method and the type of the variables, the function returns the optimal values of the parameters and, in the case of the second method, their standard errors. With one endogenous variable, if the maximum likelihood approach is chosen, the standard errors can be computed by bootstrapping, using the boots function from the same package.
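To illustrate what bootstrapped standard errors involve in general, the sketch below resamples rows with replacement, re-estimates the model on each resample, and takes the standard deviation of the estimates. It does not reproduce the package's boots function: the estimator `fit_coefs` is a hypothetical stand-in based on `lm`, and the data are simulated.

```r
set.seed(123)
n   <- 200
x   <- rnorm(n)
y   <- 1 + 2 * x + rnorm(n)
dat <- data.frame(y = y, x = x)

# Hypothetical stand-in estimator: returns a named coefficient vector.
fit_coefs <- function(d) coef(lm(y ~ x, data = d))

# Nonparametric bootstrap: resample rows with replacement and refit each time.
n_boot   <- 200
boot_est <- t(replicate(n_boot, {
  idx <- sample(nrow(dat), replace = TRUE)
  fit_coefs(dat[idx, ])
}))

# Bootstrap standard error = standard deviation of the estimates across resamples.
boot_se <- apply(boot_est, 2, sd)
boot_se
```

A larger number of resamples gives more stable standard errors at the cost of computation time, which matters when each refit involves a numerical optimisation.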

Author(s)

The model was implemented by Raluca Gui, based on the paper of Park and Gupta (2012).

References

Park, S. and Gupta, S. (2012), 'Handling Endogenous Regressors by Joint Estimation Using Copulas', Marketing Science, 31(4), 567-586.

Examples

#load dataset dataCopC1, where P is endogenous, continuous and not normally distributed
data(dataCopC1)
y <- dataCopC1[,1]
X <- dataCopC1[,2:5]
P <- dataCopC1[,5]
c1 <- copulaEndo(y, X, P, type = "continuous", method = 1, intercept="no")
c1
# to obtain the standard errors use the boots() function
# se.c1 <- boots(10, y, X, P, param = c(1,1,-2,-0.5,0.2,1), intercept= "no")

# an alternative model can be obtained using "method = 2".
c12 <- copulaEndo(y, X, P, type = "continuous", method = 2)
c12

# load dataset with 2 continuous, non-normally distributed endogenous regressors.
# with 2 endogenous regressors the default method is the augmented OLS.
#data(dataCopC2)
#y <- dataCopC2[,1]
#X <- dataCopC2[,2:6]
#P <- dataCopC2[,5:6]
#c2 <- copulaEndo(y, X, P, type = "continuous")
#summary(c2)

# load dataset with 1 discrete endogenous variable.
# having more than 1 discrete endogenous regressor is also possible
data(dataCopDis)
y <- dataCopDis[,1]
X <- dataCopDis[,2:5]
P <- dataCopDis[,5]
c3 <- copulaEndo(y, X, P, type = "discrete", intercept=FALSE)
c3

Rgui/REndo_1.0 documentation built on May 9, 2019, 10:03 a.m.