README.md

GJRM: Generalised Joint Regression Modelling

Authors: Giampiero Marra, Rosalba Radice License: GPL (>= 2)

CRAN_Status_Badge cran checks Total Downloads Dependencies Top language Repo size Last commit License

The GJRM package offers numerous routines for fitting various joint (and univariate) regression models, with several types of covariate effects, in the presence of equations' errors association, endogeneity, non-random sample selection or partial observability.

Installation

You can install the GJRM package from CRAN:

install.packages("GJRM")

Joint models with binary margins: the gjrm function

Let us consider a simple example that shows how to use the gjrm function. First, we generate the data from a joint model with binary margins.

library(GJRM)

set.seed(0)
n     <- 400
Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1
u     <- rMVN(n, rep(0,2), Sigma)
x1    <- round(runif(n)); x2 <- runif(n); x3 <- runif(n)
f1    <- function(x) cos(pi*2*x) + sin(pi*x)
f2    <- function(x) x+exp(-30*(x-0.5)^2)
y1    <- ifelse(-1.55 + 2*x1 + f1(x2) + u[,1] > 0, 1, 0)
y2    <- ifelse(-0.25 - 1.25*x1 + f2(x2) + u[,2] > 0, 1, 0)
dataSim <- data.frame(y1, y2, x1, x2, x3)

Classic bivariate probit model

We can use the gjrm function to estimate a classic bivariate probit model as follows.

out <- gjrm(list(y1 ~ x1 + x2 + x3, 
                 y2 ~ x1 + x2 + x3),
            data = dataSim,
            margins = c("probit", "probit"),
            Model = "B")

The convergence status can be examined through the conv.check function.

conv.check(out)
#> 
#> Largest absolute gradient value: 5.030004e-14
#> Observed information matrix is positive definite
#> Eigenvalue range: [9.528687,419.9106]
#> 
#> Trust region iterations: 3

An overview of the parameter estimates is available through the summary method.

summary(out)
#> 
#> COPULA:   Gaussian
#> MARGIN 1: Bernoulli
#> MARGIN 2: Bernoulli
#> 
#> EQUATION 1
#> Link function for mu.1: probit 
#> Formula: y1 ~ x1 + x2 + x3
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -0.94936    0.20532  -4.624 3.77e-06 ***
#> x1           1.97383    0.15150  13.028  < 2e-16 ***
#> x2           0.20574    0.25474   0.808    0.419    
#> x3          -0.06393    0.25939  -0.246    0.805    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> EQUATION 2
#> Link function for mu.2: probit 
#> Formula: y2 ~ x1 + x2 + x3
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.04474    0.17736   0.252  0.80084    
#> x1          -1.07918    0.13393  -8.058 7.76e-16 ***
#> x2           0.66869    0.23016   2.905  0.00367 ** 
#> x3           0.08599    0.23228   0.370  0.71124    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> n = 400  theta = 0.282(0.0492,0.45)  tau = 0.182(0.0313,0.297)
#> total edf = 9

Information criteria are readily calculated using the AIC and BIC functions.

AIC(out)
#> [1] 840.3036
BIC(out)
#> [1] 876.2268

Bivariate probit model with splines

A bivariate probit model with splines can be estimated in the following way.

out <- gjrm(list(y1 ~ x1 + s(x2) + s(x3),
                 y2 ~ x1 + s(x2) + s(x3)),
            data = dataSim,
            margins = c("probit", "probit"),
            Model = "B")
conv.check(out)
#> 
#> Largest absolute gradient value: 1.308371e-09
#> Observed information matrix is positive definite
#> Eigenvalue range: [0.2366899,5.216987e+13]
#> 
#> Trust region iterations before smoothing parameter estimation: 4
#> Loops for smoothing parameter estimation: 3
#> Trust region iterations within smoothing loops: 4
summary(out)
#> 
#> COPULA:   Gaussian
#> MARGIN 1: Bernoulli
#> MARGIN 2: Bernoulli
#> 
#> EQUATION 1
#> Link function for mu.1: probit 
#> Formula: y1 ~ x1 + s(x2) + s(x3)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.9852     0.1094  -9.005   <2e-16 ***
#> x1            2.1885     0.1691  12.940   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>         edf Ref.df Chi.sq  p-value    
#> s(x2) 2.805  3.499 31.729 1.86e-06 ***
#> s(x3) 1.000  1.000  0.134    0.714    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> EQUATION 2
#> Link function for mu.2: probit 
#> Formula: y2 ~ x1 + s(x2) + s(x3)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.43374    0.09316   4.656 3.22e-06 ***
#> x1          -1.14477    0.13880  -8.248  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>         edf Ref.df Chi.sq  p-value    
#> s(x2) 4.729  5.784 33.806 6.82e-06 ***
#> s(x3) 1.000  1.000  0.189    0.664    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> n = 400  theta = 0.484(0.22,0.639)  tau = 0.322(0.141,0.441)
#> total edf = 14.5
AIC(out)
#> [1] 774.3515

We can have a look at the estimated smooth function plots. The red lines indicate the true curves.

x2    <- sort(x2)
f1.x2 <- f1(x2)[order(x2)] - mean(f1(x2))
f2.x2 <- f2(x2)[order(x2)] - mean(f2(x2))
f3.x3 <- rep(0, length(x3))

par(mfrow=c(2,2),mar=c(4.5,4.5,2,2))

plot(out, eq = 1, select = 1, seWithMean = TRUE, scale = 0)
lines(x2, f1.x2, col = "red")

plot(out, eq = 1, select = 2, seWithMean = TRUE, scale = 0)
lines(x3, f3.x3, col = "red")

plot(out, eq = 2, select = 1, seWithMean = TRUE, scale = 0)
lines(x2, f2.x2, col = "red")

plot(out, eq = 2, select = 2, seWithMean = TRUE, scale = 0)
lines(x3, f3.x3, col = "red")

Bivariate probit model with splines and varying dependence parameter

A bivariate probit model with splines and a varying dependence parameter can be fitted as follows.

eq.mu.1  <- y1 ~ x1 + s(x2)
eq.mu.2  <- y2 ~ x1 + s(x2)
eq.theta <- ~ x1 + s(x2)
fl       <- list(eq.mu.1, eq.mu.2, eq.theta)

outD     <- gjrm(fl, data = dataSim,
                 margins = c("probit", "probit"),
                 Model = "B")
conv.check(outD)
#> 
#> Largest absolute gradient value: 3.840129e-10
#> Observed information matrix is positive definite
#> Eigenvalue range: [0.1033404,1078.919]
#> 
#> Trust region iterations before smoothing parameter estimation: 4
#> Loops for smoothing parameter estimation: 11
#> Trust region iterations within smoothing loops: 25
summary(outD)
#> 
#> COPULA:   Gaussian
#> MARGIN 1: Bernoulli
#> MARGIN 2: Bernoulli
#> 
#> EQUATION 1
#> Link function for mu.1: probit 
#> Formula: y1 ~ x1 + s(x2)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.9785     0.1099  -8.901   <2e-16 ***
#> x1            2.1742     0.1689  12.870   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>        edf Ref.df Chi.sq  p-value    
#> s(x2) 2.77  3.457  30.99 2.44e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> EQUATION 2
#> Link function for mu.2: probit 
#> Formula: y2 ~ x1 + s(x2)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.43697    0.09327   4.685  2.8e-06 ***
#> x1          -1.14120    0.13839  -8.246  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>         edf Ref.df Chi.sq  p-value    
#> s(x2) 4.621   5.66  33.27 7.97e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> EQUATION 3
#> Link function for theta: atanh 
#> Formula: ~x1 + s(x2)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)  
#> (Intercept)  0.58008    0.22994   2.523   0.0116 *
#> x1           0.03757    0.42260   0.089   0.9292  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>         edf Ref.df Chi.sq p-value
#> s(x2) 3.684  4.579  5.099   0.385
#> 
#> 
#> n = 400  theta = 0.51(-0.1,0.85)  tau = 0.35(-0.0681,0.663)
#> total edf = 17.1
# outD$theta

And similarly, the estimated smooth functions are plotted as follows.

par(mfrow=c(2,2),mar=c(4.5,4.5,2,2))

plot(outD, eq = 1, seWithMean = TRUE)
plot(outD, eq = 2, seWithMean = TRUE)
plot(outD, eq = 3, seWithMean = TRUE)

For more examples of joint regression models, see the gjrm vignettes.



egeminiani/GJRM documentation built on Sept. 1, 2020, 6:41 p.m.