README.md
In egeminiani/GJRM: Generalised Joint Regression Modelling

GJRM: Generalised Joint Regression Modelling

Authors: Giampiero Marra, Rosalba Radice License: GPL (>= 2)

The GJRM package offers numerous routines for fitting various joint (and univariate) regression models, with several types of covariate effects, in the presence of equations' errors association, endogeneity, non-random sample selection or partial observability.

You can install the GJRM package from CRAN:

install.packages("GJRM")

Joint models with binary margins: the `gjrm` function

Let us consider a simple example that shows how to use the gjrm function. First, we generate the data from a joint model with binary margins.

library(GJRM)

set.seed(0)
n     <- 400
Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1
u     <- rMVN(n, rep(0,2), Sigma)
x1    <- round(runif(n)); x2 <- runif(n); x3 <- runif(n)
f1    <- function(x) cos(pi*2*x) + sin(pi*x)
f2    <- function(x) x+exp(-30*(x-0.5)^2)
y1    <- ifelse(-1.55 + 2*x1 + f1(x2) + u[,1] > 0, 1, 0)
y2    <- ifelse(-0.25 - 1.25*x1 + f2(x2) + u[,2] > 0, 1, 0)
dataSim <- data.frame(y1, y2, x1, x2, x3)

Classic bivariate probit model

We can use the gjrm function to estimate a classic bivariate probit model as follows.

out <- gjrm(list(y1 ~ x1 + x2 + x3, 
                 y2 ~ x1 + x2 + x3),
            data = dataSim,
            margins = c("probit", "probit"),
            Model = "B")

The convergence status can be examined through the conv.check function.

conv.check(out)
#> 
#> Largest absolute gradient value: 5.030004e-14
#> Observed information matrix is positive definite
#> Eigenvalue range: [9.528687,419.9106]
#> 
#> Trust region iterations: 3

An overview of the parameter estimates is available through the summary method.

summary(out)
#> 
#> COPULA:   Gaussian
#> MARGIN 1: Bernoulli
#> MARGIN 2: Bernoulli
#> 
#> EQUATION 1
#> Link function for mu.1: probit 
#> Formula: y1 ~ x1 + x2 + x3
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -0.94936    0.20532  -4.624 3.77e-06 ***
#> x1           1.97383    0.15150  13.028  < 2e-16 ***
#> x2           0.20574    0.25474   0.808    0.419    
#> x3          -0.06393    0.25939  -0.246    0.805    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> EQUATION 2
#> Link function for mu.2: probit 
#> Formula: y2 ~ x1 + x2 + x3
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.04474    0.17736   0.252  0.80084    
#> x1          -1.07918    0.13393  -8.058 7.76e-16 ***
#> x2           0.66869    0.23016   2.905  0.00367 ** 
#> x3           0.08599    0.23228   0.370  0.71124    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> n = 400  theta = 0.282(0.0492,0.45)  tau = 0.182(0.0313,0.297)
#> total edf = 9

Information criteria are readily calculated using the AIC and BIC functions.

AIC(out)
#> [1] 840.3036
BIC(out)
#> [1] 876.2268

Bivariate probit model with splines

A bivariate probit model with splines can be estimated in the following way.

out <- gjrm(list(y1 ~ x1 + s(x2) + s(x3),
                 y2 ~ x1 + s(x2) + s(x3)),
            data = dataSim,
            margins = c("probit", "probit"),
            Model = "B")
conv.check(out)
#> 
#> Largest absolute gradient value: 1.308371e-09
#> Observed information matrix is positive definite
#> Eigenvalue range: [0.2366899,5.216987e+13]
#> 
#> Trust region iterations before smoothing parameter estimation: 4
#> Loops for smoothing parameter estimation: 3
#> Trust region iterations within smoothing loops: 4
summary(out)
#> 
#> COPULA:   Gaussian
#> MARGIN 1: Bernoulli
#> MARGIN 2: Bernoulli
#> 
#> EQUATION 1
#> Link function for mu.1: probit 
#> Formula: y1 ~ x1 + s(x2) + s(x3)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.9852     0.1094  -9.005   <2e-16 ***
#> x1            2.1885     0.1691  12.940   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>         edf Ref.df Chi.sq  p-value    
#> s(x2) 2.805  3.499 31.729 1.86e-06 ***
#> s(x3) 1.000  1.000  0.134    0.714    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> EQUATION 2
#> Link function for mu.2: probit 
#> Formula: y2 ~ x1 + s(x2) + s(x3)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.43374    0.09316   4.656 3.22e-06 ***
#> x1          -1.14477    0.13880  -8.248  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>         edf Ref.df Chi.sq  p-value    
#> s(x2) 4.729  5.784 33.806 6.82e-06 ***
#> s(x3) 1.000  1.000  0.189    0.664    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> n = 400  theta = 0.484(0.22,0.639)  tau = 0.322(0.141,0.441)
#> total edf = 14.5
AIC(out)
#> [1] 774.3515

We can have a look at the estimated smooth function plots. The red lines indicate the true curves.

x2    <- sort(x2)
f1.x2 <- f1(x2)[order(x2)] - mean(f1(x2))
f2.x2 <- f2(x2)[order(x2)] - mean(f2(x2))
f3.x3 <- rep(0, length(x3))

par(mfrow=c(2,2),mar=c(4.5,4.5,2,2))

plot(out, eq = 1, select = 1, seWithMean = TRUE, scale = 0)
lines(x2, f1.x2, col = "red")

plot(out, eq = 1, select = 2, seWithMean = TRUE, scale = 0)
lines(x3, f3.x3, col = "red")

plot(out, eq = 2, select = 1, seWithMean = TRUE, scale = 0)
lines(x2, f2.x2, col = "red")

plot(out, eq = 2, select = 2, seWithMean = TRUE, scale = 0)
lines(x3, f3.x3, col = "red")

Bivariate probit model with splines and varying dependence parameter

A bivariate probit model with splines and a varying dependence parameter can be fitted as follows.

eq.mu.1  <- y1 ~ x1 + s(x2)
eq.mu.2  <- y2 ~ x1 + s(x2)
eq.theta <- ~ x1 + s(x2)
fl       <- list(eq.mu.1, eq.mu.2, eq.theta)

outD     <- gjrm(fl, data = dataSim,
                 margins = c("probit", "probit"),
                 Model = "B")
conv.check(outD)
#> 
#> Largest absolute gradient value: 3.840129e-10
#> Observed information matrix is positive definite
#> Eigenvalue range: [0.1033404,1078.919]
#> 
#> Trust region iterations before smoothing parameter estimation: 4
#> Loops for smoothing parameter estimation: 11
#> Trust region iterations within smoothing loops: 25
summary(outD)
#> 
#> COPULA:   Gaussian
#> MARGIN 1: Bernoulli
#> MARGIN 2: Bernoulli
#> 
#> EQUATION 1
#> Link function for mu.1: probit 
#> Formula: y1 ~ x1 + s(x2)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.9785     0.1099  -8.901   <2e-16 ***
#> x1            2.1742     0.1689  12.870   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>        edf Ref.df Chi.sq  p-value    
#> s(x2) 2.77  3.457  30.99 2.44e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> EQUATION 2
#> Link function for mu.2: probit 
#> Formula: y2 ~ x1 + s(x2)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.43697    0.09327   4.685  2.8e-06 ***
#> x1          -1.14120    0.13839  -8.246  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>         edf Ref.df Chi.sq  p-value    
#> s(x2) 4.621   5.66  33.27 7.97e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> EQUATION 3
#> Link function for theta: atanh 
#> Formula: ~x1 + s(x2)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)  
#> (Intercept)  0.58008    0.22994   2.523   0.0116 *
#> x1           0.03757    0.42260   0.089   0.9292  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth components' approximate significance:
#>         edf Ref.df Chi.sq p-value
#> s(x2) 3.684  4.579  5.099   0.385
#> 
#> 
#> n = 400  theta = 0.51(-0.1,0.85)  tau = 0.35(-0.0681,0.663)
#> total edf = 17.1
# outD$theta

And similarly, the estimated smooth functions are plotted as follows.

par(mfrow=c(2,2),mar=c(4.5,4.5,2,2))

plot(outD, eq = 1, seWithMean = TRUE)
plot(outD, eq = 2, seWithMean = TRUE)
plot(outD, eq = 3, seWithMean = TRUE)