Authors: Giampiero Marra, Rosalba Radice License: GPL (>= 2)
The GJRM
package offers numerous routines for fitting various joint (and univariate) regression models, with several types of covariate effects, in the presence of equations' errors association, endogeneity, non-random sample selection or partial observability.
You can install the GJRM
package from CRAN:
install.packages("GJRM")
gjrm
functionLet us consider a simple example that shows how to use the gjrm
function. First, we generate the data from a joint model with binary margins.
library(GJRM)
set.seed(0)
n <- 400
Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1
u <- rMVN(n, rep(0,2), Sigma)
x1 <- round(runif(n)); x2 <- runif(n); x3 <- runif(n)
f1 <- function(x) cos(pi*2*x) + sin(pi*x)
f2 <- function(x) x+exp(-30*(x-0.5)^2)
y1 <- ifelse(-1.55 + 2*x1 + f1(x2) + u[,1] > 0, 1, 0)
y2 <- ifelse(-0.25 - 1.25*x1 + f2(x2) + u[,2] > 0, 1, 0)
dataSim <- data.frame(y1, y2, x1, x2, x3)
We can use the gjrm
function to estimate a classic bivariate probit model as follows.
out <- gjrm(list(y1 ~ x1 + x2 + x3,
y2 ~ x1 + x2 + x3),
data = dataSim,
margins = c("probit", "probit"),
Model = "B")
The convergence status can be examined through the conv.check
function.
conv.check(out)
#>
#> Largest absolute gradient value: 5.030004e-14
#> Observed information matrix is positive definite
#> Eigenvalue range: [9.528687,419.9106]
#>
#> Trust region iterations: 3
An overview of the parameter estimates is available through the summary
method.
summary(out)
#>
#> COPULA: Gaussian
#> MARGIN 1: Bernoulli
#> MARGIN 2: Bernoulli
#>
#> EQUATION 1
#> Link function for mu.1: probit
#> Formula: y1 ~ x1 + x2 + x3
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.94936 0.20532 -4.624 3.77e-06 ***
#> x1 1.97383 0.15150 13.028 < 2e-16 ***
#> x2 0.20574 0.25474 0.808 0.419
#> x3 -0.06393 0.25939 -0.246 0.805
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> EQUATION 2
#> Link function for mu.2: probit
#> Formula: y2 ~ x1 + x2 + x3
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.04474 0.17736 0.252 0.80084
#> x1 -1.07918 0.13393 -8.058 7.76e-16 ***
#> x2 0.66869 0.23016 2.905 0.00367 **
#> x3 0.08599 0.23228 0.370 0.71124
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> n = 400 theta = 0.282(0.0492,0.45) tau = 0.182(0.0313,0.297)
#> total edf = 9
Information criteria are readily calculated using the AIC
and BIC
functions.
AIC(out)
#> [1] 840.3036
BIC(out)
#> [1] 876.2268
A bivariate probit model with splines can be estimated in the following way.
out <- gjrm(list(y1 ~ x1 + s(x2) + s(x3),
y2 ~ x1 + s(x2) + s(x3)),
data = dataSim,
margins = c("probit", "probit"),
Model = "B")
conv.check(out)
#>
#> Largest absolute gradient value: 1.308371e-09
#> Observed information matrix is positive definite
#> Eigenvalue range: [0.2366899,5.216987e+13]
#>
#> Trust region iterations before smoothing parameter estimation: 4
#> Loops for smoothing parameter estimation: 3
#> Trust region iterations within smoothing loops: 4
summary(out)
#>
#> COPULA: Gaussian
#> MARGIN 1: Bernoulli
#> MARGIN 2: Bernoulli
#>
#> EQUATION 1
#> Link function for mu.1: probit
#> Formula: y1 ~ x1 + s(x2) + s(x3)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.9852 0.1094 -9.005 <2e-16 ***
#> x1 2.1885 0.1691 12.940 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Smooth components' approximate significance:
#> edf Ref.df Chi.sq p-value
#> s(x2) 2.805 3.499 31.729 1.86e-06 ***
#> s(x3) 1.000 1.000 0.134 0.714
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> EQUATION 2
#> Link function for mu.2: probit
#> Formula: y2 ~ x1 + s(x2) + s(x3)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.43374 0.09316 4.656 3.22e-06 ***
#> x1 -1.14477 0.13880 -8.248 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Smooth components' approximate significance:
#> edf Ref.df Chi.sq p-value
#> s(x2) 4.729 5.784 33.806 6.82e-06 ***
#> s(x3) 1.000 1.000 0.189 0.664
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> n = 400 theta = 0.484(0.22,0.639) tau = 0.322(0.141,0.441)
#> total edf = 14.5
AIC(out)
#> [1] 774.3515
We can have a look at the estimated smooth function plots. The red lines indicate the true curves.
x2 <- sort(x2)
f1.x2 <- f1(x2)[order(x2)] - mean(f1(x2))
f2.x2 <- f2(x2)[order(x2)] - mean(f2(x2))
f3.x3 <- rep(0, length(x3))
par(mfrow=c(2,2),mar=c(4.5,4.5,2,2))
plot(out, eq = 1, select = 1, seWithMean = TRUE, scale = 0)
lines(x2, f1.x2, col = "red")
plot(out, eq = 1, select = 2, seWithMean = TRUE, scale = 0)
lines(x3, f3.x3, col = "red")
plot(out, eq = 2, select = 1, seWithMean = TRUE, scale = 0)
lines(x2, f2.x2, col = "red")
plot(out, eq = 2, select = 2, seWithMean = TRUE, scale = 0)
lines(x3, f3.x3, col = "red")
A bivariate probit model with splines and a varying dependence parameter can be fitted as follows.
eq.mu.1 <- y1 ~ x1 + s(x2)
eq.mu.2 <- y2 ~ x1 + s(x2)
eq.theta <- ~ x1 + s(x2)
fl <- list(eq.mu.1, eq.mu.2, eq.theta)
outD <- gjrm(fl, data = dataSim,
margins = c("probit", "probit"),
Model = "B")
conv.check(outD)
#>
#> Largest absolute gradient value: 3.840129e-10
#> Observed information matrix is positive definite
#> Eigenvalue range: [0.1033404,1078.919]
#>
#> Trust region iterations before smoothing parameter estimation: 4
#> Loops for smoothing parameter estimation: 11
#> Trust region iterations within smoothing loops: 25
summary(outD)
#>
#> COPULA: Gaussian
#> MARGIN 1: Bernoulli
#> MARGIN 2: Bernoulli
#>
#> EQUATION 1
#> Link function for mu.1: probit
#> Formula: y1 ~ x1 + s(x2)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.9785 0.1099 -8.901 <2e-16 ***
#> x1 2.1742 0.1689 12.870 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Smooth components' approximate significance:
#> edf Ref.df Chi.sq p-value
#> s(x2) 2.77 3.457 30.99 2.44e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> EQUATION 2
#> Link function for mu.2: probit
#> Formula: y2 ~ x1 + s(x2)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.43697 0.09327 4.685 2.8e-06 ***
#> x1 -1.14120 0.13839 -8.246 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Smooth components' approximate significance:
#> edf Ref.df Chi.sq p-value
#> s(x2) 4.621 5.66 33.27 7.97e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> EQUATION 3
#> Link function for theta: atanh
#> Formula: ~x1 + s(x2)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.58008 0.22994 2.523 0.0116 *
#> x1 0.03757 0.42260 0.089 0.9292
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Smooth components' approximate significance:
#> edf Ref.df Chi.sq p-value
#> s(x2) 3.684 4.579 5.099 0.385
#>
#>
#> n = 400 theta = 0.51(-0.1,0.85) tau = 0.35(-0.0681,0.663)
#> total edf = 17.1
# outD$theta
And similarly, the estimated smooth functions are plotted as follows.
par(mfrow=c(2,2),mar=c(4.5,4.5,2,2))
plot(outD, eq = 1, seWithMean = TRUE)
plot(outD, eq = 2, seWithMean = TRUE)
plot(outD, eq = 3, seWithMean = TRUE)
For more examples of joint regression models, see the gjrm vignettes.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.