probit_linear_latent: Recursive Probit-Linear Model with Latent First Stage

View source: R/probit_linear_latent.R

probit_linear_latent    R Documentation


Latent version of the Probit-Linear Model.

First stage (Probit, m_i^* is unobserved):

m_i^* = 1(\boldsymbol{\alpha}'\mathbf{w_i} + u_i > 0)

Second stage (Linear):

y_i = \boldsymbol{\beta}'\mathbf{x_i} + \gamma m_i^* + \sigma v_i

Endogeneity structure: u_i and v_i are bivariate normally distributed with a correlation of \rho.

w and x can be the same set of variables. The identification of this model is generally weak, especially if w are not good predictors of m. \gamma is assumed to be positive to ensure that the model estimates are unique.
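
For orientation, the likelihood contribution of one observation can be written by summing over the two possible values of the unobserved m_i^*. The derivation below is a sketch, not taken from the package source. It assumes the first stage has the usual probit form m_i^* = 1(\boldsymbol{\alpha}'\mathbf{w_i} + u_i > 0), with \boldsymbol{\alpha} denoting the first-stage coefficients, and it uses \phi and \Phi for the standard normal density and distribution function. The package may parameterize the problem differently.

```latex
% Standardized second-stage residual if m_i^* = m, for m in {0, 1}
v_{im} = \frac{y_i - \boldsymbol{\beta}'\mathbf{x_i} - \gamma m}{\sigma}

% Given v_i, u_i is N(\rho v_i, 1 - \rho^2), so
% P(m_i^* = 1 | v_i) = \Phi((\alpha'w_i + \rho v_i) / \sqrt{1 - \rho^2})
L_i = \sum_{m=0}^{1} \frac{1}{\sigma}\, \phi(v_{im})\,
      \Phi\!\left( (2m - 1)\,
      \frac{\boldsymbol{\alpha}'\mathbf{w_i} + \rho\, v_{im}}{\sqrt{1 - \rho^2}} \right)
```

Under these assumptions y_i follows a two-component mixture whose components differ in mean by \gamma, which is consistent with the weak identification noted above when the components overlap heavily.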


  data = NULL,
  EM = TRUE,
  par = NULL,
  method = "BFGS",
  verbose = 0,
  maxIter = 500,
  tol = 1e-06,
  tol_LL = 1e-08



Formula for the first-stage probit model, in which the dependent variable is latent; it can therefore be given as a one-sided formula (e.g., ~x+z).


Formula for the second-stage linear model. The latent dependent variable of the first stage is automatically added as a regressor in this model.


data: Input data, a data frame.


EM: Whether to maximize the likelihood using the Expectation-Maximization (EM) algorithm, which is slower but more robust. Defaults to TRUE.


par: Starting values for the estimates.


method: Optimization algorithm. Defaults to BFGS.


verbose: An integer indicating how much output to display during the estimation process.

  • <0 - No output

  • 0 - Basic output (model estimates)

  • 1 - Moderate output, basic output + parameter and likelihood in each iteration

  • 2 - Extensive output, moderate output + gradient values on each call


maxIter: Maximum number of iterations for the EM algorithm.


tol: Tolerance for convergence of the EM algorithm.


tol_LL: Tolerance for convergence of the likelihood.
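
In a textbook EM treatment of a model with an unobserved binary variable, the E-step computes the posterior probability that the latent variable equals 1 given the data and the current parameter values. The sketch below illustrates that step for this model. It assumes the first stage is m_i^* = 1(alpha'w_i + u_i > 0). The function name `posterior_m1` and its arguments are hypothetical, and this is not the package's internal implementation.

```r
# Hypothetical helper: posterior probability that m_i^* = 1 (generic E-step sketch).
# W and X are numeric design matrices (including intercept columns); y is the outcome.
posterior_m1 <- function(alpha, beta, gamma, sigma, rho, W, X, y) {
  a  <- as.vector(W %*% alpha)  # first-stage index alpha'w_i
  xb <- as.vector(X %*% beta)   # second-stage index beta'x_i
  s  <- sqrt(1 - rho^2)         # sd of u_i given v_i

  # Joint density of (y_i, m_i^* = 1): u_i > -a, with u_i | v_i ~ N(rho * v_i, 1 - rho^2)
  v1 <- (y - xb - gamma) / sigma
  f1 <- dnorm(v1) / sigma * pnorm((a + rho * v1) / s)

  # Joint density of (y_i, m_i^* = 0): u_i <= -a
  v0 <- (y - xb) / sigma
  f0 <- dnorm(v0) / sigma * pnorm(-(a + rho * v0) / s)

  # Bayes' rule: share of the total density attributable to m_i^* = 1
  f1 / (f0 + f1)
}

# Usage with two made-up observations and illustrative parameter values
W <- cbind(1, c(0, 1))
X <- W
posterior_m1(alpha = c(1, 1), beta = c(1, 1), gamma = 1, sigma = 1,
             rho = -0.5, W = W, X = X, y = c(1.2, 3.1))
```

When gamma is 0 and rho is 0 the outcome carries no information about m_i^*, so the posterior collapses to the prior probit probability pnorm(alpha'w_i). That makes a convenient sanity check for the sketch.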


A list containing the results of the estimated model, some of which are inherited from the return of maxLik

  • estimates: Model estimates with 95% confidence intervals

  • estimate or par: Point estimates

  • variance_type: Type of covariance matrix used to calculate standard errors, either BHHH or Hessian.

  • var: covariance matrix

  • se: standard errors

  • gradient: Gradient function at maximum

  • hessian: Hessian matrix at maximum

  • gtHg: g'H^-1g, where H^-1 is simply the covariance matrix. A value close to zero (e.g., <1e-3 or 1e-6) indicates good convergence.

  • LL or maximum: Log-likelihood at the maximum

  • AIC: AIC

  • BIC: BIC

  • n_obs: Number of observations

  • n_par: Number of parameters

  • iter: number of iterations taken to converge

  • message: Message regarding convergence status.

Note that the list inherits all the components in the output of maxLik. See the documentation of maxLik for more details.
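
The components listed above can be combined into quick post-estimation checks. The helper below is a minimal sketch. It assumes a fitted object with the components gradient, var, LL, n_par, and n_obs described above, and it uses the textbook definitions AIC = 2k - 2LL and BIC = k log(n) - 2LL. The function name `fit_diagnostics` and the default threshold of 1e-3 are illustrative choices and not part of the package.

```r
# Hypothetical helper (not part of the endogeneity package): recompute a few
# summary quantities from the components of a fitted-model list.
fit_diagnostics <- function(fit, threshold = 1e-3) {
  g <- as.vector(fit$gradient)              # gradient at the maximum
  gtHg <- drop(t(g) %*% fit$var %*% g)      # g' H^-1 g, with H^-1 taken as the covariance matrix
  list(
    gtHg = gtHg,
    converged = is.finite(gtHg) && abs(gtHg) < threshold,  # small value suggests good convergence
    se = sqrt(diag(fit$var)),               # standard errors from the covariance matrix
    AIC = 2 * fit$n_par - 2 * fit$LL,       # textbook AIC
    BIC = log(fit$n_obs) * fit$n_par - 2 * fit$LL  # textbook BIC
  )
}

# Usage, assuming est_latent was returned by probit_linear_latent():
# fit_diagnostics(est_latent)
```

If the recomputed values differ from the stored gtHg, AIC, or BIC components, the stored ones should be treated as authoritative, since the package may use a different convention.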


Peng, Jing. (2023) Identification of Causal Mechanisms from Randomized Experiments: A Framework for Endogenous Mediation Analysis. Information Systems Research, 34(1):67-84.

See Also

Other endogeneity: bilinear(), biprobit_latent(), biprobit_partial(), biprobit(), linear_probit(), pln_linear(), pln_probit(), probit_linearRE(), probit_linear_partial(), probit_linear()


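library(MASS)         # provides mvrnorm()
library(endogeneity)  # provides probit_linear() and probit_linear_latent()

N = 2000
rho = -0.5

# Exogenous covariates
x = rbinom(N, 1, 0.5)
z = rnorm(N)

# Correlated errors for the two stages
e = mvrnorm(N, mu=c(0,0), Sigma=matrix(c(1,rho,rho,1), nrow=2))
e1 = e[,1]
e2 = e[,2]

# Binary first-stage outcome and continuous second-stage outcome
m = as.numeric(1 + x + z + e1 > 0)
y = 1 + x + z + m + e2

# Benchmark: m is observed
est = probit_linear(m~x+z, y~x+z+m)
print(est$estimates, digits=3)

# Latent version: m is treated as unobserved
est_latent = probit_linear_latent(~x+z, y~x+z)
print(est_latent$estimates, digits=3)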

endogeneity documentation built on Aug. 21, 2023, 9:11 a.m.