hetprobit: Heteroscedastic Probit Regression Models
In hetprobit: Heteroscedastic Probit Regression Models

Description Usage Arguments Details Value References See Also Examples

Fitting heteroscedastic probit models via maximum likelihood.

hetprobit(formula, data, subset, na.action, 
  model = TRUE, y = TRUE, x = FALSE, 
  control = hetprobit_control(...), ...)

hetprobit_fit(x, y, z = NULL, control)

hetprobit_control(maxit = 5000, start = NULL, 
  grad = TRUE, hessian = TRUE, ...)

`formula`	a formula expression of the form `y ~ x \| z` where `y` is the response and `x` and `z` are regressor variables for the mean and scale model.
`data`	a data frame containing the variables occurring in the formulas. If not found in the data, variables are taken from `environment(formula)`.
`subset`	an optional vector specifying a subset of observations to be used for fitting.
`na.action`	a function which indicates what should happen when the data contain `NA`s.
`model, y, x`	logicals. If `TRUE` the corresponding model frame, response vector and model matrix are returned as components of the return value. For `hetprobit_fit`: `x` is a design matrix with regressors for the mean and `y` should be a binary response vector with zeros and ones.
`z`	design matrix with regressors for the scale model.
`...`	arguments to be used to form the default `control` argument if it is not supplied directly.
`control, maxit, start`	a list of control parameters passed to `optim`: `maxit` specifies the maximal number of iterations, default is set to 5000 iterations. `start` an optional vector with starting values.
`grad`	logical. Should gradients be used for optimization? If `TRUE`, the default `method` is `"BFGS"`. Otherwise `method = "Nelder-Mead"` is used.
`hessian`	logical or character. Should a numeric approximation of the (negative) Hessian matrix be computed? Either `FALSE` (or equivalently `"none"`) or `TRUE`. Alternatively, in the latter case, the default option is `hessian = "numDeriv"` so that `hessian` is used for approximating the Hessian. Another option is `hessian = "optim"` to signal that the Hessian should be computed by `optim`.

In standard probit regression models the probability of a success, i.e. P(Y = 1), is modelled as

P(Y = 1) = pi = F(x'b)

where F() is the cdf of a standard normal distribution.
The general assumption that the variance of the error term is constant and due to identifiability set to one is relaxed in the heteroscedastic probit model. The variance can vary systematically and is now modelled as a multiplicative function of regressor variables, i.e. sigma = exp(z'g). The probability of success is now represented by

pi = F(x'b / exp(z'g)).

Note that the scale model is only identified without intercept.

A set of standard extractor functions for fitted model objects is available for objects of class "hetprobit", including methods to the generic functions print, summary, coef, vcov, logLik, residuals, predict, terms, model.frame, model.matrix, update, estfun and bread (from the sandwich package), and getSummary (from the memisc package, enabling mtable).

See predict.hetprobit and coef.hetprobit for more details on some methods with non-standard arguments.

This is a somewhat simpler reimplementation of hetglm from the glmx package (Zeileis, Koenker, Doebler 2015). Compared to hetglm, hetprobit does not offer: analytical Hessian and flexible link functions for the mean and scale submodel, among further features.

An object of class "hetprobit", i.e. a list with the following components:

`coefficients`	a list with elements `mean` and `scale` that contain the estimated coefficients from the corresponding model,
`loglik`	the value of the log-likelihood of the fitted model,
`counts`	count of function and gradient evaluations from `optim`,
`convergence`	an integer. `0` indicates successful completion. For details see `optim`.
`message`	optional further information from `optim`,
`vcov`	covariance matrix of all parameters in the model,
`fitted.values`	a list with elements `mean` and `scale` containing the latent fitted means and standard deviations,
`residuals`	a vector of raw residuals (observed - fitted),
`method`	the method argument passed to the `optim` call,
`nobs`	number of observations,
`df`	degrees of freedom,
`call`	the original function call,
`formula`	the original model formula,
`terms`	a list with elements `mean`, `scale` and `full` containing the terms objects for the respective models,
`levels`	a list with elements `mean`, `scale` and `full`, that contain the levels of the categorical regressors,
`contrasts`	a list with elements `mean` and `scale` that contain the contrasts corresponding to `levels` from the respective model,
`model`	the full model frame if `model = TRUE`,
`y`	the numeric response vector if `y = TRUE`,
`x`	default is set to `x = FALSE`. If `x = TRUE` a list with elements `mean` and `scale` that contain the model matrices from the respective models is returned.

Alvarez R.M. and Brehm J. (1995) American Ambivalence Towards Abortion Policy: Development of a Heteroskedastic Probit Model of Competing Values. American Journal of Political Science, 39(4), 1055–1082.

Greene W.H. (2012) “Econometric Analysis”, Pearson, Prentice Hall, Seventh Edition.

Harvey A.C. (1976) Estimating Regression Models with Multiplicative Heteroscedasticity. Econometrica, 44(3), 461–465.

Keele L.J. and Park D.K. (2006) Ambivalent about Ambivalence: A Re-examination of Heteroskedastic Probit Models. Unpublished manuscript, Penn State University.

Zeileis A., Koenker R. and Doebler P. (2015) glmx: Generalized Linear Models Extended. R package version 0.1-1. https://cran.r-project.org/package=glmx

Formula, glmx, predict.hetprobit, coef.hetprobit

## packages
require("glmx")

## artificial example
## data-generating process
dgp <- function(n = 500, coef = c(0.5, -1.5, 0, 1, 0)) {
  d <- data.frame(
    x1 = runif(n, -1, 1),
    x2 = runif(n, -1, 1)
  )
  d$ystar <- rnorm(n,
    mean = coef[1] + coef[2] * d$x1 + coef[3] * d$x2,
    sd = exp(coef[4] * d$x1 + coef[5] * d$x2)
  )
  d$y <- ifelse(d$ystar > 0, 1, 0)
  return(d)
}

## data
set.seed(2017-05-20)
d <- dgp()

## model fitting:
## m0 with hetglm.fit from glmx package 
## m1 with function hetprobit()
m0 <- hetglm(y ~ x1 + x2, data = d)
m1 <- hetprobit(y ~ x1 + x2, data = d)

## comparison of coefficients
cbind("hetglm" = coef(m0), "hetprobit" = coef(m1))

## comparison of log-likelihoods
cbind(logLik(m0), logLik(m1))

## Not run: 
## data on voter turnout in U.S. presidential elections 1984
data("VoterTurnout", package = "hetprobit")

## heteroscedastic probit model 
## with years of education in the scale model 
m <- hetprobit(vote ~ education + age + south + govelection + closing | 
                education, data = VoterTurnout) 

summary(m)

## End(Not run)