Fit heteroskedastic binary response models via maximum likelihood.
1 2 3 4 5 6 7 8  hetglm(formula, data, subset, na.action, weights, offset,
family = binomial(link = "probit"),
link.scale = c("log", "sqrt", "identity"),
control = hetglm.control(...),
model = TRUE, y = TRUE, x = FALSE, ...)
hetglm.fit(x, y, z = NULL, weights = NULL, offset = NULL,
family = binomial(), link.scale = "log", control = hetglm.control())

formula 
symbolic description of the model (of type 
data, subset, na.action 
arguments controlling formula processing
via 
weights 
optional numeric vector of case weights. 
offset 
optional numeric vector(s) with an a priori known component to be included in the linear predictor(s). 
family 
family object (including the link function of the mean model). 
link.scale 
character specification of the link function in the latent scale model. 
control 
a list of control arguments specified via

model, y, x 
logicals. If 
z 
numeric matrix. Regressor matrix for the precision model, defaulting to an intercept only. 
... 
arguments passed to 
A set of standard extractor functions for fitted model objects is available for
objects of class "hetglm"
, including methods to the generic functions
print
, summary
, coef
,
vcov
, logLik
, residuals
,
predict
, terms
, update
,
model.frame
, model.matrix
,
estfun
and bread
(from the sandwich package), and
coeftest
(from the lmtest package).
hetglm
returns an object of class "hetglm"
, i.e., a list with components as follows.
hetglm.fit
returns an unclassed list with components up to converged
.
coefficients 
a list with elements 
residuals 
a vector of raw residuals (observed  fitted), 
fitted.values 
a vector of fitted means, 
optim 
output from the 
method 
the method argument passed to the 
control 
the control arguments passed to the 
start 
the starting values for the parameters passed to the 
weights 
the weights used (if any), 
offset 
the list of offset vectors used (if any), 
n 
number of observations, 
nobs 
number of observations with nonzero weights, 
df.null 
residual degrees of freedom in the homoskedastic null model, 
df.residual 
residual degrees of freedom in the fitted model, 
loglik 
loglikelihood of the fitted model, 
loglik.null 
loglikelihood of the homoskedastic null model, 
dispersion 
estimate of the dispersion parameter (if any), 
vcov 
covariance matrix of all parameters in the model, 
family 
the family object used, 
link 
a list with elements 
converged 
logical indicating successful convergence of 
call 
the original function call, 
formula 
the original formula, 
terms 
a list with elements 
levels 
a list with elements 
contrasts 
a list with elements 
model 
the full model frame (if 
y 
the response vector (if 
x 
a list with elements 
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129  ## Generate artifical binary data from a latent
## heteroskedastic normally distributed variable
set.seed(48)
n < 200
x < rnorm(n)
ystar < 1 + x + rnorm(n, sd = exp(x))
y < factor(ystar > 0)
## visualization
par(mfrow = c(1, 2))
plot(ystar ~ x, main = "latent")
abline(h = 0, lty = 2)
plot(y ~ x, main = "observed")
## model fitting of homoskedastic model (m0a/m0b)
## and heteroskedastic model (m)
m0a < glm(y ~ x, family = binomial(link = "probit"))
m0b < hetglm(y ~ x  1)
m < hetglm(y ~ x)
## coefficient estimates
cbind(heteroskedastic = coef(m),
homoskedastic = c(coef(m0a), 0))
## summary of correct heteroskedastic model
summary(m)
## Generate artificial binary data with a single binary regressor
## driving the heteroskedasticity in a model with two regressors
set.seed(48)
n < 200
x < rnorm(n)
z < rnorm(n)
a < factor(sample(1:2, n, replace = TRUE))
ystar < 1 + c(0, 1)[a] + x + z + rnorm(n, sd = c(1, 2)[a])
y < factor(ystar > 0)
## fit "true" heteroskedastic model
m1 < hetglm(y ~ a + x + z  a)
## fit interaction model
m2 < hetglm(y ~ a/(x + z)  1)
## although not obvious at first sight, the two models are
## nested. m1 is a restricted version of m2 where the following
## holds: a1:x/a2:x == a1:z/a2:z
if(require("lmtest")) lrtest(m1, m2)
## both ratios are == 2 in the data generating process
c(x = coef(m2)[3]/coef(m2)[4], z = coef(m2)[5]/coef(m2)[6])
if(require("AER")) {
## Labor force participation example from Greene
## (5th edition: Table 21.3, p. 682)
## (6th edition: Table 23.4, p. 790)
## data (including transformations)
data("PSID1976", package = "AER")
PSID1976$kids < with(PSID1976, factor((youngkids + oldkids) > 0,
levels = c(FALSE, TRUE), labels = c("no", "yes")))
PSID1976$fincome < PSID1976$fincome/10000
## Standard probit model via glm()
lfp0a < glm(participation ~ age + I(age^2) + fincome + education + kids,
data = PSID1976, family = binomial(link = "probit"))
## Standard probit model via hetglm() with constant scale
lfp0b < hetglm(participation ~ age + I(age^2) + fincome + education + kids  1,
data = PSID1976)
## Probit model with varying scale
lfp1 < hetglm(participation ~ age + I(age^2) + fincome + education + kids  kids + fincome,
data = PSID1976)
## Likelihood ratio and Wald test
lrtest(lfp0b, lfp1)
waldtest(lfp0b, lfp1)
## confusion matrices
table(true = PSID1976$participation,
predicted = fitted(lfp0b) <= 0.5)
table(true = PSID1976$participation,
predicted = fitted(lfp1) <= 0.5)
## Adapted (and somewhat artificial) example to illustrate that
## certain models with heteroskedastic scale can equivalently
## be interpreted as homoskedastic scale models with interaction
## effects.
## probit model with main effects and heteroskedastic scale in two groups
m < hetglm(participation ~ kids + fincome  kids, data = PSID1976)
## probit model with interaction effects and homoskedastic scale
p < glm(participation ~ kids * fincome, data = PSID1976,
family = binomial(link = "probit"))
## both likelihoods are equivalent
logLik(m)
logLik(p)
## intercept/slope for the kids=="no" group
coef(m)[c(1, 3)]
coef(p)[c(1, 3)]
## intercept/slope for the kids=="yes" group
c(sum(coef(m)[1:2]), coef(m)[3]) / exp(coef(m)[4])
coef(p)[c(1, 3)] + coef(p)[c(2, 4)]
## Wald tests for the heteroskedasticity effect in m and the
## interaction effect in p are very similar
coeftest(m)[4,]
coeftest(p)[4,]
## corresponding likelihood ratio tests are equivalent
## (due to the invariance of the MLE)
m0 < hetglm(participation ~ kids + fincome  1, data = PSID1976)
p0 < glm(participation ~ kids + fincome, data = PSID1976,
family = binomial(link = "probit"))
lrtest(m0, m)
lrtest(p0, p)
}

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
Please suggest features or report bugs with the GitHub issue tracker.
All documentation is copyright its authors; we didn't write any of that.