Generalized Linear Models with Extra Parameters

Description

Estimation of generalized linear models (GLMs) with extra parameters, e.g., parametric links, or families with additional parameters (such as the negative binomial).

Usage

glmx(formula, data, subset, na.action, weights, offset,
  family = negative.binomial, xlink = "log", control = glmx.control(...),
  model = TRUE, y = TRUE, x = FALSE, ...)

glmx.fit(x, y, weights = NULL, offset = NULL,
  family = negative.binomial, xlink = "log", control = glmx.control())

Arguments

formula

symbolic description of the model.

data, subset, na.action

arguments controlling formula processing via model.frame.

weights

optional numeric vector of case weights.

offset

optional numeric vector(s) with an a priori known component to be included in the linear predictor.

family

function that returns a "family" object, i.e., family(x) must be a "family" object when x is the numeric vector of extra parameters (by default assumed to be one-dimensional).

xlink

link object, or a character string that can be passed to make.link. It maps the extra parameters to unconstrained real parameters (e.g., "log" for a positive parameter).

control

a list of control arguments as returned by glmx.control.

model, y, x

logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned. For glmx.fit, x should be a numeric regressor matrix and y should be the response vector.

...

control arguments.
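
To illustrate the xlink argument (a sketch using only base R), a character string such as "log" is converted via make.link into a link object whose linkfun maps a positive extra parameter to the real line and whose linkinv maps back:

## link object corresponding to xlink = "log"
l <- make.link("log")
l$linkfun(2)  # log(2), maps theta > 0 to the real line
l$linkinv(0)  # exp(0) = 1, maps back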

Details

The function glmx is a convenience interface that estimates generalized linear models (GLMs) with extra parameters. Examples include binary response models with parametric link functions, or count regression using a negative binomial family (which has one additional parameter).

Hence, glmx requires a family argument that is a family-generating function, i.e., a function of one numeric argument (the extra parameters) that returns a "family" object. The extra parameters can then be estimated either by profile-likelihood methods or jointly with all other parameters.

If the generated family contains a list element loglik.extra giving the derivative of the log-likelihood with respect to the extra parameters (i.e., the score/gradient contributions), it is used in the optimization process. It should be a function(y, mu, extra) of the observed response y, the estimated mean mu, and the extra parameters.
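
As a minimal sketch (assuming the MASS package is available), a valid family argument is any function of one numeric argument that returns a "family" object; MASS::negative.binomial is one such function:

## family-generating function: one numeric argument -> "family" object
library("MASS")
fam <- negative.binomial(theta = 2)
inherits(fam, "family")  # TRUE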

Value

glmx returns an object of class "glmx", i.e., a list with components as follows. glmx.fit returns an unclassed list with components up to converged.

coefficients

a list with elements "glm" and "extra" containing the coefficients from the respective models,

residuals

a vector of deviance residuals,

fitted.values

a vector of fitted means,

optim

list of optim outputs for maximizing the "profile" and "full" log-likelihood, respectively,

weights

the weights used (if any),

offset

the list of offset vectors used (if any),

n

number of observations,

nobs

number of observations with non-zero weights,

df

number of estimated parameters,

loglik

log-likelihood of the fitted model,

dispersion

estimate of the dispersion parameter (if any),

vcov

covariance matrix of all parameters in the model,

family

a list with elements "glm" and "extra" where the former contains the "family" object at the optimal extra parameters and the latter the family-generating function,

xlink

the link object for the extra parameters,

control

control options used,

converged

logical indicating successful convergence of optim,

call

the original function call,

formula

the formula,

terms

the terms object for the model,

levels

the levels of the categorical regressors,

contrasts

the contrasts corresponding to levels,

model

the full model frame (if model = TRUE),

y

the response vector (if y = TRUE),

x

the model matrix (if x = TRUE).

See Also

glmx.control, hetglm

Examples

## artificial data from geometric regression
set.seed(1)
d <- data.frame(x = runif(200, -1, 1))
d$y <- rnbinom(200, mu = exp(0 + 3 * d$x), size = 1)

### negative binomial regression ###

## negative binomial regression via glmx
if(require("MASS")) {
  m_nb1 <- glmx(y ~ x, data = d,
    family = negative.binomial, xlink = "log", xstart = 0)
  summary(m_nb1)

  ## negative binomial regression via MASS::glm.nb
  m_nb2 <- glm.nb(y ~ x, data = d)
  summary(m_nb2)

  ## comparison
  if(require("lmtest")) {
    logLik(m_nb1)
    logLik(m_nb2)
    coeftest(m_nb1)
    coeftest(m_nb2)
    exp(coef(m_nb1, model = "extra"))
    m_nb2$theta
    exp(coef(m_nb1, model = "extra")) * sqrt(vcov(m_nb1, model = "extra"))
    m_nb2$SE.theta
  }
}

## if the score (or gradient) contribution of the extra parameters
## is supplied, then estimation can be sped up:
negbin <- function(theta) {
  fam <- negative.binomial(theta)
  fam$loglik.extra <- function(y, mu, theta) digamma(y + theta) - digamma(theta) +
    log(theta) + 1 - log(mu + theta) - (y + theta)/(mu + theta)
  fam
}
m_nb3 <- glmx(y ~ x, data = d,
  family = negbin, xlink = "log", xstart = 0, profile = FALSE)
all.equal(coef(m_nb1), coef(m_nb3))


### censored negative binomial hurdle regression (0 vs. > 0) ###

## negative binomial zero hurdle part via glmx
nbbin <- function(theta) binomial(link = nblogit(theta))
m_hnb1 <- glmx(factor(y > 0) ~ x, data = d,
  family = nbbin, xlink = "log", xstart = 0)
summary(m_hnb1)

## negative binomial hurdle regression via pscl::hurdle
## (see only zero hurdle part)
if(require("pscl")) {
  m_hnb2 <- hurdle(y ~ x, data = d, dist = "negbin", zero.dist = "negbin")
  summary(m_hnb2)
}