Generalized Linear Models with Extra Parameters

Description

Estimation of generalized linear models (GLMs) with extra parameters, e.g., parametric links, or families with additional parameters (such as the negative binomial).

Usage

glmx(formula, data, subset, na.action, weights, offset,
  family = negative.binomial, xlink = "log", control = glmx.control(...),
  model = TRUE, y = TRUE, x = FALSE, ...)

glmx.fit(x, y, weights = NULL, offset = NULL,
  family = negative.binomial, xlink = "log", control = glmx.control())

Arguments

formula

symbolic description of the model.

data, subset, na.action

arguments controlling formula processing via model.frame.

weights

optional numeric vector of case weights.

offset

optional numeric vector(s) with an a priori known component to be included in the linear predictor.

family

function that returns a "family" object, i.e., family(x) must be a "family" object when x is the numeric vector of extra parameters (by default assumed to be one-dimensional).

xlink

link object, or a character string that can be passed to make.link. It maps the extra parameters to unconstrained real parameters (e.g., "log" for a positive parameter).

control

a list of control arguments as returned by glmx.control.

model, y, x

logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned. For glmx.fit, x should be a numeric regressor matrix and y should be the response vector.

...

control arguments.
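
To illustrate the xlink argument (a sketch using only base R), a character string such as "log" is converted via make.link into a link object whose linkfun maps a positive extra parameter to the real line and whose linkinv maps back:

## link object corresponding to xlink = "log"
l <- make.link("log")
l$linkfun(2)  # log(2), maps theta > 0 to the real line
l$linkinv(0)  # exp(0) = 1, maps back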

Details

The function glmx is a convenience interface that estimates generalized linear models (GLMs) with extra parameters. Examples include binary response models with parametric link functions, or count regression using a negative binomial family (which has one additional parameter).

Hence, glmx requires a family argument that is a family-generating function, i.e., a function of one numeric argument (the extra parameters) that returns a "family" object. The extra parameters can then be estimated either by profile-likelihood methods or jointly with all other parameters.

If the generated family contains a list element loglik.extra giving the derivative of the log-likelihood with respect to the extra parameters (i.e., the score/gradient contributions), it is used in the optimization process. It should be a function(y, mu, extra) of the observed response y, the estimated mean mu, and the extra parameters.
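
As a minimal sketch (assuming the MASS package is available), a valid family argument is any function of one numeric argument that returns a "family" object; MASS::negative.binomial is one such function:

## family-generating function: one numeric argument -> "family" object
library("MASS")
fam <- negative.binomial(theta = 2)
inherits(fam, "family")  # TRUE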

Value

glmx returns an object of class "glmx", i.e., a list with components as follows. glmx.fit returns an unclassed list with components up to converged.

coefficients

a list with elements "glm" and "extra" containing the coefficients from the respective models,

residuals

a vector of deviance residuals,

fitted.values

a vector of fitted means,

optim

list of optim outputs for maximizing the "profile" and "full" log-likelihood, respectively,

weights

the weights used (if any),

offset

the list of offset vectors used (if any),

n

number of observations,

nobs

number of observations with non-zero weights,

df

number of estimated parameters,

loglik

log-likelihood of the fitted model,

dispersion

estimate of the dispersion parameter (if any),

vcov

covariance matrix of all parameters in the model,

family

a list with elements "glm" and "extra" where the former contains the "family" object at the optimal extra parameters and the latter the family-generating function,

xlink

the link object for the extra parameters,

control

control options used,

converged

logical indicating successful convergence of optim,

call

the original function call,

formula

the formula,

terms

the terms object for the model,

levels

the levels of the categorical regressors,

contrasts

the contrasts corresponding to levels,

model

the full model frame (if model = TRUE),

y

the response vector (if y = TRUE),

x

the model matrix (if x = TRUE).

See Also

glmx.control, hetglm

Examples

## artificial data from geometric regression
set.seed(1)
d <- data.frame(x = runif(200, -1, 1))
d$y <- rnbinom(200, mu = exp(0 + 3 * d$x), size = 1)

### negative binomial regression ###

## negative binomial regression via glmx
if(require("MASS")) {
  m_nb1 <- glmx(y ~ x, data = d,
    family = negative.binomial, xlink = "log", xstart = 0)
  summary(m_nb1)

  ## negative binomial regression via MASS::glm.nb
  m_nb2 <- glm.nb(y ~ x, data = d)
  summary(m_nb2)

  ## comparison
  if(require("lmtest")) {
    logLik(m_nb1)
    logLik(m_nb2)
    coeftest(m_nb1)
    coeftest(m_nb2)
    exp(coef(m_nb1, model = "extra"))
    m_nb2$theta
    exp(coef(m_nb1, model = "extra")) * sqrt(vcov(m_nb1, model = "extra"))
    m_nb2$SE.theta
  }
}

## if the score (or gradient) contribution of the extra parameters
## is supplied, then estimation can be sped up:
negbin <- function(theta) {
  fam <- negative.binomial(theta)
  fam$loglik.extra <- function(y, mu, theta) digamma(y + theta) - digamma(theta) +
    log(theta) + 1 - log(mu + theta) - (y + theta)/(mu + theta)
  fam
}
m_nb3 <- glmx(y ~ x, data = d,
  family = negbin, xlink = "log", xstart = 0, profile = FALSE)
all.equal(coef(m_nb1), coef(m_nb3))


### censored negative binomial hurdle regression (0 vs. > 0) ###

## negative binomial zero hurdle part via glmx
nbbin <- function(theta) binomial(link = nblogit(theta))
m_hnb1 <- glmx(factor(y > 0) ~ x, data = d,
  family = nbbin, xlink = "log", xstart = 0)
summary(m_hnb1)

## negative binomial hurdle regression via pscl::hurdle
## (see only zero hurdle part)
if(require("pscl")) {
  m_hnb2 <- hurdle(y ~ x, data = d, dist = "negbin", zero.dist = "negbin")
  summary(m_hnb2)
}