nbreg: Negative Binomial Count Data Regression

View source: R/nbreg.R

nbregR Documentation

Negative Binomial Count Data Regression

Description

Fit negative binomial regression models for count data via maximum likelihood

Usage

nbreg(formula, data, subset, na.action, weights, offset, theta = NULL,
  dist = "NB2", link = "log", link.theta = "log", control = nbreg.control(...),
  model = TRUE, y = TRUE, x = FALSE, z = FALSE, hessA = TRUE, ...)

Arguments

formula

symbolic description of the model, see details.

data, subset, na.action

arguments controlling formula processing via model.frame.

weights

optional numeric vector of weights.

offset

optional numeric vector with an a priori known component to be included in the linear predictor. See below for more information on offsets.

theta

numeric. Optional. If specified, then the dispersion parameter is not estimated.

dist

character specification of the NB type. Either "NB2" or "NB1"

link

character specification of the link function for the mean. Currently, only "log" is supported.

link.theta

character specification of the link function for the dispersion parameter. Currently, only "log" is supported.

control

a list of control arguments specified via nbreg.control.

model, y, x, z

logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned.

hessA

logical. If TRUE, then the analytical Hessian is used to compute the covariance matrix of the estimator.

...

currently not used.

Details

The Negative Binomial Distribution is often used to model count data with overdispersion. Cameron and Trivedi (2013) offer two parametrization, negative binomial type 2 (NB2) and type 1 (NB1). NB2 is parametrized as follows

f(y | \mu, \theta) = \dfrac{\Gamma(\theta + y)}{\Gamma(\theta) y!} \left(\dfrac{\theta}{\theta + \mu}\right)^\theta \left(\dfrac{\mu}{\theta + \mu}\right)^y.

For NB1 replace \theta with \mu\theta on the RHS.

This function further allows us to model the dispersion parameter with covariates via a two-part formula. If a formula of type y ~ x1 + x2 is supplied, then the regressors are employed in the mean and the dispersion parameter is estimated as a constant, i.e. a standard NB2 or NB1 is estimated. This is equivalent to y ~ x1 + x2 | 1. If a formula of type y ~ x1 + x2 + x3 | z1 + z2 is given, then the mean mu is modeled using x1 + x2 and the dispersion parameter \theta with z1 + z2. If dist = "NB2", then the function estimates the NBH model.

Offsets can be specified in both the mean and dispersion parameter \theta: y ~ x1 + x2 + offset(x3) | z1 + offset(z2), where x3 is used as an offset (i.e., with coefficient fixed to 1) in the mean mu and z2 analogously in \theta. By the rule stated above y ~ x1 + offset(x2) is equivalent to y ~ x1 + offset(x2) | 1. Instead of using the offset() wrapper within the formula, the offset argument can also be employed which sets an offset only for mu. Thus, formula = y ~ x1 and offset = x2 is equivalent to formula = y ~ x1 + offset(x2) | 1.

All parameters are estimated by maximum likelihood using optim, with control options set in nbreg.control. Starting values can be supplied or are estimated by a Poisson regression in glm.fit (the default, starting values of coefficients in \theta are set to zero to ensure compatibility with NB1). Standard errors are derived analytically or numerically using the Hessian matrix returned by optim. See nbreg.control for details.

The returned fitted model object is of class "nbreg" and is similar to fitted "glm" objects.

A set of standard extractor functions for fitted model objects is available for objects of class "nbreg", including methods to the generic functions print, summary, coef, vcov, logLik, residuals, predict, fitted, terms, model.matrix. See predict.nbreg for more details on all methods.

Value

An object of class "nbreg", i.e., a list with components including

coefficients

a vector containing the coefficients from the mean,

coefficients.theta

a vector containing the coefficients from the dispersion parameter theta,

residuals

a vector of raw residuals (observed - fitted),

fitted.values

a vector of fitted means,

optim

a list with the output from the optim call for maximizing the log-likelihood,

control

the control arguments passed to the optim call,

start

the starting values for the parameters passed to the optim call,

weights

the case weights used,

offset

a list with elements "mu" and "theta" containing the offset vectors (if any) from the respective parameters,

n

number of observations (with weights > 0),

df.null

residual degrees of freedom for the null model,

df.residual

residual degrees of freedom for fitted model,

terms

a list with elements "mu", "theta" and "full" containing the terms objects for the respective parameters,

SE.logtheta

standard error for \log(\theta),

loglik

log-likelihood of the fitted model,

vcov

covariance matrix of all coefficients in the model (derived from the analytical Hessian (hessA = TRUE) or from the Hessian of the optim output (hessA = FALSE)),

dist

character string describing the type of NB distribution used,

link

character string describing the link of the mean,

link.theta

character string describing the link of the dispersion parameter theta,

converged

logical indicating successful convergence of optim,

call

the original function call,

formula

the original formula,

levels

levels of the categorical regressors,

contrasts

a list with elements "mu" and "theta" containing the contrasts corresponding to levels from the respective parts,

model

the full model frame (if model = TRUE),

y

the response count vector (if y = TRUE),

x

the model matrix for the mean (if x = TRUE),

z

the model matrix for the mean (if z = TRUE),

References

Cameron AC, Trivedi PK (2013). Regression Analysis of Count Data, 2nd ed. New York: Cambridge University Press.

See Also

nbreg.control, glm, glm.fit, glm.nb,

Examples

data("CrabSatellites", package = "countreg")

## NB2
fm_nb2 <- nbreg(satellites ~ width + color, data = CrabSatellites)

## NB1
fm_nb1 <- nbreg(satellites ~ width + color, data = CrabSatellites, dist = "NB1")

## NBH
fm_nbh <- nbreg(satellites ~ width + color | weight, data = CrabSatellites)

## NB1 with variable theta
fm_nb1h <- nbreg(satellites ~ width + color | weight, data = CrabSatellites,
                dist = "NB1")

## Example not run:
## data
# data("GSOEP", package = "countreg")
# gsoep <- subset(GSOEP, year == "1984")

## NB2
# fm_nb2 <- nbreg(docvis ~ educ + public + addon,
#                 data = gsoep)
                
## NB1
# fm_nb1 <- nbreg(docvis ~ educ + public + addon,
#                 data = gsoep, dist = "NB1")                

## NBH
# fm_nbh <- nbreg(docvis ~ educ + public + addon | married + public,
#                 data = gsoep)
                
## NB1 with variable theta
# fm_nb1h <- nbreg(docvis ~ educ + public + addon | married + public,
#                 data = gsoep, dist = "NB1")

countreg documentation built on Dec. 4, 2023, 3:09 a.m.