negbin1: Negative Binomial 1 Regression
In negbin1: Negative Binomial 1 Regression

Description Usage Arguments Details Value References See Also Examples

Fit negative binomial (NB) regression models with NB1 response distribution. The negative binomial distribution can arise as a gamma mixture of Poisson distributions, and can be used for modelling over-dispersed count data.

negbin1(formula, data, subset, na.action,
  model = TRUE, y = TRUE, x = TRUE,
  control = negbin1_control(...), ...)

negbin1_fit(x, y, control)

negbin1_control(maxit = 5000, start = NULL, grad = TRUE, hessian = TRUE, ...)

`formula`	an object of class "`formula`": a symbolic description of the model to be fitted.
`data`	a data frame containing the variables of the model. If not found in the data, variables are taken from `environment(formula)`

.

`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain NAs.
`model, y, x`	logicals. If TRUE the corresponding components of the fit are returned.
`control`	a list of parameters for controlling the fitting process. For `negbin1_fit` this is passed to `negbin1_control`.
`...`	additional arguments to be passed.
`maxit, start`	control arguments passed to `optim`.
`grad`	logical. Should gradients be used for optimization? If `TRUE`, the default `method` is `"BFGS"`. Otherwise `method = "Nelder-Mead"` is used.
`hessian`	logical or character. Should a numeric approximation of the (negative) Hessian matrix be computed? Either `FALSE` (or equivalently `"none"`) or `TRUE`. Alternatively, in the latter case, `hessian = "numDeriv"` could be specified to signal that the Hessian should be approximated by `hessian`. Another option is `hessian = "numDeriv"` so that `optim` is used for computing the Hessian.

A continous mixture of Poisson distribution with Gamma weights is called negative binomial distribution. To employ the NB1 distribution in a regression, the natural mean equation is employed:

log(μ_{i}) = x_i^\top β

.

For the variance, the NB1 model assumes varying θ_i such that α = μ_i/θ_i and thus

Var(y_i | x_i) = (1 + α) \cdot μ_i

.

The workhorse function is negbin1_fit, which is normally not called directly, but when the model response and model matrix have already been calculated. Starting values in the optimization are by default taken from a Poisson glm.

negbin1_fit is the lower level function where the actual fitting takes place.

A set of standard extractor functions for fitted model objects is available for objects of class "negbin1", including methods to the generic functions print, summary, coef, vcov, logLik, residuals, predict, terms, model.frame, model.matrix, update, estfun and bread (from the sandwich package), and getSummary (from the memisc package, enabling mtable).

See predict.negbin1 and coef.negbin1 for more details on some methods with non-standard arguments.

negbin1 returns an object of class "negbin", i.e., a list with components as follows. negbin1.fit returns an unclassed list with components up to df.

`coefficients`	a vector containing the coefficients from the respective models,
`counts`	count of function and gradient evaluations from `optim`,
`convergence`	convergence code from `optim`,
`message`	optional further information from `optim`,
`vcov`	covariance matrix of all parameters in the model,
`residuals`	a vector of raw residuals (observed - fitted),
`fitted.values`	a list with elements `"location"` and `"alpha"` containing the latent fitted means and standard deviations,
`method`	the method argument passed to the `optim` call,
`nobs`	number of observations,
`df`	number of estimated parameters,
`call`	the original function call,
`formula`	the original formula,
`terms`	a list with elements containing the terms objects for the respective models,
`levels`	a list with elements containing the levels of the categorical regressors,
`contrasts`	a list with elements containing the contrasts corresponding to `levels` from the respective models,
`model`	the full model frame (if `model = TRUE`),
`y`	the numeric response vector (if `y = TRUE`),
`x`	a list with elements containing the model matrices from the respective models (if `x = TRUE`).

Cameron AC & Trivedi PK (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests, Journal of Applied Econometrics, 1, 29–53.

Cameron AC & Trivedi PK (2013). “Regression Analysis of Count Data”, Cambridge University Press.

Lawless JF (1987). Negative Binomial and Mixed Poisson Regression, The Canadian Journal of Statistics, 15(3), 209–225.

Winkelmann R & Boes S (2009). “Analysis of Microdata”, Springer, Second Edition.

predict.negbin1, coef.negbin1, gamlss, vglm

## packages
require("Formula")
require("gamlss")
require("VGAM")
require("lmtest")

## data generating process
dgp <- function(n = 1000, coef = c(0.2, 0.3, 0, 2)) {
  d <- data.frame(
    x1 = runif(n, -1, 1),
    x2 = runif(n, -1, 1)
    )
  d$mu <- exp(coef[1] + coef[2] * d$x1 + coef[3] * d$x2)
  d$y <- rnbinom(n, mu = d$mu, size = d$mu / coef[4])
  return(d)
}

## simulate data
set.seed(2007-05-15)
d <- dgp()

## model (with function negbin1)
nb1 <- negbin1(y ~ x1 + x2, data = d)
summary(nb1)
    
## model (with vglm from VGAM package)
nb2 <- vglm(y ~ x1 + x2, negbinomial(parallel = TRUE, zero = NULL), data = d, trace = TRUE)    
summary(nb2)

## model (with gamlss from gamlss)
nb3 <- gamlss(y ~ x1 + x2, family = NBII(sigma.link = "identity"), data = d)
summary(nb3)

## comparison of coefficient vectors
cbind("negbin1" = coef(nb1), "vglm" = coef(nb2)[c(1,3,4)], "gamlss" = coef(nb3))

## comparison of log likelihoods
cbind("negbin1" = logLik(nb1), "vglm" = logLik(nb2), "gamlss" = logLik(nb3))

## model comparison
nb0 <- negbin1(y ~ x1, data = d)
AIC(nb0, nb1)
BIC(nb0, nb1)
lrtest(nb0, nb1)

## comparison of alpha
co <- coef(nb2, matrix = TRUE)
diff <- 1/ (exp(co["(Intercept)", "loge(size)"] - co["(Intercept)", "loge(mu)"])) 

cbind("negbin1" = nb1$coefficients$alpha, "vglm" = diff, "gamlss" = nb3$sigma.coefficients)

## benefit of formula/terms interface
update(nb1, subset = x2 > 0)
head(model.frame(nb1))
head(model.matrix(nb1))