extbetabinomial: Extended Beta-binomial Distribution Family Function
In VGAM: Vector Generalized Linear and Additive Models

extbetabinomial

R Documentation

Extended Beta-binomial Distribution Family Function

Description

Fits an extended beta-binomial distribution by maximum likelihood estimation. The two parameters here are the mean and correlation coefficient.

Usage

extbetabinomial(lmu = "logitlink", lrho = "cloglink", 
      zero = "rho", irho = 0, grho = c(0, 0.05, 0.1, 0.2),
      vfl = FALSE, Form2 = NULL,
      imethod = 1, ishrinkage = 0.95)

Arguments

`lmu`, `lrho`	Link functions applied to the two parameters. See `Links` for more choices. The first default ensure the mean remain in `(0, 1)`, while the second allows for a slightly negative correlation parameter: you could say it lies in `(\max(-\mu/(N-\mu-1), -(1 - \mu)/(N-(1-\mu)-1)), 1)` where `\mu` is the mean (probability) and `N` is `size`. See below for details. For `lrho`, `cloglink` is a good choice because it handles parameter values from 1 downwards. Other good choices include `logofflink(offset = 1)` and `rhobitlink`.
`irho`, `grho`	The first is similar to `betabinomial` and it is a good idea to use this argument because to conduct a grid search based on `grho` is expensive. The default is effectively a binomial distribution. Set `irho = NULL` to perform a grid search which is more reliable but slow.
`imethod`	Similar to `betabinomial`.
`zero`	Similar to `betabinomial`. Also, see `CommonVGAMffArguments` for more information. Modelling `rho` with covariates requires large samples.
`ishrinkage`	See `betabinomial` and `CommonVGAMffArguments` for information.
`vfl`, `Form2`	See `CommonVGAMffArguments`. If `vfl = TRUE` then `Form2` should be a formula specifying the terms for `\eta_2` and all others are used for `\mu`. It is similar to `uninormal`. If these arguments are used then `cbind(0, log(size1 / (size1 - 1)))` should be used as an offset, and set `zero = NULL` too.

Details

The extended beta-binomial distribution (EBBD) proposed by Prentice (1986) allows for a slightly negative correlation parameter whereas the ordinary BBD betabinomial only allows values in (0, 1) so it handles overdispersion only. When negative, the data is underdispersed relative to an ordinary binomial distribution.

Argument rho is used here for the \delta used in Prentice (1986) because it is the correlation between the (almost) Bernoulli trials. (They are actually simple binary variates.) We use here N for the number of trials (e.g., litter size), T=NY is the number of successes, and p (or \mu) is the probability of a success (e.g., a malformation). That is, Y is the proportion of successes. Like binomialff, the fitted values are the estimated probability of success (i.e., E[Y] and not E[T]) and the prior weights N are attached separately on the object in a slot.

The probability function is difficult to write but it involves three series of products. Recall Y = T/N is the real response being modelled, where T is the (total) sum of N correlated (almost) Bernoulli trials.

The default model is \eta_1 = logit(\mu) and \eta_2 = clog(\rho) because the first parameter lies between 0 and 1. The second link is cloglink. The mean of Y is p=\mu and the variance of Y is \mu(1-\mu)(1+(N-1)\rho)/N. Here, the correlation \rho may be slightly negative and is the correlation between the N individuals within a litter. A litter effect is typically reflected by a positive value of \rho and corresponds to overdispersion with respect to the binomial distribution. Thus an exchangeable error structure is assumed between units within a litter for the EBBD.

This family function uses Fisher scoring. Elements of the second-order expected derivatives are computed numerically, which may fail for models very near the boundary of the parameter space. Usually, the computations are expensive for large N because of a for loop, so it may take a long time.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm.

Suppose fit is a fitted EBB model. Then depvar(fit) are the sample proportions y, fitted(fit) returns estimates of E(Y), and weights(fit, type = "prior") returns the number of trials N.

Warning

Modelling rho using covariates well requires much data so it is usually best to leave zero alone. It is good to set trace = TRUE and play around with irho if there are problems achieving convergence. Convergence problems will occur when the estimated rho is close to the lower bound, i.e., the underdispersion is almost too severe for the EBB to cope.

Note

This function is recommended over betabinomial and betabinomialff. It processes the input in the same way as binomialff. But it does not handle the case N \leq 2 very well because there are two parameters to estimate, not one, for each row of the input. Cases where N > 2 can be selected via the subset argument of vglm.

Author(s)

T. W. Yee

References

Prentice, R. L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321–327.

Examples

# Example 1
edata <- data.frame(N = 10, mu = 0.5, rho = 0.1)
edata <- transform(edata,
      y = rextbetabinom(100, N, mu, rho = rho))
fit1 <- vglm(cbind(y, N-y) ~ 1, extbetabinomial, edata, trace = TRUE)
coef(fit1, matrix = TRUE)
Coef(fit1)
head(cbind(depvar(fit1), weights(fit1, type = "prior")))

# Example 2: VFL model
## Not run: N <- size1 <- 10; nn <- 2000; set.seed(1)
edata <-  # Generate the data set. Expensive.
    data.frame(x2 = runif(nn),
               ooo =  log(size1 / (size1 - 1)))
edata <- transform(edata, x1copy = 1, x2copy = x2,
  y2 = rextbetabinom(nn, size1,  # Expensive
         logitlink(1 + x2, inverse = TRUE),
         cloglink(ooo + 1 - 0.5 * x2, inv = TRUE)))
fit2 <- vglm(data = edata,
        cbind(y2, N - y2) ~ x2 + x1copy + x2copy,
        extbetabinomial(zero = NULL, vfl = TRUE,
                 Form2 = ~ x1copy + x2copy - 1),
        offset = cbind(0, ooo), trace = TRUE)
coef(fit2, matrix = TRUE)
wald.stat(fit2, values0 = c(1, 1, -0.5))

## End(Not run)

VGAM documentation built on Sept. 18, 2024, 9:09 a.m.