gaitdnbinomial: Generally Altered, Inflated, Truncated and Deflated Negative...

View source: R/family.gaitd.R

gaitdnbinomialR Documentation

Generally Altered, Inflated, Truncated and Deflated Negative Binomial Regression

Description

Fits a generally altered, inflated truncated and deflated negative binomial regression by MLE. The GAITD combo model having 7 types of special values is implemented. This allows mixtures of negative binomial distributions on nested and/or partitioned support as well as a multinomial logit model for (nonparametric) altered, inflated and deflated values.

Usage

 gaitdnbinomial(a.mix = NULL, i.mix = NULL, d.mix = NULL,
     a.mlm = NULL, i.mlm = NULL, d.mlm = NULL,
     truncate = NULL, zero = c("size", "pobs", "pstr", "pdip"),
     eq.ap = TRUE, eq.ip = TRUE, eq.dp = TRUE,
     parallel.a = FALSE, parallel.i = FALSE, parallel.d = FALSE,
     lmunb.p = "loglink",
     lmunb.a = lmunb.p, lmunb.i = lmunb.p, lmunb.d = lmunb.p,
     lsize.p = "loglink",
     lsize.a = lsize.p, lsize.i = lsize.p, lsize.d = lsize.p,
     type.fitted = c("mean", "munbs", "sizes", "pobs.mlm",
     "pstr.mlm", "pdip.mlm", "pobs.mix", "pstr.mix", "pdip.mix",
     "Pobs.mix", "Pstr.mix", "Pdip.mix", "nonspecial", "Numer",
     "Denom.p", "sum.mlm.i", "sum.mix.i",
     "sum.mlm.d", "sum.mix.d", "ptrunc.p", "cdf.max.s"),
     gpstr.mix = ppoints(7) / 3,
     gpstr.mlm = ppoints(7) / (3 + length(i.mlm)),
     imethod = 1, mux.init = c(0.75, 0.5, 0.75, 0.5),
     imunb.p = NULL, imunb.a = imunb.p,
     imunb.i = imunb.p, imunb.d = imunb.p,
     isize.p = NULL,  isize.a = isize.p,
     isize.i = isize.p, isize.d = isize.p,
     ipobs.mix = NULL, ipstr.mix = NULL,
     ipdip.mix = NULL, ipobs.mlm = NULL,
     ipstr.mlm = NULL, ipdip.mlm = NULL,
     byrow.aid = FALSE, ishrinkage = 0.95, probs.y = 0.35,
     nsimEIM = 500, cutoff.prob = 0.999, eps.trig = 1e-7,
     nbd.max.support = 4000, max.chunk.MB = 30)

Arguments

truncate

See gaitdpoisson.

a.mix, i.mix, d.mix

See gaitdpoisson.

a.mlm, i.mlm, d.mlm

See gaitdpoisson.

lmunb.p, lmunb.a, lmunb.i, lmunb.d

Link functions pertaining to the mean parameters. See gaitdpoisson where llambda.p etc. are the equivalent.

lsize.p, lsize.a, lsize.i, lsize.d

Link functions pertaining to the size parameters. See NegBinomial.

eq.ap, eq.ip, eq.dp

See gaitdpoisson. These apply to both munb and size parameters simultaneously. See NegBinomial also.

parallel.a, parallel.i, parallel.d

See gaitdpoisson.

type.fitted

See gaitdpoisson.

gpstr.mix, gpstr.mlm

See gaitdpoisson.

imethod, ipobs.mix, ipstr.mix, ipdip.mix

See gaitdpoisson and CommonVGAMffArguments.

ipobs.mlm, ipstr.mlm, ipdip.mlm

See gaitdpoisson.

mux.init

Numeric, of length 4. General downward multiplier for initial values for the sample proportions (MLEs actually). See gaitdpoisson. The fourth value corresponds to size.

imunb.p, imunb.a, imunb.i, imunb.d

See gaitdpoisson; imunb.p is similar to ilambda.p, etc.

isize.p, isize.a, isize.i, isize.d

See gaitdpoisson; isize.p is similar to ilambda.p, etc.

probs.y, ishrinkage

See CommonVGAMffArguments for information.

byrow.aid

Details are at Gaitdpois.

zero

See gaitdpoisson and CommonVGAMffArguments.

nsimEIM, cutoff.prob, eps.trig

See negbinomial.

nbd.max.support, max.chunk.MB

See negbinomial.

Details

The GAITD–NB combo model is the pinnacle of GAITD regression for counts because it potentially handles underdispersion, equidispersion and overdispersion relative to the Poisson, as well as alteration, inflation, deflation and truncation at arbitrary support points. In contrast, gaitdpoisson cannot handle overdispersion so well. The GAITD–NB is so flexible that it can accommodate up to seven modes.

The full GAITD–NB–NB–MLM–NB-MLM–NB-MLM combo model may be fitted with this family function. There are seven types of special values and all arguments for these may be used in a single model. Here, the MLM represents the nonparametric while the NB refers to the negative binomial mixtures. The defaults for this function correspond to an ordinary negative binomial regression so that negbinomial is called instead.

While much of the documentation here draws upon gaitdpoisson, there are additional details here because the NBD is a two parameter distribution that handles overdispersion relative to the Possion. Consequently, this family function is exceeding flexible and there are many more pitfalls to avoid.

The order of the linear/additive predictors is best explained by an example. Suppose a combo model has length(a.mix) > 3 and length(i.mix) > 3, length(d.mix) > 3, a.mlm = 3:5, i.mlm = 6:9 and d.mlm = 10:12, say. Then loglink(munb.p) and loglink(size.p) are the first two. The third is multilogitlink(pobs.mix) followed by loglink(munb.a) and loglink(size.a) because a.mix is long enough. The sixth is multilogitlink(pstr.mix) followed by loglink(munb.i) and loglink(size.i) because i.mix is long enough. The ninth is multilogitlink(pdip.mix) followed by loglink(munb.d) and loglink(size.d) because d.mix is long enough. Next are the probabilities for the a.mlm values. Then are the probabilities for the i.mlm values. Lastly are the probabilities for the d.mlm values. All the probabilities are estimated by one big MLM and effectively the "(Others)" column of left over probabilities is associated with the nonspecial values. These might be called the nonspecial baseline probabilities (NBP) or reserve probabilities. The dimension of the vector of linear/additive predictors here is M=21.

Apart from the order of the linear/additive predictors, the following are (or should be) equivalent: gaitdnbinomial() and negbinomial(), gaitdnbinomial(a.mix = 0) and zanegbinomial(zero = "pobs0"), gaitdnbinomial(i.mix = 0) and zinegbinomial(zero = "pstr0"), gaitdnbinomial(truncate = 0) and posnegbinomial(). Likewise, if a.mix and i.mix are assigned a scalar then it effectively moves that scalar to a.mlm and i.mlm because there is no parameters such as munb.i being estimated. Thus gaitdnbinomial(a.mix = 0) and gaitdnbinomial(a.mlm = 0) are the effectively same, and ditto for gaitdnbinomial(i.mix = 0) and gaitdnbinomial(i.mlm = 0).

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

The fitted.values slot of the fitted object, which should be extracted by the generic function fitted, returns the mean \mu by default. See the information above on type.fitted.

Warning

See gaitdpoisson. Also, having eq.ap = TRUE, eq.ip = TRUE and eq.dp = TRUE is often needed to obtain initial values that are good enough because they borrow strength across the different operators. It is usually easy to relax these assumptions later.

This family function is under constant development and future changes will occur.

Note

If length(a.mix) is 1 then effectively this becomes a value of a.mlm. If length(a.mix) is 2 then an error message will be issued (overfitting really). If length(a.mix) is 3 then this is almost overfitting too. Hence length(a.mix) should be 4 or more. Ditto for length(i.mix) and length(d.mix).

See gaitdpoisson for notes about numerical problems that can easily arise. With the NBD there is even more potential trouble that can occur. In particular, good initial values are more necessary so it pays to experiment with arguments such as imunb.p and isize.p, as well as fitting an intercept-only model first before adding covariates and using etastart.

Currently max.support is missing because only Inf is handled. This might change later.

Author(s)

T. W. Yee

References

Yee, T. W. and Ma, C. (2024). Generally altered, inflated, truncated and deflated regression. Statistical Science, 39 (in press).

See Also

Gaitdnbinom, dgaitdplot, multinomial, rootogram4, specials, plotdgaitd, spikeplot, meangaitd, KLD, gaitdpoisson, gaitdlog, gaitdzeta, multilogitlink, multinomial, goffset, Trunc, negbinomial, CommonVGAMffArguments, simulate.vlm.

Examples

## Not run: 
i.mix <- c(5, 10, 12, 16)  # Inflate these values parametrically
i.mlm <- c(14, 15)  # Inflate these values
a.mix <- c(1, 6, 13, 20)  # Alter these values
tvec <- c(3, 11)   # Truncate these values
pstr.mlm <- 0.1  # So parallel.i = TRUE
pobs.mix <- pstr.mix <- 0.1; set.seed(1)
gdata <- data.frame(x2 = runif(nn <- 1000))
gdata <- transform(gdata, munb.p = exp(2 + 0.0 * x2),
                   size.p = exp(1))
gdata <- transform(gdata,
  y1 = rgaitdnbinom(nn, size.p, munb.p, a.mix = a.mix,
                    i.mix = i.mix,
                    pobs.mix = pobs.mix, pstr.mix = pstr.mix,
                    i.mlm = i.mlm, pstr.mlm = pstr.mlm,
                    truncate = tvec))
gaitdnbinomial(a.mix = a.mix, i.mix = i.mix, i.mlm = i.mlm)
with(gdata, table(y1))
fit1 <- vglm(y1 ~ 1, crit = "coef", trace = TRUE, data = gdata,
             gaitdnbinomial(a.mix = a.mix, i.mix = i.mix,
                            i.mlm = i.mlm,
                            parallel.i = TRUE, eq.ap = TRUE,
                            eq.ip = TRUE, truncate = tvec))
head(fitted(fit1, type.fitted = "Pstr.mix"))
head(predict(fit1))
t(coef(fit1, matrix = TRUE))  # Easier to see with t()
summary(fit1)
spikeplot(with(gdata, y1), lwd = 2)
plotdgaitd(fit1, new.plot = FALSE, offset.x = 0.2, all.lwd = 2)  
## End(Not run)

VGAM documentation built on Sept. 18, 2024, 9:09 a.m.