gaitdnbinomial | R Documentation |
Fits a generally altered, inflated truncated and deflated negative binomial regression by MLE. The GAITD combo model having 7 types of special values is implemented. This allows mixtures of negative binomial distributions on nested and/or partitioned support as well as a multinomial logit model for (nonparametric) altered, inflated and deflated values.
gaitdnbinomial(a.mix = NULL, i.mix = NULL, d.mix = NULL,
a.mlm = NULL, i.mlm = NULL, d.mlm = NULL,
truncate = NULL, zero = c("size", "pobs", "pstr", "pdip"),
eq.ap = TRUE, eq.ip = TRUE, eq.dp = TRUE,
parallel.a = FALSE, parallel.i = FALSE, parallel.d = FALSE,
lmunb.p = "loglink",
lmunb.a = lmunb.p, lmunb.i = lmunb.p, lmunb.d = lmunb.p,
lsize.p = "loglink",
lsize.a = lsize.p, lsize.i = lsize.p, lsize.d = lsize.p,
type.fitted = c("mean", "munbs", "sizes", "pobs.mlm",
"pstr.mlm", "pdip.mlm", "pobs.mix", "pstr.mix", "pdip.mix",
"Pobs.mix", "Pstr.mix", "Pdip.mix", "nonspecial", "Numer",
"Denom.p", "sum.mlm.i", "sum.mix.i",
"sum.mlm.d", "sum.mix.d", "ptrunc.p", "cdf.max.s"),
gpstr.mix = ppoints(7) / 3,
gpstr.mlm = ppoints(7) / (3 + length(i.mlm)),
imethod = 1, mux.init = c(0.75, 0.5, 0.75, 0.5),
imunb.p = NULL, imunb.a = imunb.p,
imunb.i = imunb.p, imunb.d = imunb.p,
isize.p = NULL, isize.a = isize.p,
isize.i = isize.p, isize.d = isize.p,
ipobs.mix = NULL, ipstr.mix = NULL,
ipdip.mix = NULL, ipobs.mlm = NULL,
ipstr.mlm = NULL, ipdip.mlm = NULL,
byrow.aid = FALSE, ishrinkage = 0.95, probs.y = 0.35,
nsimEIM = 500, cutoff.prob = 0.999, eps.trig = 1e-7,
nbd.max.support = 4000, max.chunk.MB = 30)
truncate |
See |
a.mix , i.mix , d.mix |
See |
a.mlm , i.mlm , d.mlm |
See |
lmunb.p , lmunb.a , lmunb.i , lmunb.d |
Link functions pertaining to the mean parameters.
See |
lsize.p , lsize.a , lsize.i , lsize.d |
Link functions pertaining to the |
eq.ap , eq.ip , eq.dp |
See |
parallel.a , parallel.i , parallel.d |
See |
type.fitted |
See |
gpstr.mix , gpstr.mlm |
See |
imethod , ipobs.mix , ipstr.mix , ipdip.mix |
See |
ipobs.mlm , ipstr.mlm , ipdip.mlm |
See |
mux.init |
Numeric, of length 4.
General downward multiplier for initial values for
the sample proportions (MLEs actually).
See |
imunb.p , imunb.a , imunb.i , imunb.d |
See |
isize.p , isize.a , isize.i , isize.d |
See |
probs.y , ishrinkage |
See |
byrow.aid |
Details are at |
zero |
See |
nsimEIM , cutoff.prob , eps.trig |
See |
nbd.max.support , max.chunk.MB |
See |
The GAITD–NB combo model is the pinnacle of GAITD regression
for counts because it potentially handles
underdispersion,
equidispersion and
overdispersion relative to the Poisson,
as well as
alteration,
inflation,
deflation and
truncation at arbitrary support points.
In contrast, gaitdpoisson
cannot handle
overdispersion so well.
The GAITD–NB is so flexible that it can accommodate up to
seven modes.
The full
GAITD–NB–NB–MLM–NB-MLM–NB-MLM combo model
may be fitted with this family function.
There are seven types of special values and all
arguments for these
may be used in a single model.
Here, the MLM represents the nonparametric while the NB
refers to the negative binomial mixtures.
The defaults for this function correspond to an
ordinary negative binomial
regression so that negbinomial
is called instead.
While much of the documentation here draws upon
gaitdpoisson
, there are additional
details here because the NBD is a two parameter
distribution that handles overdispersion relative
to the Possion.
Consequently, this family function is exceeding flexible
and there are many more pitfalls to avoid.
The order of the linear/additive predictors is
best explained by an example.
Suppose a combo model has
length(a.mix) > 3
and
length(i.mix) > 3
,
length(d.mix) > 3
,
a.mlm = 3:5
,
i.mlm = 6:9
and
d.mlm = 10:12
, say.
Then loglink(munb.p)
and loglink(size.p)
are the first two.
The third is multilogitlink(pobs.mix)
followed
by loglink(munb.a)
and loglink(size.a)
because a.mix
is long enough.
The sixth is multilogitlink(pstr.mix)
followed
by loglink(munb.i)
and loglink(size.i)
because i.mix
is long enough.
The ninth is multilogitlink(pdip.mix)
followed
by loglink(munb.d)
and loglink(size.d)
because d.mix
is long enough.
Next are the probabilities for the a.mlm
values.
Then are the probabilities for the i.mlm
values.
Lastly are the probabilities for the d.mlm
values.
All the probabilities are estimated by one big MLM
and effectively
the "(Others)"
column of left over probabilities is
associated with the nonspecial values.
These might be called the
nonspecial baseline probabilities (NBP)
or reserve probabilities.
The dimension of the vector of linear/additive predictors here
is M=21
.
Apart from the order of the linear/additive predictors,
the following are (or should be) equivalent:
gaitdnbinomial()
and negbinomial()
,
gaitdnbinomial(a.mix = 0)
and zanegbinomial(zero = "pobs0")
,
gaitdnbinomial(i.mix = 0)
and zinegbinomial(zero = "pstr0")
,
gaitdnbinomial(truncate = 0)
and posnegbinomial()
.
Likewise, if
a.mix
and i.mix
are assigned a scalar then
it effectively moves that scalar to a.mlm
and i.mlm
because there is no
parameters such as munb.i
being estimated.
Thus
gaitdnbinomial(a.mix = 0)
and gaitdnbinomial(a.mlm = 0)
are the effectively same, and ditto for
gaitdnbinomial(i.mix = 0)
and gaitdnbinomial(i.mlm = 0)
.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions
such as vglm
,
rrvglm
and vgam
.
The fitted.values
slot of the fitted object,
which should be extracted by the generic function fitted
,
returns the mean \mu
by default.
See the information above on type.fitted
.
See gaitdpoisson
.
Also, having eq.ap = TRUE
, eq.ip = TRUE
and eq.dp = TRUE
is often needed to obtain
initial values that are good enough because they borrow
strength across the different operators.
It is usually easy to relax these assumptions later.
This family function is under constant development and future changes will occur.
If length(a.mix)
is 1 then effectively this becomes a
value of a.mlm
.
If length(a.mix)
is 2 then an error message
will be issued (overfitting really).
If length(a.mix)
is 3 then this is almost
overfitting too.
Hence length(a.mix)
should be 4 or more.
Ditto for length(i.mix)
and length(d.mix)
.
See gaitdpoisson
for notes about numerical
problems that can easily arise. With the NBD there is
even more potential trouble that can occur.
In particular, good initial values are more necessary so
it pays to experiment with arguments such as
imunb.p
and isize.p
, as well as
fitting an intercept-only model first before adding
covariates and using etastart
.
Currently max.support
is missing because only
Inf
is handled. This might change later.
T. W. Yee
Yee, T. W. and Ma, C. (2024). Generally altered, inflated, truncated and deflated regression. Statistical Science, 39 (in press).
Gaitdnbinom
,
dgaitdplot
,
multinomial
,
rootogram4
,
specials
,
plotdgaitd
,
spikeplot
,
meangaitd
,
KLD
,
gaitdpoisson
,
gaitdlog
,
gaitdzeta
,
multilogitlink
,
multinomial
,
goffset
,
Trunc
,
negbinomial
,
CommonVGAMffArguments
,
simulate.vlm
.
## Not run:
i.mix <- c(5, 10, 12, 16) # Inflate these values parametrically
i.mlm <- c(14, 15) # Inflate these values
a.mix <- c(1, 6, 13, 20) # Alter these values
tvec <- c(3, 11) # Truncate these values
pstr.mlm <- 0.1 # So parallel.i = TRUE
pobs.mix <- pstr.mix <- 0.1; set.seed(1)
gdata <- data.frame(x2 = runif(nn <- 1000))
gdata <- transform(gdata, munb.p = exp(2 + 0.0 * x2),
size.p = exp(1))
gdata <- transform(gdata,
y1 = rgaitdnbinom(nn, size.p, munb.p, a.mix = a.mix,
i.mix = i.mix,
pobs.mix = pobs.mix, pstr.mix = pstr.mix,
i.mlm = i.mlm, pstr.mlm = pstr.mlm,
truncate = tvec))
gaitdnbinomial(a.mix = a.mix, i.mix = i.mix, i.mlm = i.mlm)
with(gdata, table(y1))
fit1 <- vglm(y1 ~ 1, crit = "coef", trace = TRUE, data = gdata,
gaitdnbinomial(a.mix = a.mix, i.mix = i.mix,
i.mlm = i.mlm,
parallel.i = TRUE, eq.ap = TRUE,
eq.ip = TRUE, truncate = tvec))
head(fitted(fit1, type.fitted = "Pstr.mix"))
head(predict(fit1))
t(coef(fit1, matrix = TRUE)) # Easier to see with t()
summary(fit1)
spikeplot(with(gdata, y1), lwd = 2)
plotdgaitd(fit1, new.plot = FALSE, offset.x = 0.2, all.lwd = 2)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.