zigeometric: Zero-Inflated Geometric Distribution Family Function


View source: R/family.zeroinf.R

Description

Fits a zero-inflated geometric distribution by maximum likelihood estimation.

Usage

zigeometric(lpstr0  = "logitlink", lprob = "logitlink",
            type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
            ipstr0  = NULL, iprob = NULL,
            imethod = 1, bias.red = 0.5, zero = NULL)
zigeometricff(lprob = "logitlink", lonempstr0 = "logitlink",
              type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
              iprob = NULL, ionempstr0 = NULL,
              imethod = 1, bias.red = 0.5, zero = "onempstr0")

Arguments

lpstr0, lprob

Link functions for the parameters phi (pstr0) and prob. The former is the probability of a structural zero; the latter is the usual geometric probability parameter. See Links for more choices. For the zero-deflated model, see the Note below.

lonempstr0, ionempstr0

The corresponding link function and initial value for onempstr0 = 1 - pstr0, the parameterization used by zigeometricff(). See Details below.

bias.red

A constant used in the initialization process of pstr0. It should lie between 0 and 1, with 1 having no effect.

type.fitted

See CommonVGAMffArguments and fittedvlm for information.

ipstr0, iprob

See CommonVGAMffArguments for information.

zero, imethod

See CommonVGAMffArguments for information.

Details

Function zigeometric() is based on

P(Y=0) = phi + (1-phi) * prob,

for y=0, and

P(Y=y) = (1-phi) * prob * (1 - prob)^y,

for y=1,2,…. The parameter phi satisfies 0 < phi < 1. The mean of Y is E(Y) = (1-phi) * (1-prob) / prob, which is returned as the fitted value by default. By default, the two linear/additive predictors are (logit(phi), logit(prob))^T. Multiple responses are handled.
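The probability function above can be checked numerically in base R. The following is a minimal sketch, not VGAM code: zigeom_pmf is a hypothetical helper written only for this illustration, and the parameter values are arbitrary. It relies on stats::dgeom, which uses the same prob * (1 - prob)^y form for y = 0, 1, 2, …, so the geometric component has mean (1 - prob) / prob and the zero-inflated mean is (1 - phi) * (1 - prob) / prob.

```r
# Hypothetical helper (not part of VGAM): zero-inflated geometric pmf
# built from stats::dgeom, whose pmf is prob * (1 - prob)^y for y = 0, 1, ...
zigeom_pmf <- function(y, prob, phi) {
  phi * (y == 0) + (1 - phi) * dgeom(y, prob)
}

prob <- 0.25   # geometric probability (arbitrary illustrative value)
phi  <- 0.40   # probability of a structural zero (arbitrary illustrative value)

y <- 0:2000    # truncated support; the omitted tail mass is negligible here
p <- zigeom_pmf(y, prob, phi)

total_mass     <- sum(p)                          # should be very close to 1
numerical_mean <- sum(y * p)                      # mean computed from the pmf
closed_form    <- (1 - phi) * (1 - prob) / prob   # (1 - phi) * (1 - prob) / prob
pobs0          <- p[1]                            # P(Y = 0) = phi + (1 - phi) * prob

print(c(total_mass = total_mass, numerical_mean = numerical_mean,
        closed_form = closed_form, pobs0 = pobs0))
```

Here pobs0 is the probability of an observed zero, which exceeds phi because the geometric component also produces zeros.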

Estimated probabilities of a structural zero and an observed zero can be returned, as in zipoisson; see fittedvlm for information.

The VGAM family function zigeometricff() differs from zigeometric() in three ways: (i) the order of the linear/additive predictors is switched so that the geometric probability comes first; (ii) the second parameter is onempstr0 = 1 - pstr0, i.e., one minus the probability of a structural zero, which is the probability of the parent (geometric) component; (iii) argument zero has a new default so that onempstr0 is intercept-only by default. zigeometricff() is generally recommended over zigeometric(). Both functions implement Fisher scoring and can handle multiple responses.
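As an illustration of the zigeometricff() parameterization, the sketch below fits the model to simulated data. The data frame zdata, the seed, and the true parameter values are assumptions made up for this example; only vglm(), zigeometricff() and rzigeom() come from VGAM, and plogis() is the base R inverse logit.

```r
library(VGAM)

set.seed(123)   # arbitrary seed, used only to make the sketch reproducible
n <- 2000
zdata <- data.frame(x2 = runif(n) - 0.5)

# Assumed true model: prob depends on x2, structural-zero probability is 0.3
zdata <- transform(zdata, prob = plogis(-1 + 1 * x2))
zdata <- transform(zdata, y = rzigeom(n, prob = prob, pstr0 = 0.3))

# With the default zero = "onempstr0", x2 enters only the predictor for prob
fit <- vglm(y ~ x2, zigeometricff(), data = zdata)

# Column 1 is the predictor for prob, column 2 the predictor for onempstr0
cf <- coef(fit, matrix = TRUE)
print(cf)

# Back-transform the intercept-only predictor to the probability scale
onempstr0_hat <- plogis(cf["(Intercept)", 2])
pstr0_hat     <- 1 - onempstr0_hat
print(c(onempstr0 = onempstr0_hat, pstr0 = pstr0_hat))
```

The x2 entry in the second column is zero because onempstr0 is modelled as intercept-only; passing a different value for zero would allow it to depend on covariates.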

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

Note

The zero-deflated geometric distribution might be fitted by setting lpstr0 = identitylink, although this is not entirely reliable. See zipoisson for information that can be applied here. Otherwise, try the zero-altered geometric distribution (see zageometric).
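To make zero deflation concrete, the base R sketch below evaluates the probability function from the Details section with a negative phi. This is an illustration only, not VGAM code, and the values are arbitrary. The lower bound phi >= -prob / (1 - prob) is derived here from requiring P(Y=0) = phi + (1-phi) * prob to be nonnegative; it is not taken from the VGAM documentation.

```r
prob <- 0.25                      # geometric probability (arbitrary value)
phi_min <- -prob / (1 - prob)     # smallest phi keeping P(Y = 0) >= 0 (derived bound)
phi <- -0.2                       # negative phi gives fewer zeros than the geometric

p0_deflated  <- phi + (1 - phi) * prob   # P(Y = 0) under zero deflation
p0_geometric <- dgeom(0, prob)           # P(Y = 0) for the plain geometric

# Full pmf on a truncated support; the omitted tail mass is negligible here
y <- 0:2000
p <- phi * (y == 0) + (1 - phi) * dgeom(y, prob)

print(c(phi_min = phi_min, p0_deflated = p0_deflated,
        p0_geometric = p0_geometric, total_mass = sum(p)))
```

Because phi is no longer a probability in this case, a logit link cannot represent it, which is why an identity link is needed for the zero-deflated model.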

Author(s)

T. W. Yee

See Also

rzigeom, geometric, zageometric, spikeplot, rgeom, simulate.vlm.

Examples

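library(VGAM)  # provides vglm(), zigeometric(), rzigeom() and logitlink()

# Simulate three covariates and three zero-inflated geometric responses
gdata <- data.frame(x2 = runif(nn <- 1000) - 0.5)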
gdata <- transform(gdata, x3 = runif(nn) - 0.5,
                          x4 = runif(nn) - 0.5)
gdata <- transform(gdata, eta1 =  1.0 - 1.0 * x2 + 2.0 * x3,
                          eta2 = -1.0,
                          eta3 =  0.5)
gdata <- transform(gdata, prob1 = logitlink(eta1, inverse = TRUE),
                          prob2 = logitlink(eta2, inverse = TRUE),
                          prob3 = logitlink(eta3, inverse = TRUE))
gdata <- transform(gdata, y1 = rzigeom(nn, prob1, pstr0 = prob3),
                          y2 = rzigeom(nn, prob2, pstr0 = prob3),
                          y3 = rzigeom(nn, prob2, pstr0 = prob3))
with(gdata, table(y1))
with(gdata, table(y2))
with(gdata, table(y3))
head(gdata)

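# Single response: zero = 1 makes the first linear predictor (pstr0) intercept-only
fit1 <- vglm(y1 ~ x2 + x3 + x4, zigeometric(zero = 1), data = gdata, trace = TRUE)
coef(fit1, matrix = TRUE)
head(fitted(fit1, type = "pstr0"))  # estimated probability of a structural zero

# Two responses fitted jointly, intercept-only
fit2 <- vglm(cbind(y2, y3) ~ 1, zigeometric(zero = 1), data = gdata, trace = TRUE)
coef(fit2, matrix = TRUE)
summary(fit2)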

Example output

Loading required package: stats4
Loading required package: splines
y1
  0   1   2   3   4   8 
896  64  25   9   5   1 
y2
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  15  17 
725  75  54  32  28  22  17  14   9  12   1   5   1   2   2   1 
y3
  0   1   2   3   4   5   6   7   8   9  10  11  12  15  20  22 
720  81  48  37  35  21  17  13   6   9   5   2   3   1   1   1 
          x2         x3         x4        eta1 eta2 eta3     prob1     prob2
1  0.2759165  0.4252369 -0.4447786  1.57455726   -1  0.5 0.8284323 0.2689414
2  0.3002975 -0.2237458 -0.0110400  0.25221077   -1  0.5 0.5627206 0.2689414
3 -0.1287799 -0.3290780  0.1334256  0.47062393   -1  0.5 0.6155314 0.2689414
4  0.1243307 -0.4704042  0.4571114 -0.06513904   -1  0.5 0.4837210 0.2689414
5 -0.2342292 -0.4377276  0.3268040  0.35877404   -1  0.5 0.5887436 0.2689414
6 -0.2213717  0.1847765 -0.3376931  1.59092479   -1  0.5 0.8307462 0.2689414
      prob3 y1 y2 y3
1 0.6224593  0  0  1
2 0.6224593  0  0  0
3 0.6224593  1  1  0
4 0.6224593  0  0  0
5 0.6224593  0  0  9
6 0.6224593  0  0  0
VGLM    linear loop  1 :  loglikelihood = -459.30618
VGLM    linear loop  2 :  loglikelihood = -438.92135
VGLM    linear loop  3 :  loglikelihood = -427.7268
VGLM    linear loop  4 :  loglikelihood = -425.66587
VGLM    linear loop  5 :  loglikelihood = -425.61744
VGLM    linear loop  6 :  loglikelihood = -425.61718
VGLM    linear loop  7 :  loglikelihood = -425.61717
VGLM    linear loop  8 :  loglikelihood = -425.61717
            logitlink(pstr0) logitlink(prob)
(Intercept)        0.5621004       0.9864963
x2                 0.0000000      -1.2080077
x3                 0.0000000       2.0907374
x4                 0.0000000       0.4878864
          [,1]
[1,] 0.6369384
[2,] 0.6369384
[3,] 0.6369384
[4,] 0.6369384
[5,] 0.6369384
[6,] 0.6369384
VGLM    linear loop  1 :  loglikelihood = -2379.6966
VGLM    linear loop  2 :  loglikelihood = -2378.5318
VGLM    linear loop  3 :  loglikelihood = -2378.5289
VGLM    linear loop  4 :  loglikelihood = -2378.5289
            logitlink(pstr01) logitlink(prob1) logitlink(pstr02)
(Intercept)         0.5106845         -1.01127         0.4643856
            logitlink(prob2)
(Intercept)       -0.9718606

Call:
vglm(formula = cbind(y2, y3) ~ 1, family = zigeometric(zero = 1), 
    data = gdata, trace = TRUE)

Pearson residuals:
                      Min      1Q Median     3Q    Max
logitlink(pstr01)  -1.909 -1.1051 0.6043 0.6043 0.6043
logitlink(prob1)   -8.036  0.1190 0.1190 0.1190 1.2883
logitlink(pstr02)  -1.895 -1.1595 0.6109 0.6109 0.6782
logitlink(prob2)  -11.275  0.1254 0.1254 0.1254 1.2542

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  0.51068    0.08748   5.838 5.30e-09 ***
(Intercept):2 -1.01127    0.07042 -14.360  < 2e-16 ***
(Intercept):3  0.46439    0.08834   5.257 1.46e-07 ***
(Intercept):4 -0.97186    0.07016 -13.852  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: logitlink(pstr01), logitlink(prob1), 
logitlink(pstr02), logitlink(prob2)

Log-likelihood: -2378.529 on 3996 degrees of freedom

Number of Fisher scoring iterations: 4 

No Hauck-Donner effect found in any of the estimates

VGAM documentation built on Jan. 16, 2021, 5:21 p.m.