tweedie: Tweedie Generalized Linear Models In statmod: Statistical Modeling

 tweedie R Documentation

Tweedie Generalized Linear Models

Description

Produces a generalized linear model family object with any power variance function and any power link. Includes the Gaussian, Poisson, gamma and inverse-Gaussian families as special cases.

Usage

```
tweedie(var.power = 0, link.power = 1 - var.power)
```

Arguments

 `var.power` index of power variance function.

 `link.power` index of power link function. `link.power=0` produces a log-link. Defaults to the canonical link, which is `1-var.power`.

Details

This function provides access to a range of generalized linear model (GLM) response distributions that are not otherwise provided by R. It is also useful for accessing distribution/link combinations that are disallowed by the R `glm` function. The variance function for the GLM is assumed to be V(mu) = mu^var.power, where mu is the expected value of the distribution. The link function of the GLM is assumed to be mu^link.power for non-zero values of `link.power`, or log(mu) for `link.power=0`. For example, `link.power=1` produces the identity link. The canonical link for each Tweedie family is `link.power = 1 - var.power`.
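As a minimal sketch of the variance and link functions described above, the components of the returned family object can be evaluated directly. The values of `mu` are arbitrary illustrations, and the component names `variance` and `linkfun` are the standard ones for R family objects.

```r
library(statmod)

mu <- c(0.5, 1, 2, 4)   # arbitrary positive means, for illustration only

# Variance V(mu) = mu^1.5 combined with a log link (link.power = 0)
fam_log <- tweedie(var.power = 1.5, link.power = 0)
fam_log$variance(mu)    # same values as mu^1.5
fam_log$linkfun(mu)     # same values as log(mu)

# link.power = 1 gives the identity link
fam_id <- tweedie(var.power = 1.5, link.power = 1)
fam_id$linkfun(mu)      # same values as mu

# Omitting link.power gives the canonical link, here mu^(1 - 3) = mu^(-2)
fam_canon <- tweedie(var.power = 3)
fam_canon$linkfun(mu)   # same values as mu^(-2)
```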

The Tweedie family of GLMs is discussed in detail by Dunn and Smyth (2018). Each value of `var.power` corresponds to a particular type of response distribution. The values 0, 1, 2 and 3 correspond to the normal distribution, the Poisson distribution, the gamma distribution and the inverse-Gaussian distribution respectively. For these choices of `var.power`, the Tweedie family is exactly equivalent to the usual GLM family except with a greater choice of link powers. For example, `tweedie(var.power = 1, link.power = 0)` is exactly equivalent to `poisson(link = "log")`.
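The equivalence with `poisson(link = "log")` can be checked on simulated data. This is only a sketch: the data-generating model, the seed and the tolerance are assumptions chosen for illustration.

```r
library(statmod)

set.seed(1)
x <- 1:30
y <- rpois(30, lambda = exp(0.5 + 0.05 * x))   # simulated counts

fit_tw <- glm(y ~ x, family = tweedie(var.power = 1, link.power = 0))
fit_po <- glm(y ~ x, family = poisson(link = "log"))

# Both fits solve the same estimating equations, so the coefficients agree
# up to the convergence tolerance of the fitting algorithm
all.equal(coef(fit_tw), coef(fit_po), tolerance = 1e-6)
```

The comparison is restricted to the coefficient estimates. `summary.glm` fixes the dispersion at 1 only for families named `poisson` or `binomial`, so the reported standard errors of the two fits can differ.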

The most interesting Tweedie families occur for `var.power` between 1 and 2. For these GLMs, the response distribution has mass at zero (i.e., it has exact zeros) but is otherwise continuous on the positive real numbers (Smyth, 1996; Hasan and Dunn, 2012). These GLMs have been used, for example, to model rainfall: on many days there is no rain at all (exact zero) but, if there is any rain, then the actual amount of rain is continuous and positive.
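To make the "exact zeros plus continuous positive values" behaviour concrete, the sketch below simulates such a response using the standard compound Poisson-gamma representation of Tweedie distributions with power between 1 and 2, then fits a Tweedie GLM. The simulation uses base R only; the true coefficients, dispersion, sample size and seed are all assumptions made for illustration.

```r
library(statmod)

set.seed(42)
n   <- 200
x   <- seq(0, 2, length.out = n)
mu  <- exp(0.5 + 0.5 * x)   # assumed true mean (log link)
p   <- 1.5                  # var.power, between 1 and 2
phi <- 1                    # assumed dispersion

# Compound Poisson-gamma representation: a Poisson number of events,
# each contributing a gamma-distributed amount
lambda <- mu^(2 - p) / (phi * (2 - p))   # Poisson rate of the event count
shape1 <- (2 - p) / (p - 1)              # gamma shape per event
scale1 <- phi * (p - 1) * mu^(p - 1)     # gamma scale per event

N   <- rpois(n, lambda)
y   <- numeric(n)                        # stays exactly 0 where N == 0
pos <- N > 0
y[pos] <- rgamma(sum(pos), shape = N[pos] * shape1, scale = scale1[pos])

mean(y == 0)                             # proportion of exact zeros

fit <- glm(y ~ x, family = tweedie(var.power = p, link.power = 0))
coef(fit)
```

With this parametrization the simulated response has mean `mu` and variance `phi * mu^p`, which is the mean-variance relationship the fitted family assumes.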

Generally speaking, `var.power` should be chosen so that the theoretical response distribution matches the type of response data being modeled. Hence `var.power` should be chosen between 1 and 2 only if the response observations are continuous and positive except for exact zeros and `var.power` should be chosen greater than or equal to 2 only if the response observations are continuous and strictly positive.

There are no theoretical Tweedie GLMs with `var.power` between 0 and 1 (Joergensen, 1987). The `tweedie` function will work for those values but the family should be interpreted in a quasi-likelihood sense.
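A brief sketch of such a quasi-likelihood fit follows. The value `var.power = 0.5`, the log link and the simulated data are arbitrary choices for illustration; the fit relies only on the assumed mean-variance relationship V(mu) = mu^0.5, not on a genuine response distribution.

```r
library(statmod)

set.seed(3)
x <- 1:20
y <- rgamma(20, shape = 5)   # positive responses, for illustration only

# var.power = 0.5 does not correspond to any Tweedie distribution, so the
# estimates should be read as quasi-likelihood estimates
fit_quasi <- glm(y ~ x, family = tweedie(var.power = 0.5, link.power = 0))
coef(fit_quasi)
```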

Theoretical Tweedie GLMs do exist for negative values of `var.power`, but they are of little practical application. The `tweedie` function will work for those values but the family should be interpreted in a quasi-likelihood sense.

The name Tweedie has been associated with this family by Joergensen (1987) in honour of M. C. K. Tweedie. Joergensen (1987) gives a mathematical derivation of the Tweedie distributions proving that no distributions exist for var.power between 0 and 1.

Mathematically, a Tweedie GLM assumes the following. Let μ_i = E(y_i) be the expectation of the ith response. We assume that

μ_i^q = x_i^Tb, var(y_i) = φ μ_i^p

where x_i is a vector of covariates and b is a vector of regression coefficients, for some φ, p and q. This family is specified by `var.power = p` and `link.power = q`. A value of zero for q is interpreted as log(μ_i) = x_i^Tb.
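The relation μ_i^q = x_i^Tb can be verified on a fitted model. The sketch below assumes p = 2 and q = 0.5 with simulated gamma responses; the true coefficients, the gamma shape and the seed are illustrative assumptions.

```r
library(statmod)

set.seed(2)
x <- 1:20
mu_true <- (2 + 0.1 * x)^2                          # so that mu^0.5 is linear in x
y <- rgamma(20, shape = 5, scale = mu_true / 5)     # mean mu_true, variance mu_true^2 / 5

# p = 2 (gamma-type variance), q = 0.5 (square-root link)
fit <- glm(y ~ x, family = tweedie(var.power = 2, link.power = 0.5))

eta    <- unname(predict(fit, type = "link"))       # linear predictor x_i^T b
mu_hat <- unname(fitted(fit))                       # fitted means

all.equal(mu_hat^0.5, eta)                          # mu_i^q equals x_i^T b

summary(fit)$dispersion                             # estimate of phi
```

Here the simulated variance is μ^2 / 5, so the data were generated with φ = 0.2; the estimate printed on the last line is subject to sampling error.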

The following table summarizes the possible Tweedie response distributions:

| var.power | Response distribution |
|---|---|
| 0 | Normal |
| 1 | Poisson |
| (1, 2) | Compound Poisson, non-negative with mass at zero |
| 2 | Gamma |
| 3 | Inverse-Gaussian |
| > 2 | Stable, with support on the positive reals |
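For the four named special cases, the variance function of the Tweedie family can be compared with that of the corresponding base R family. This is a small sketch; the values of `mu` are arbitrary.

```r
library(statmod)

mu <- c(0.5, 1, 2, 4)   # arbitrary positive means

# Each comparison returns TRUE when the two variance functions agree
all.equal(tweedie(var.power = 0)$variance(mu), gaussian()$variance(mu))
all.equal(tweedie(var.power = 1)$variance(mu), poisson()$variance(mu))
all.equal(tweedie(var.power = 2)$variance(mu), Gamma()$variance(mu))
all.equal(tweedie(var.power = 3)$variance(mu), inverse.gaussian()$variance(mu))
```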

Value

A family object, which is a list of functions and expressions used by `glm` and `gam` in their iteratively reweighted least-squares algorithms. See `family` and `glm` in the R base help for details.
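As an illustration of what the family object contains, the sketch below checks that the link, inverse link and derivative components are mutually consistent. The component names `linkfun`, `linkinv` and `mu.eta` are the standard ones for R family objects; the choice of `var.power = 2`, `link.power = 0.5` and the step size `h` are assumptions for the example.

```r
library(statmod)

fam <- tweedie(var.power = 2, link.power = 0.5)
inherits(fam, "family")

mu  <- c(0.5, 1, 2, 4)
eta <- fam$linkfun(mu)               # eta = mu^0.5

all.equal(fam$linkinv(eta), mu)      # the inverse link recovers mu

# mu.eta is the derivative d mu / d eta; compare with a central difference
h <- 1e-6
num_deriv <- (fam$linkinv(eta + h) - fam$linkinv(eta - h)) / (2 * h)
all.equal(fam$mu.eta(eta), num_deriv, tolerance = 1e-6)
```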

Author(s)

Gordon Smyth

References

Dunn, P. K., and Smyth, G. K. (2018). Generalized linear models with examples in R. Springer, New York, NY. doi: 10.1007/978-1-4419-0118-7 (Chapter 12 gives an overall discussion of Tweedie GLMs with R code and case studies.)

Hasan, M. M., and Dunn, P. K. (2012). Understanding the effect of climatology on monthly rainfall amounts in Australia using Tweedie GLMs. International Journal of Climatology 32(7), 1006-1017. (An example with var.power between 1 and 2)

Joergensen, B. (1987). Exponential dispersion models. J. R. Statist. Soc. B 49, 127-162. (Mathematical derivation of Tweedie response distributions)

Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. In Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference. (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute. (The original mathematical paper from which the family is named)

Smyth, G. K. (1996). Regression modelling of quantity data with exact zeroes. Proceedings of the Second Australia-Japan Workshop on Stochastic Models in Engineering, Technology and Management. Technology Management Centre, University of Queensland, pp. 572-580. http://www.statsci.org/smyth/pubs/RegressionWithExactZerosPreprint.pdf (Derivation and examples of Tweedie GLMs with var.power between 1 and 2)

Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10, 695-709. http://www.statsci.org/smyth/pubs/Ties98-Preprint.pdf (Includes examples of Tweedie GLMs with `var.power=2` and `var.power=4`)

See Also

`glm`, `family`, `dtweedie`

Examples

```
y <- rgamma(20, shape = 5)
x <- 1:20
# Fit a Poisson generalized linear model with identity link