knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(tweedie)
Tweedie distributions are exponential dispersion models, with a mean $\mu$ and a variance $\phi \mu^\xi$, for some dispersion parameter $\phi > 0$ and a power index $\xi$ (sometimes called $p$) that uniquely defines the distribution within the Tweedie family (for all real values of $\xi$ not between 0 and 1).
Special cases of the Tweedie distributions are:
For all other values of $\xi$, the probability functions and distribution functions have no closed forms.
| Power index, $\xi$ | Distribution | Support | |-----------------------|------------------|-------------------------------| | $\xi = 0$ | Normal | $(-\infty, \infty)$ | | $0 < \xi < 1$ | Do not exist | | | $\xi = 1$, $\phi = 1$ | Poisson | Discrete: $(0, 1, 2, \dots)$ | | $1 < \xi < 2$ | Poisson--gamma | $[ 0, \infty)$ | | $\xi = 2$ | gamma | $(0, \infty)$ | | $\xi > 2$ | Skewed right | $(0, \infty$) | | $\xi = 3$ | inverse Gaussian | $(0, \infty)$ |
For $\xi < 1$, applications are limited (non-existent so far?), but have support on the entire real line and $\mu > 0$.
For $1 < \xi < 2$, Tweedie distributions can be represented as a Poisson sum of gamma distributions. These distributions are continuous for $Y > 0$ but have a discrete mass at $Y = 0$.
Likelihood computations are not need to fit a Tweedie generalized linear model, using the tweedie() family from the statmod package:
library(statmod) set.seed(96) N <- 25 # Mean of the Poisson (lambda) and Gamma (shape/scale) lambda <- 1.5 # Generating Compound Poisson-Gamma data manually y <- replicate(N, { n_events <- rpois(1, lambda = lambda) if (n_events == 0) 0 else sum(rgamma(n_events, shape = 2, scale = 1)) }) mod.tw <- glm(y ~ 1, family = statmod::tweedie(var.power = 1.5, link.power = 0) ) # link.power = 0 means the log-link
However, likelihood computations are necessary in other situations.
Random numbers from a Tweedie distribution are generated using rtweedie():
tweedie::rtweedie(10, xi = 1.1, mu = 2, phi = 1)
Accurate density functions are generated using dtweedie():
y <- seq(0, 2, length = 100) xi <- 1.1 mu <- 0.5 phi <- 0.4 twden <- tweedie::dtweedie(y, xi = xi, mu = mu, phi = phi) twdtn <- tweedie::ptweedie(y, xi = xi, mu = mu, phi = phi) plot( twden[y > 0] ~ y[y > 0], type ="l", lwd = 2, xlab = expression(italic(y)), ylab = "Density function") points(twden[y==0] ~ y[y == 0], lwd = 2, pch = 19, xlab = expression(italic(y)), ylab = "Distribution function")
Accurate probability functions are generated using ptweedie():
plot(twdtn ~ y, type = "l", lwd = 2, ylim = c(0, 1), xlab = expression(italic(y)), ylab = "Distribution function")
However, the function tweedie_plot() is sometimes more convenient (especially for $1 < \xi \ < 2$, when a probability mass at $Y = 0$ is present):
tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi, ylab = "Density function", xlab = expression(italic(y)), lwd = 2)
tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi, ylab = "Distribution function", xlab = expression(italic(y)), lwd = 2, ylim = c(0, 1), type = "cdf")
Quantile residuals can be computed to assess the fit of a glm:
library(tweedie) qqnorm( statmod::qresid(mod.tw) )
To use a Tweedie distribution in a glm, the value of $\xi$ is needed. To find the most suitable value of $\xi$, tweedie_profile() can be used:
out <- tweedie::tweedie.profile(y ~ 1, xi.vec = seq(1.2, 1.8, by = 0.05), do.plot = TRUE) # The estimated power index: out$xi.max
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.