knitr::opts_chunk$set(
  collapse = TRUE,
  prompt = TRUE,
  comment = ""
)
library(COMPoissonReg)
set.seed(1235)

\newpage

Introduction

\label{sec:intro}

The R package COMPoissonReg supports Conway-Maxwell Poisson (CMP) and Zero-Inflated Conway-Maxwell Poisson (ZICMP) models for the analysis of count data in a flexible manner that accounts for over- or underdispersion relative to the Poisson model. The package provides regression functionality in addition to basic distribution functions. Interested users can refer to @SellersShmueli2010 and @SellersRaim2016 regarding the underlying theoretical developments for the CMP and ZICMP regressions, respectively. A full specification of the public COMPoissonReg interface can be found in the manual. In addition to the package prerequisites Rcpp [@Eddelbuettel2013] and numDeriv [@numDeriv], ggplot2 [@Wickham2016] is used in this vignette.

One of the challenges of working with CMP and ZICMP lies in computing the normalizing constant and related quantities. The normalizing constant does not have a simple closed form in general and can quickly increase or decrease in magnitude as parameters are varied. COMPoissonReg takes a hybrid approach of either truncating the infinite series or making use of an approximation, depending on parameter values.

The remainder of the vignette proceeds as follows. Section \ref{sec:cmp} describes functions to support the CMP distribution, including numerical handling of the normalizing constant. Section \ref{sec:zicmp} describes functions for ZICMP. Finally, Section \ref{sec:reg} demonstrates regression functions; Sections \ref{sec:cmp-reg} and \ref{sec:zicmp-reg} give specific examples based on CMP and ZICMP outcomes, respectively. The COMPoissonReg package is on CRAN at https://cran.r-project.org/package=COMPoissonReg and the source code is on Github at https://github.com/lotze/COMPoissonReg.

Conway-Maxwell Poisson Distribution

\label{sec:cmp}

Let $Y \sim \text{CMP}(\lambda, \nu)$ be a Conway-Maxwell Poisson (CMP) random variable with density \begin{align} f(y \mid \lambda, \nu) = \frac{\lambda^y}{(y!)^\nu Z(\lambda, \nu)}, \quad y \in \mathbb{N}, \quad Z(\lambda, \nu) = \sum_{r=0}^\infty \frac{\lambda^r}{(r!)^\nu}, \end{align} where $\lambda > 0$, $\nu > 0$, and $\mathbb{N}$ represents the nonnegative integers $\{ 0, 1, 2, \ldots \}$. Three notable special cases of $\text{CMP}(\lambda, \nu)$ help to demonstrate its flexibility in count modeling.

a. The case $\nu = 1$ corresponds to $\text{Poisson}(\lambda)$; a quick numerical check of this case appears just after this list.

a. When $\lambda \in (0,1)$ and $\nu = 0$, the $\text{CMP}(\lambda, \nu)$ distribution is $\text{Geometric}(1-\lambda)$ with density $f(y \mid \lambda) = (1 - \lambda) \lambda^y$ for $y \in \mathbb{N}$, which is overdispersed relative to Poisson.

a. When $\nu \rightarrow \infty$, $\text{CMP}(\lambda, \nu)$ converges to a $\text{Bernoulli}(\lambda / (1 + \lambda))$ distribution which is underdispersed relative to Poisson.
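Special case (a) is easy to check numerically with the package's density function dcmp, introduced in Section \ref{sec:cmp-dist}. Here is a minimal sketch comparing it against dpois from base R; the difference should be negligible, up to the tolerances used internally.

x = 0:10
max(abs(dcmp(x, lambda = 2, nu = 1) - dpois(x, lambda = 2)))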

Normalizing Constant

\label{sec:cmp-normconst} The normalizing constant $Z(\lambda, \nu)$ presents some challenges in the practical use of CMP models and has been a topic of interest in the CMP literature. In general, there is no simple closed form expression for the series $Z(\lambda, \nu)$. @ShmueliEtAl2005 give the approximation \begin{align} Z(\lambda, \nu) &= \frac{ \exp(\nu \lambda^{1/\nu}) }{ \lambda^{(\nu-1)/2\nu} (2\pi)^{(\nu-1)/2} \nu^{1/2} } \left\{ 1 + O(\lambda^{-1/\nu}) \right\}, \label{eqn:approx} \end{align} where the $O(\cdot)$ term vanishes as $\lambda^{-1/\nu}$ becomes small. Approximations have been further studied and refined in subsequent literature; see for example @GillispieGreen2015, @DalyGaunt2016, and @GauntEtAl2019. The expression in \eqref{eqn:approx} emphasizes that the magnitude of $Z(\lambda, \nu)$ explodes as $\nu \rightarrow 0$ when $\lambda > 1$. For example, $Z(2, 0.075) \approx e^{780.515}$ is too large to store as a double-precision floating point number, and may evaluate to infinity if care is not taken. In contrast, $Z(\lambda, \nu) \rightarrow 1/(1 - \lambda)$ when $\lambda < 1$ and $\nu \rightarrow 0$.

In practice, the COMPoissonReg package does not place constraints on $\lambda$ and $\nu$, except to ensure that they are positive, so that their values are driven by the data or the user's selection. A hybrid strategy motivated by \eqref{eqn:approx} is taken by COMPoissonReg.

To compute $Z(\lambda, \nu)$, suppose we are given a small tolerance $\delta > 0$. If \begin{align} \lambda^{-1/\nu} < \delta, \label{eqn:can_approx} \end{align} the first term of \eqref{eqn:approx} dominates the second term, and we take \begin{align} Z(\lambda, \nu) &\approx \frac{ \exp(\nu \lambda^{1/\nu}) }{ \lambda^{(\nu-1)/2\nu} (2\pi)^{(\nu-1)/2} \nu^{1/2} } \nonumber \\ &= \exp\left\{ \nu \lambda^{1/\nu} - \frac{\nu-1}{2\nu} \log \lambda - \frac{\nu-1}{2} \log(2\pi) - \frac{1}{2} \log \nu \right\} \label{eqn:z-approx} \end{align} as an approximation. Otherwise, the series is computed by truncating to a finite number of terms, which is described next. In either case, computations are kept on the log-scale as much as possible to accommodate numbers with potentially very large and very small magnitudes.
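As an illustration, \eqref{eqn:z-approx} is simple to evaluate on the log-scale. The helper log_z_approx below is ours, not part of the package; the same comparison can be made later in this section through ncmp and its hybrid.tol control.

log_z_approx = function(lambda, nu) {
    nu * lambda^(1/nu) - (nu - 1) / (2 * nu) * log(lambda) -
        (nu - 1) / 2 * log(2 * pi) - 0.5 * log(nu)
}
log_z_approx(lambda = 1.5, nu = 1.2)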

We approximate $Z(\lambda, \nu)$ by a finite summation $Z^{(M)}(\lambda, \nu) = \sum_{r=0}^M \lambda^r / (r!)^\nu$ if condition \eqref{eqn:can_approx} fails, choosing $M$ so that the remainder is smaller than a given tolerance. The general approach is described in Appendix B of @ShmueliEtAl2005. @Robbins1955 gives bounds for Stirling's approximation as \begin{align} \sqrt{2\pi} n^{n + 1/2} e^{-n} e^{1 / (12n + 1)} < n! < \sqrt{2\pi} n^{n + 1/2} e^{-n} e^{1 / (12n)}. \end{align} Noting that $e^{1 / (12n + 1)} \geq 1$ for $n \geq 1$ and $\sqrt{2\pi} e^{1 / (12n)} \leq e$ for $n \geq 2$, we obtain the simpler bounds \begin{align} \sqrt{2\pi} n^{n + 1/2} e^{-n} \leq n! \leq e n^{n + 1/2} e^{-n}, \end{align} which will be convenient in the following calculations.^[These bounds are also stated at https://en.wikipedia.org/wiki/Stirling\%27s_approximation, last accessed 2022-10-09.] We may then bound the truncation error for $Z^{(M)}(\lambda, \nu)$ using \begin{align} \lvert Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu) \rvert &= Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu) \nonumber \\ &= \sum_{r=M+1}^\infty \frac{\lambda^r}{(r!)^\nu} \nonumber \\ &\leq \sum_{r=M+1}^\infty \frac{\lambda^r}{(2\pi)^{\nu/2} r^{\nu r + \nu/2} e^{-r \nu}} \nonumber \\ &\leq \sum_{r=M+1}^\infty \frac{\lambda^r}{(2\pi)^{\nu/2} (M+1)^{\nu r + \nu/2} e^{-r \nu}} \nonumber \\ &= (2\pi)^{-\nu/2} (M+1)^{-\nu/2} \sum_{r=M+1}^\infty \left( \frac{\lambda e^{\nu}}{(M+1)^{\nu}} \right)^r \label{eqn:geom} \\ &= (2\pi)^{-\nu/2} (M+1)^{-\nu/2} \sum_{r=0}^\infty \left( \frac{\lambda e^{\nu}}{(M+1)^{\nu}} \right)^{r+M+1} \nonumber \\ &= (2\pi)^{-\nu/2} (M+1)^{-\nu/2} \left( \frac{\lambda e^{\nu}}{(M+1)^{\nu}} \right)^{M+1} \frac{1}{1 - \frac{\lambda e^{\nu}}{(M+1)^{\nu}}} \nonumber \\ &=: \Delta_M, \nonumber \end{align} assuming that $|\lambda e^\nu / (M+1)^\nu| < 1$ so that the geometric series in \eqref{eqn:geom} converges. To ensure this convergence we choose $M$ at least large enough so that \begin{align} \lambda e^\nu / (M+1)^\nu < 1 \iff M > \lambda^{1/\nu} e - 1. \end{align} For a small given number $\epsilon > 0$, we may consider bounding the relative error by \begin{align} \frac{\lvert Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu) \rvert}{Z^{(M)}(\lambda, \nu)} \leq \frac{\Delta_M}{Z^{(M)}(\lambda, \nu)} < \epsilon. \label{eqn:rel_error_bound} \end{align} The second inequality of \eqref{eqn:rel_error_bound} can be expressed on the log-scale as \begin{align} \log \Delta_M - \log Z^{(M)}(\lambda, \nu) < \log \epsilon, \label{eqn:adaptive-log-scale} \end{align} where \begin{align} \log \Delta_M = -\frac{\nu}{2} \log(2\pi) - \nu \left( M+\frac{3}{2} \right) \log (M+1) + (M+1) (\nu + \log \lambda) - \log\left(1 - \frac{\lambda e^\nu}{(M+1)^\nu} \right). \end{align} Therefore, we compute $Z^{(M)}(\lambda, \nu)$ with $M$ at least large enough that $M > \lambda^{1/\nu} e - 1$, then continue increasing $M$ and updating $Z^{(M)}(\lambda, \nu)$ until \eqref{eqn:adaptive-log-scale} is satisfied. This is summarized as Algorithm \ref{alg:normconst-trunc}.

\begin{algorithm} \caption{Compute the CMP normalizing constant using the truncation approach.} \label{alg:normconst-trunc} \hspace{\algorithmicindent}\textbf{Input}: $\lambda > 0$ rate parameter. \\ \hspace{\algorithmicindent}\textbf{Input}: $\nu > 0$ dispersion parameter. \\ \hspace{\algorithmicindent}\textbf{Input}: $\epsilon > 0$ tolerance. \\ \hspace{\algorithmicindent}\textbf{Input}: $y_\text{max} \in \mathbb{N}$ upper limit for $M$. \begin{algorithmic}[1] \Function{Truncated-$Z$}{$\lambda, \nu, \epsilon, y_\text{max}$} \State $M = 0$, $Z^{(0)} = 1$

\While{$M \leq \lambda^{1/\nu} e - 1$ and $M \leq y_\text{max}$} \State $Z^{(M+1)} \leftarrow Z^{(M)} + \lambda^{M+1} / ((M+1)!)^\nu$ \State $M \leftarrow M + 1$ \EndWhile

\While{$\log \Delta_M - \log Z^{(M)} \geq \log \epsilon$ and $M \leq y_\text{max}$} \State $Z^{(M+1)} \leftarrow Z^{(M)} + \lambda^{M+1} / ((M+1)!)^\nu$ \State $M \leftarrow M + 1$ \EndWhile

\State \Return $\{ Z^{(M)}, M \}$ \EndFunction \end{algorithmic} \end{algorithm}

The individual terms $\lambda^r / (r!)^\nu$ in the summation may be too large to store at their original scale. Therefore, summation is carried out at the log-scale, wherever possible, using the identity \begin{align} \log(x + y) = \log x + \log(1 + \exp\{ \log y - \log x \}); \label{eqn:logadd} \end{align} this is especially helpful when $0 < y \ll x$, as $\log x$ may be kept on the log-scale by the first term of the right-hand side of \eqref{eqn:logadd}, and the standard library function log1p may be used to accurately compute $\log(1 + \phi)$ for very small $\phi > 0$.
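The following plain-R sketch pulls these pieces together: it carries out Algorithm \ref{alg:normconst-trunc} with log-scale accumulation via \eqref{eqn:logadd}. It is illustrative only; the package's own implementation is in compiled code, and the name truncated_log_z is not part of its interface.

truncated_log_z = function(lambda, nu, eps = 1e-6, ymax = 1e6) {
    log_z = 0  # log of the r = 0 term, lambda^0 / (0!)^nu = 1
    M = 0
    repeat {
        M = M + 1
        log_term = M * log(lambda) - nu * lgamma(M + 1)
        hi = max(log_z, log_term)
        lo = min(log_z, log_term)
        log_z = hi + log1p(exp(lo - hi))  # identity (eqn:logadd)
        if (M > lambda^(1/nu) * exp(1) - 1) {
            # Geometric tail bound log(Delta_M) from the derivation above
            log_delta = -nu/2 * log(2*pi) - nu * (M + 3/2) * log(M + 1) +
                (M + 1) * (nu + log(lambda)) -
                log1p(-lambda * exp(nu) / (M + 1)^nu)
            if (log_delta - log_z < log(eps)) break
        }
        if (M >= ymax) break
    }
    c(log_z = log_z, M = M)
}
truncated_log_z(lambda = 1.5, nu = 1.2)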

Many of the functions in the user interface of COMPoissonReg take an optional control argument, which can be constructed as follows.

control = get.control(
    ymax = 100000,
    hybrid.tol = 1e-2,
    truncate.tol = 1e-6
)

The tolerances $\delta$ and $\epsilon$ are specified as hybrid.tol and truncate.tol, respectively. Taking hybrid.tol to be a very small positive number results in use of the truncated sum $Z^{(M)}(\lambda, \nu)$, while hybrid.tol = Inf uses the approximation method \eqref{eqn:z-approx}, except in extreme cases where $\lambda^{-1/\nu}$ evaluates to zero or $\infty$ numerically. The argument ymax specifies the upper limit for $M$; this is a safety measure which prevents very large computations unless the user opts to allow them. When no control object is specified, a global default (via option COMPoissonReg.control) is used.

control = getOption("COMPoissonReg.control")
control$ymax
control$hybrid.tol
control$truncate.tol

The default may be replaced in your current session if desired.

options(COMPoissonReg.control = control)

The control object contains several other useful arguments to be discussed later in the vignette.

The ncmp function computes the normalizing constant $Z(\lambda, \nu)$ and returns its value either on the original scale or the log-scale.

ncmp(lambda = 1.5, nu = 1.2)
ncmp(lambda = 1.5, nu = 1.2, log = TRUE)
ncmp(lambda = 1.5, nu = 1.2, log = TRUE, control = get.control(hybrid.tol = 1e10))
ncmp(lambda = 1.5, nu = 1.2, log = TRUE, control = get.control(hybrid.tol = 1e-10))

Before proceeding, let us define a function to display errors and warnings which are intentionally triggered in the remainder of the vignette.

print_warning = function(x) { print(strwrap(x), quote = FALSE) }

The function tcmp returns the truncation value $M$ obtained from Algorithm \ref{alg:normconst-trunc}.

nu_seq = c(1, 0.5, 0.2, 0.1, 0.05, 0.03)
tryCatch({ tcmp(lambda = 1.5, nu = nu_seq) }, warning = print_warning)

Note that tcmp returns 1e6 and produces a warning for the smallest value, nu = 0.03, because Algorithm \ref{alg:normconst-trunc} reached ymax = 1e6 before the series could be bounded by a geometric series. Here, it is likely that support values with non-negligible mass are being left out. Let us increase ymax to avoid this problem.

tcmp(lambda = 1.5, nu = nu_seq, control = get.control(ymax = 3e6))

It is also possible to reach the second loop of Algorithm \ref{alg:normconst-trunc} where the geometric series can be used, but ymax is not large enough to satisfy \eqref{eqn:adaptive-log-scale}. Here is an example where this occurs.

tcmp(lambda = 1.2, nu = 0.03, control = get.control(ymax = 1200))

Now that we have a somewhat robust computation for the normalizing constant, let us create a plot of the interesting behavior when $\lambda > 1$ and $\nu$ is decreasing.

library(ggplot2)

nu_seq = seq(0.03, 1.5, length.out = 20)
nc1 = ncmp(lambda = 0.5, nu = nu_seq, log = TRUE)
nc2 = ncmp(lambda = 1.05, nu = nu_seq, log = TRUE)
nc3 = ncmp(lambda = 1.20, nu = nu_seq, log = TRUE)
ggplot() +
    geom_point(data = data.frame(x = nu_seq, y = nc1), aes(x = x, y = y), pch = 1) +
    geom_point(data = data.frame(x = nu_seq, y = nc2), aes(x = x, y = y), pch = 2) +
    geom_point(data = data.frame(x = nu_seq, y = nc3), aes(x = x, y = y), pch = 3) +
    xlab("nu") +
    ylab("log of normalizing constant") +
    theme_bw()

We see that with $\lambda = 1.2$, the value of $\log Z(\lambda, \nu)$ at the smallest dispersion value $\nu = 0.03$ represents an extremely large jump from the next value in the series at $\nu \approx 0.1074$.

Density, Generation, CDF, and Quantile Functions

\label{sec:cmp-dist}

The respective functions for CMP density, variate generation, CDF, and quantiles are dcmp, rcmp, pcmp, and qcmp. Their usage is similar to the distribution functions provided by the R stats package.

dcmp(0, lambda = 10, nu = 0.9)
dcmp(0:17, lambda = 10, nu = 0.9, log = TRUE)
dcmp(c(0, 1, 2), lambda = c(10, 11, 12), nu = c(0.9, 1.0, 1.1), log = TRUE)
rcmp(50, lambda = 10, nu = 0.9)
pcmp(0:17, lambda = 10, nu = 0.9)
qq = seq(0, 0.95, length.out = 10)
qcmp(qq, lambda = 10, nu = 0.9)

The CMP distribution functions compute the normalizing constant as described in Section \ref{sec:cmp-normconst}. The density dcmp uses the hybrid method, while rcmp, pcmp, and qcmp use the truncation method regardless of condition \eqref{eqn:can_approx}. Variate generation, CDF, and quantiles are then computed for CMP as if it were a discrete distribution on the finite sample space $\{ 0, \ldots, M \}$. As with tcmp and ncmp, a warning is emitted in cases where these functions may not produce reliable results.

tryCatch({ rcmp(1, lambda = 2, nu = 0.01) }, warning = print_warning)

The truncation method for qcmp can result in unreliable quantiles when the requested probability is very close to 1. For example, the actual quantile for probability $1$ is $\infty$, which can be expressed with no computation, but the computed quantity will be the truncation value $M$. More generally, \eqref{eqn:rel_error_bound} implies that \begin{align} \frac{Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu)}{Z(\lambda, \nu)} < \epsilon &\iff \Prob(Y > M) < \epsilon \nonumber \\ &\iff 1 - F(M \mid \lambda, \nu) < \epsilon \nonumber \\ &\iff F^{-}(1 - \epsilon \mid \lambda, \nu) < M. \label{eqn:cmp-quantile-limit} \end{align} Therefore, it is possible that quantiles larger than $1 - \epsilon$ may be inaccurately computed with the truncated support. COMPoissonReg gives a warning when these are requested.
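To see how much probability mass the truncated support actually captures, we can evaluate the CDF at the truncation point; this is an informal check using tcmp and pcmp.

M = tcmp(lambda = 1.5, nu = 0.5)
pcmp(M, lambda = 1.5, nu = 0.5)  # nearly 1; probabilities beyond this cannot be inverted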

tryCatch({
    qcmp(0.9999999, lambda = 1.5, nu = 0.5)
}, warning = print_warning)

As a sanity check, let us ensure that the empirical density values, cumulative probabilities, and quantiles of draws from rcmp align with respective results computed via dcmp, pcmp, and qcmp.

library(ggplot2)

n = 100000
lambda = 0.5
nu = 0.1
x = rcmp(n, lambda, nu)

xx = seq(-1, max(x))  ## Include -1 to ensure it gets probability zero
qq = seq(0, 0.99, length.out = 100)

fx = dcmp(xx, lambda, nu)
px = pcmp(xx, lambda, nu)
qx = qcmp(qq, lambda, nu)

qx_emp = quantile(x, probs = qq)
ggplot() +
    geom_bar(data = data.frame(x = x), aes(x = x, y = ..prop..), fill = "NA",
        col = "black") +
    geom_point(data = data.frame(x = xx[-1], fx = fx[-1]), aes(x, fx)) +
    ylab("Density") +
    theme_bw()
ggplot() +
    stat_ecdf(data = data.frame(x = x), aes(x), geom = "step") +
    geom_point(data = data.frame(x = xx, px = px), aes(x, px)) +
    ylab("Probability") +
    theme_bw()
ggplot() +
    geom_point(data = data.frame(x = qq, qx_emp = qx_emp), aes(qq, qx_emp), pch = 1) +
    geom_point(data = data.frame(x = qq, qx = qx), aes(qq, qx), pch = 3) +
    xlab("Probability") +
    ylab("Quantile") +
    theme_bw()

Expected Value and Variance

\label{sec:cmp-ev}

For $Y \sim \text{CMP}(\lambda, \nu)$, we can consider computing the expectation and variance of $Y$ in two ways. First, if there is a moderately-sized $M$ such that $\{ 0, \ldots, M \}$ contains all but a negligible amount of the mass of $Y$, we can compute the moments using the truncated summations \begin{align} \E(Y) = \sum_{y=0}^M y \frac{\lambda^y}{(y!)^\nu Z(\lambda, \nu)}, \quad \E(Y^2) = \sum_{y=0}^M y^2 \frac{\lambda^y}{(y!)^\nu Z(\lambda, \nu)}, \quad \Var(Y) = \E(Y^2) - [\E(Y)]^2. \end{align} Otherwise, a different approach is taken. Notice that the expected value is related to the first derivative of the log-normalizing constant via \begin{align} &\frac{\partial}{\partial \lambda} \log Z(\lambda, \nu) = \frac{ \frac{\partial}{\partial \lambda} Z(\lambda, \nu) }{ Z(\lambda, \nu) } = \frac{1}{Z(\lambda, \nu)} \sum_{y=0}^\infty y \frac{\lambda^{y-1}}{(y!)^\nu} \\ &\iff \E(Y) = \lambda \frac{\partial}{\partial \lambda} \log Z(\lambda, \nu). \end{align} For the second derivative, \begin{align} &\frac{\partial^2}{\partial \lambda^2} \log Z(\lambda, \nu) = \frac{ Z(\lambda, \nu) \frac{\partial^2}{\partial \lambda^2} Z(\lambda, \nu) - [\frac{\partial}{\partial \lambda} Z(\lambda, \nu)]^2}{ Z(\lambda, \nu)^2 } = \frac{1}{Z(\lambda, \nu)} \sum_{y=0}^\infty y(y-1) \frac{\lambda^{y-2}}{(y!)^\nu} - \left[ \frac{\E(Y)}{\lambda} \right]^2 \\ &\iff \lambda^2 \frac{\partial^2}{\partial \lambda^2} \log Z(\lambda, \nu) = \E[Y(Y-1)] - [\E(Y)]^2 = \Var(Y) - \E(Y) \\ &\iff \Var(Y) = \lambda^2 \frac{\partial^2}{\partial \lambda^2} \log Z(\lambda, \nu) + \E(Y). \end{align} Therefore, we may use the first and second derivatives of $\log Z(\lambda, \nu)$, taken with respect to $\lambda$, to compute $\E(Y)$ and $\Var(Y)$. The ecmp and vcmp functions implement this computation of the expectation and variance, respectively. Condition \eqref{eqn:can_approx}, evaluated with the tolerance from the control object, determines whether the truncation approach (if the condition fails) or the differentiation approach (if it holds) is used.

ecmp(lambda = 10, nu = 1.2)
ecmp(lambda = 1.5, nu = 0.5)
ecmp(lambda = 1.5, nu = 0.05)
ecmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e-10))
ecmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e10))
vcmp(lambda = 10, nu = 1.2)
vcmp(lambda = 1.5, nu = 0.5)
vcmp(lambda = 1.5, nu = 0.05)
vcmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e-10))
vcmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e10))
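As an informal check of the derivative identities above, we can differentiate $\log Z(\lambda, \nu)$ numerically with the numDeriv package and compare against ecmp and vcmp.

library(numDeriv)
lam = 10
nu0 = 1.2
ey = lam * numDeriv::grad(function(l) ncmp(l, nu = nu0, log = TRUE), x = lam)
ey                            ## Should match ecmp(lambda = 10, nu = 1.2)
lam^2 * drop(numDeriv::hessian(function(l) ncmp(l, nu = nu0, log = TRUE), x = lam)) + ey
vcmp(lambda = lam, nu = nu0)  ## Should match the previous line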

Provided that an enormously large truncation value $M$ is not required, we may compute other moments by truncated sums using tcmp.

M = tcmp(lambda = 1.5, nu = 0.05)
print(M)
xx = seq(0, M)
sum(xx^3 * dcmp(xx, lambda = 1.5, nu = 0.05))    # E(X^3)
sum(xx^4 * dcmp(xx, lambda = 1.5, nu = 0.05))    # E(X^4)

Zero-Inflated Conway-Maxwell Poisson Distribution

\label{sec:zicmp}

Let $S \sim \text{Bernoulli}(p)$ and $T \sim \text{CMP}(\lambda, \nu)$ be independent random variables. Then $Y = (1-S) T$ follows a Zero-Inflated Conway-Maxwell Poisson distribution $\text{ZICMP}(\lambda, \nu, p)$ with density \begin{align} f(y \mid \lambda, \nu, p) = (1 - p) \frac{\lambda^{y}}{(y!)^{\nu} Z(\lambda, \nu)} + p \cdot \ind(y = 0), \quad y \in \mathbb{N}. \end{align} Like CMP, several interesting special cases are obtained.

a. Taking $\nu = 1$ corresponds to Zero-Inflated Poisson $\text{ZIP}(\lambda, p)$ with density $f(y \mid \lambda, p) = (1 - p) e^{-\lambda} \lambda^{y} / y! + p \cdot \ind(y = 0)$.

a. When $\lambda \in (0,1)$ and $\nu \rightarrow 0$, $\text{ZICMP}(\lambda, \nu, p)$ converges to a Zero-Inflated Geometric distribution with density $f(y \mid \lambda, p) = (1 - p) (1 - \lambda) \lambda^y + p \cdot \ind(y = 0)$ for $y \in \mathbb{N}$.

a. When $\nu \rightarrow \infty$, $\text{ZICMP}(\lambda, \nu, p)$ converges to a "Zero-Inflated Bernoulli" distribution with density \begin{math} f(y \mid \lambda, p) = (1 - p) \left[ \lambda/(1+\lambda) \right]^y \left[ 1/(1+\lambda) \right]^{1-y} + p \cdot \ind(y = 0), \end{math} which is itself a Bernoulli distribution with an adjusted success probability. Here the $\lambda$ and $p$ parameters are not individually identifiable, so users may want to avoid zero-inflation in CMP data analyses with extreme underdispersion.

a. Finally, $p = 0$ yields $\text{CMP}(\lambda, \nu)$.

Density, Generation, CDF, and Quantile Functions

There is a close relationship between the CMP and ZICMP distributions, as we have seen from the construction of ZICMP. The relationship between the CMP and ZICMP densities and variate generation mechanisms was given at the beginning of this section. Denote $F(x \mid \lambda, \nu)$ and $F(x \mid \lambda, \nu, p)$ as the CDFs of $\text{CMP}(\lambda, \nu)$ and $\text{ZICMP}(\lambda, \nu, p)$, respectively. We have \begin{align} F(x \mid \lambda, \nu, p) = (1-p) F(x \mid \lambda, \nu) + p \cdot \ind(x \geq 0). \end{align} For a given probability $\phi \in [0,1]$, the associated CMP and ZICMP quantile functions are related via \begin{align} F^{-}(\phi \mid \lambda, \nu, p) &= \inf\{ x \in \mathbb{N} : F(x \mid \lambda, \nu, p) \geq \phi \} \nonumber \\ &= \inf\{ x \in \mathbb{N} : (1-p) F(x \mid \lambda, \nu) + p \cdot \ind(x \geq 0) \geq \phi \} \nonumber \\ &= \inf\{ x \in \mathbb{N} : F(x \mid \lambda, \nu) \geq (\phi - p) / (1 - p) \} \nonumber \\ &= F^{-}\left( \frac{\phi - p}{1 - p} \mid \lambda, \nu \right). \label{eqn:zicmp-quantiles} \end{align}

The respective functions for ZICMP density, variate generation, CDF, and quantiles are dzicmp, rzicmp, pzicmp, and qzicmp. They make use of the CMP implementation described in Section \ref{sec:cmp}, including the criteria to either truncate or approximate the normalizing constant.

qq = seq(0, 0.95, length.out = 20)
rzicmp(20, lambda = 1.5, nu = 0.2, p = 0.25)
dzicmp(c(0, 1, 2), lambda = 1.5, nu = 0.2, p = 0.25)
pzicmp(c(0, 1, 2), lambda = 1.5, nu = 0.2, p = 0.25)
qzicmp(qq, lambda = 1.5, nu = 0.2, p = 0.25)
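As an informal check, these functions obey the density, CDF, and quantile relations stated in this section.

yy = 0:5
p = 0.25
# Density: (1 - p) times the CMP density, plus a point mass at zero
max(abs(dzicmp(yy, lambda = 1.5, nu = 0.2, p = p) -
    ((1 - p) * dcmp(yy, lambda = 1.5, nu = 0.2) + p * (yy == 0))))
# CDF: (1 - p) times the CMP CDF, plus p for x >= 0
max(abs(pzicmp(yy, lambda = 1.5, nu = 0.2, p = p) -
    ((1 - p) * pcmp(yy, lambda = 1.5, nu = 0.2) + p)))
# Quantiles: relation (eqn:zicmp-quantiles)
qzicmp(0.9, lambda = 1.5, nu = 0.2, p = p)
qcmp((0.9 - p) / (1 - p), lambda = 1.5, nu = 0.2)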

As with qcmp, the qzicmp function is computed by the truncation method, and cannot accurately compute quantiles very close to 1. More specifically, from the CMP distribution, \eqref{eqn:cmp-quantile-limit} gives \begin{align} M &> F^{-}(1 - \epsilon \mid \lambda, \nu) \\ &= F^{-}\left( \frac{\phi_\epsilon - p}{1 - p} \mid \lambda, \nu \right) \\ &= F^{-}\left( \phi_\epsilon \mid \lambda, \nu, p \right), \end{align} where $\phi_\epsilon = (1 - \epsilon)(1-p) + p$ and the last equality uses \eqref{eqn:zicmp-quantiles}. This motivates a warning from qzicmp when the argument is larger than $\phi_\epsilon$.

tryCatch({
    qzicmp(0.9999999, lambda = 1.5, nu = 0.5, p = 0.5)
}, warning = print_warning)

Let us repeat the sanity check from Section \ref{sec:cmp-dist} to ensure that the empirical density values, cumulative probabilities, and quantiles line up with the ones computed via dzicmp, pzicmp, and qzicmp.

library(ggplot2)

n = 100000
lambda = 0.5
nu = 0.1
p = 0.5
x = rzicmp(n, lambda, nu, p)

xx = seq(-1, max(x))  ## Include -1 to ensure it gets probability zero
qq = seq(0, 0.99, length.out = 100)

fx = dzicmp(xx, lambda, nu, p)
px = pzicmp(xx, lambda, nu, p)
qx = qzicmp(qq, lambda, nu, p)

qx_emp = quantile(x, probs = qq)
ggplot() +
    geom_bar(data = data.frame(x = x), aes(x = x, y = ..prop..), fill = "NA",
        col = "black") +
    geom_point(data = data.frame(x = xx[-1], fx = fx[-1]), aes(x, fx)) +
    ylab("Density") +
    theme_bw()
ggplot() +
    stat_ecdf(data = data.frame(x = x), aes(x), geom = "step") +
    geom_point(data = data.frame(x = xx, px = px), aes(x, px)) +
    ylab("Probability") +
    theme_bw()
ggplot() +
    geom_point(data = data.frame(x = qq, qx_emp = qx_emp), aes(qq, qx_emp), pch = 1) +
    geom_point(data = data.frame(x = qq, qx = qx), aes(qq, qx), pch = 3) +
    xlab("Probability") +
    ylab("Quantile") +
    theme_bw()

Expectation and Variance

The expected value and variance of $Y \sim \text{ZICMP}(\lambda, \nu, p)$ are apparent from the construction $Y = (1-S) T$ given earlier in this section. Namely, \begin{align} \E(Y) = (1-p) \E(T) \quad \text{and} \quad \Var(Y) = (1-p) \left\{ \Var(T) + p[\E(T)]^2 \right\} \end{align} may be obtained using formulas for iterated conditional expectations and variances. They are evaluated in COMPoissonReg using the ezicmp and vzicmp functions respectively. These functions make use of the ecmp and vcmp functions described in Section \ref{sec:cmp-ev} to compute $\E(T)$ and $\Var(T)$.

ezicmp(lambda = 1.5, nu = 0.5, p = 0.1)
ezicmp(lambda = 1.5, nu = 0.5, p = c(0.1, 0.2, 0.5))
vzicmp(lambda = 1.5, nu = 0.5, p = 0.1)
vzicmp(lambda = 1.5, nu = 0.5, p = c(0.1, 0.2, 0.5))
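These agree with the construction formulas given above; for example:

(1 - 0.1) * ecmp(lambda = 1.5, nu = 0.5)                # matches ezicmp with p = 0.1
(1 - 0.1) * (vcmp(1.5, 0.5) + 0.1 * ecmp(1.5, 0.5)^2)   # matches vzicmp with p = 0.1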

Regression Modeling with CMP and ZICMP

\label{sec:reg}

Suppose there are $n$ subjects with outcomes $y_1, \ldots, y_n \in \mathbb{N}$ and covariates $\vec{x}_i \in \mathbb{R}^{d_1}$, $\vec{s}_i \in \mathbb{R}^{d_2}$, and $\vec{w}_i \in \mathbb{R}^{d_3}$ for $i = 1, \ldots, n$. The COMPoissonReg package fits both CMP and ZICMP regression models.

The CMP regression model assumes that \begin{align} Y_i \indep \text{CMP}(\lambda_i, \nu_i), \quad i = 1, \ldots, n, \end{align} where $\log \lambda_i = \vec{x}_i^\top \vec{\beta}$ and $\log \nu_i = \vec{s}_i^\top \vec{\gamma}$. Writing $\btheta = (\vec{\beta}, \vec{\gamma})$, the likelihood is \begin{align} L(\btheta) = \prod_{i=1}^n \left[ \frac{\lambda_i^{y_i}}{(y_i!)^{\nu_i} Z(\lambda_i, \nu_i)} \right]. \label{eqn:likelihood-cmp} \end{align} The ZICMP regression model assumes that \begin{align} Y_i \indep \text{ZICMP}(\lambda_i, \nu_i, p_i), \quad i = 1, \ldots, n, \end{align} where $\log \lambda_i = \vec{x}_i^\top \vec{\beta}$, $\log \nu_i = \vec{s}_i^\top \vec{\gamma}$, and $\logit p_i = \vec{w}_i^\top \vec{\zeta}$. Writing $\btheta = (\vec{\beta}, \vec{\gamma}, \vec{\zeta})$, the likelihood is \begin{align} L(\btheta) = \prod_{i=1}^n \left[(1 - p_i) \frac{\lambda_i^{y_i}}{(y_i!)^{\nu_i} Z(\lambda_i, \nu_i)} + p_i \ind(y_i = 0) \right]. \label{eqn:likelihood-zicmp} \end{align} We will write $d = d_1 + d_2 + d_3$ for the total dimension of $\btheta$. The $n \times d_1$ design matrix whose rows consist of $\vec{x}_i$ will be denoted $\vec{X}$. Similarly, we will write $\vec{S}$ and $\vec{W}$ as the $n \times d_2$ and $n \times d_3$ design matrices constructed from $\vec{s}_i$ and $\vec{w}_i$, respectively. The glm.cmp function provides a formula interface to fit models of both types: \eqref{eqn:likelihood-cmp} and \eqref{eqn:likelihood-zicmp}.

out = glm.cmp(formula.lambda, formula.nu = ~ 1, formula.p = NULL,
    data = NULL, init = NULL, fixed = NULL, control = NULL, ...)

The interface contains three formulas: formula.lambda specifies the regression $\vec{x}_i^\top \vec{\beta}$ used for $\lambda_i$, while formula.nu and formula.p correspond to $\vec{s}_i^\top \vec{\gamma}$ for $\nu_i$ and $\vec{w}_i^\top \vec{\zeta}$ for $p_i$, respectively. ZICMP regression is utilized when formula.p is set to something other than its default NULL value; otherwise, CMP regression is assumed. The data argument is used to pass a data.frame explicitly rather than having the data be read from the local environment. The init, fixed, and control arguments and associated helper functions are described below.

The init argument represents an initial value for the optimizer. The following functions can be used to construct it.

get.init(beta = c(1, 2, 3), gamma = c(-1, 1), zeta = c(-2, -1))
get.init.zero(d1 = 3, d2 = 2, d3 = 2)

The fixed argument is used to specify indices of the three coefficients which will remain fixed at their initial value during optimization.

get.fixed(beta = c(1L, 2L), gamma = c(1L))

The specification above requests the first two elements of beta and the first element of gamma to be fixed. Notice that indices must be integers and that the default value is an empty integer vector which is interpreted as "no elements are fixed". The fixed argument can usually be disregarded but may be useful in some circumstances; an example is given in Section \ref{sec:zicmp-reg}.

Specifying the elements of init and fixed may be somewhat awkward with the formula interface, as they require knowledge of how formulas will be expanded into design matrices and coefficients. It can be helpful to produce the design matrices using R's model.matrix function.

model.matrix(formula.lambda, data = data)
model.matrix(formula.nu, data = data)
model.matrix(formula.p, data = data)

The control argument has been introduced in Section \ref{sec:cmp}; regression modeling makes use of several additional arguments. COMPoissonReg uses optim to compute the maximum likelihood estimate (MLE) $\hat{\btheta}$ for $\btheta$ under the specified model. Several controls are provided to influence how COMPoissonReg invokes optim; here are their default values.

control = getOption("COMPoissonReg.control")
control$optim.method
control$optim.control

The element optim.method is a string which is passed as the method argument to optim, while optim.control is a list passed as the control argument of optim. Note that, for the latter, if an entry is given for fnscale, it will be ignored and overwritten internally by COMPoissonReg.

The covariance of $\hat{\btheta}$ is estimated by $\hat{\vec{V}}(\hat{\btheta}) = -[\vec{H}(\hat{\btheta})]^{-1}$, where $\vec{H}(\btheta) = \partial^2 \log L(\btheta) / [\partial \btheta \partial \btheta^\top]$ is the Hessian of the log-likelihood computed by optim. The standard error for coefficient $\theta_j$ in $\btheta$ is then obtained as the square root of the $j$th diagonal of $\hat{\vec{V}}(\hat{\btheta})$.
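To sketch where these quantities come from, the CMP log-likelihood \eqref{eqn:likelihood-cmp} can be written directly in R and differentiated numerically. The helper loglik_cmp below is illustrative only and not part of the package; the package's internal objective may differ in parameterization and details.

loglik_cmp = function(theta, y, X, S) {
    beta = theta[seq_len(ncol(X))]
    gamma = theta[-seq_len(ncol(X))]
    lambda = as.numeric(exp(X %*% beta))
    nu = as.numeric(exp(S %*% gamma))
    sum(y * log(lambda) - nu * lgamma(y + 1) - ncmp(lambda, nu, log = TRUE))
}
# Given a fit with estimate theta.hat = coef(out) and matching y, X, and S:
# H = numDeriv::hessian(loglik_cmp, theta.hat, y = y, X = X, S = S)
# sqrt(diag(solve(-H)))  # standard errors as described above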

We will now illustrate use of the regression tools using two examples whose data are included in the package. Note that these demonstrations are not intended to be complete regression analyses, and results may be slightly different than previously published analyses due to differences in the computations.

CMP Regression

\label{sec:cmp-reg}

Freight Dataset

\label{sec:cmp-reg-freight}

The freight dataset [@KutnerEtAl2003] was analyzed using CMP regression by @SellersShmueli2010 and found to exhibit underdispersion. The data describe $n = 10$ instances where 1,000 ampules were transported via air shipment. The outcome of interest is the variable broken which describes the number of broken ampules in each shipment. The covariate transfers describes the number of times the carton was transferred from one aircraft to another during the shipment.

Let us load and view the dataset.

data(freight)
print(freight)

Before fitting a CMP regression, let us fit a Poisson regression model $Y_i \indep \text{Poisson}(\lambda_i)$ with \begin{align} \log \lambda_i = \beta_0 + \beta_1 \cdot \text{transfers}_i. \end{align} This can be carried out with the standard glm function.

glm.out = glm(broken ~ transfers, data = freight, family = poisson)
summary(glm.out)

CMP Regression

\label{sec:cmp-reg-initial}

Next, let us fit a similar CMP regression model with \begin{align} &\log \lambda_i = \beta_0 + \beta_1 \cdot \text{transfers}_i, \\ &\log \nu_i = \gamma_0, \end{align} using only an intercept for $\nu_i$.

cmp.out = glm.cmp(broken ~ transfers, data = freight)
print(cmp.out)

The coefficients used in the $\lambda_i$ formula are prefixed with an X: label, while an S: label is used for coefficients of the $\nu_i$ formula. Notice that estimates of the X: coefficients from the CMP fit are dissimilar to those from the Poisson fit; this may occur when the estimate of $\nu$ deviates from the value 1. Similarly to the glm output, the output of glm.cmp displays several quantities for each coefficient $\theta_j$, $j = 1, \ldots, d$: a point estimate $\hat{\theta}_j$, an associated standard error $\widehat{\text{SE}}(\hat{\theta}_j)$, a z-value $z_j = \hat{\theta}_j / \widehat{\text{SE}}(\hat{\theta}_j)$, and a p-value $2\Phi(-|z_j|)$ for the test $H_0: \theta_j = 0$ versus $H_1: \theta_j \neq 0$. Here, $\Phi$ is the CDF of the standard normal distribution. Because an intercept-only formula was specified for $\nu_i$, $\hat{\nu} = \exp(\hat{\gamma})$ does not vary with $i$, and its estimate and associated standard error are added to the display. Here we see evidence of underdispersion with $\hat{\nu} > 1$. A test for equidispersion is displayed to determine whether there is a significant amount of over- or underdispersion in the data. In particular, a likelihood ratio test is used to decide between $H_0: \vec{\gamma} = \vec{0}$ and $H_1: \vec{\gamma} \neq \vec{0}$. The test statistic is displayed along with the degrees of freedom and associated p-value. Here we have fairly strong evidence to reject the null hypothesis of equidispersion.
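Because the Poisson fit glm.out uses the same formula for $\lambda_i$ and corresponds to the null hypothesis $\vec{\gamma} = \vec{0}$, the displayed test statistic should be reproducible by hand; an informal check:

2 * (as.numeric(logLik(cmp.out)) - as.numeric(logLik(glm.out)))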

The AIC for the CMP model cmp.out shows some improvement over that of glm.out. Let us also consider a slope coefficient for the $\nu$ component, \begin{align} \log \nu_i = \gamma_0 + \gamma_1 \cdot \text{transfers}_i, \end{align} via the following call to glm.cmp.

cmp2.out = glm.cmp(broken ~ transfers, formula.nu = ~ transfers, data = freight)
print(cmp2.out)

Model cmp2.out provides a slight improvement over cmp.out in AIC and BIC, but we may prefer cmp.out in the interest of simplicity.

Adjustments to Optim

\label{sec:cmp-reg-optim}

To gain some insight into the optimization, we may wish to increase the trace level, which can be done as follows.

control = get.control(optim.control = list(maxit = 5, trace = 3, REPORT = 1))
cmp3.out = glm.cmp(broken ~ transfers, data = freight, control = control)

Data from the local environment may be passed to the glm.cmp function without explicitly using the data argument.

y = freight$broken
x = freight$transfers
glm.cmp(y ~ x)

Offset Term

\label{sec:cmp-reg-offset}

In a count regression model, it may be desirable to include offset terms such as \begin{align} \log \lambda_i = \vec{x}_i^\top \vec{\beta} + \text{offx}_i, \quad \log \nu_i = \vec{s}_i^\top \vec{\gamma} + \text{offs}_i. \end{align} An offset term may be used in the formula interface to accomplish this.

freight$offx = 13
freight$offs = 1
glm.cmp(broken ~ transfers + offset(offx), data = freight)
glm.cmp(broken ~ transfers + offset(offx), formula.nu = ~1 + offset(offs), data = freight)

For users who wish to bypass the formula interface and prepare the $\vec{X}$ and $\vec{S}$ design matrices manually, a "raw" interface to the regression functionality is also provided.

y = freight$broken
X = model.matrix(~ transfers, data = freight)
S = model.matrix(~ 1, data = freight)
offs = get.offset(x = rep(13, nrow(freight)), s = rep(1, nrow(freight)))
cmp.raw.out = glm.cmp.raw(y, X, S, offset = offs)

Accessor Functions

\label{sec:cmp-reg-accessors}

Several accessors are provided to extract results from the output object.

logLik(cmp.out)               ## Log-likelihood evaluated at MLE.
AIC(cmp.out)                  ## AIC evaluated at MLE.
BIC(cmp.out)                  ## BIC evaluated at MLE.
coef(cmp.out)                 ## Estimates of theta as a flat vector
coef(cmp.out, type = "list")  ## Estimates of theta as a named list
vcov(cmp.out)                 ## Estimated covariance matrix of theta hat
sdev(cmp.out)                 ## Standard deviations from vcov(...) diagonals
sdev(cmp.out, type = "list")  ## Standard deviations as a named list

The predict function computes several useful quantities evaluated at the estimate $\hat{\btheta}$. The default argument type = "response" produces a vector of estimates of the response $\hat{y}_i = \E(Y_i)$ for $i = 1, \ldots, n$ using the method described in Section \ref{sec:cmp-ev}. The argument type = "link" produces a data.frame with columns for the linked parameters $\lambda_i$ and $\nu_i$.

predict(cmp.out)
predict(cmp.out, type = "link")

Note that the estimated nu values are equal for all observations because the model assumed only an intercept term for the dispersion component.

We can also use predict on new covariate values. Note that models fit with the formula interface expect the new data to be provided as a data.frame which is interpreted using the formula used to fit the model. If the raw interface was used to fit the model, use the get.modelmatrix function to specify design matrices to use for prediction.

# Prepare new data to fit by formula interface
new.df = data.frame(transfers = 0:10)

# Prepare new data to fit by raw interface
X = model.matrix(~ transfers, data = new.df)
S = model.matrix(~ 1, data = new.df)
new.data = get.modelmatrix(X = X, S = S)

# Pass new data to the model fit by the formula interface
y.hat.new = predict(cmp.out, newdata = new.df)

# Pass new data to the model fit by the raw interface
y.hat.new = predict(cmp.raw.out, newdata = new.data)

# Compute predictions for links
predict.out = predict(cmp.out, newdata = new.df, type = "link")

# Plot predictions
ggplot() +
    geom_point(data = new.df, aes(transfers, y.hat.new)) +
    xlab("Number of transfers") +
    ylab("Predicted number broken") +
    theme_bw()
print(y.hat.new)
print(predict.out)

The leverage function computes the diagonal entries of a "hat" matrix which can be formulated in CMP regression. These can be used to diagnose influential observations. For details, see Section 3.6 of @SellersShmueli2010.

leverage(cmp.out)

The residuals function provides either raw (the default) or quantile-based residuals [@DunnSmyth1996]. In a CMP regression setting, raw residuals $y_i - \hat{y}_i$ generally do not work well with traditional regression diagnostics, such as Q-Q plots. Quantile-based residuals often produce interpretable diagnostics; however, a random element is used in computing quantile residuals for discrete distributions, which makes the residuals continuous and easier to interpret but gives slightly different values each time they are computed. See @DunnSmyth1996 for details.

res.raw = residuals(cmp.out)
res.qtl = residuals(cmp.out, type = "quantile")
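For intuition, a randomized quantile residual for observation $i$ draws a uniform variate between $F(y_i - 1)$ and $F(y_i)$, evaluated at the fitted parameters, and maps it through qnorm. The following hand-rolled sketch illustrates the idea; it is not necessarily the package's exact implementation.

y = freight$broken
link.hat = predict(cmp.out, type = "link")
u = runif(length(y), pcmp(y - 1, link.hat$lambda, link.hat$nu),
    pcmp(y, link.hat$lambda, link.hat$nu))
qnorm(u)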

Pearson residuals may be preferred over raw residuals for diagnostics; these can be obtained by standardizing raw residuals using leverage values and variance estimates.

link.hat = predict(cmp.out, type = "link")
vv = vcmp(link.hat$lambda, link.hat$nu)
hh = leverage(cmp.out)
res.pearson = res.raw / sqrt(vv*(1-hh))

For each type of residual---raw, Pearson, and quantile---we now plot fitted values versus residuals and Q-Q plots.

plot.fit.res = function(y.hat, res) {
    ggplot(data.frame(y = y.hat, res = res)) +
        geom_point(aes(y, res)) +
        xlab("Fitted Value") +
        ylab("Residual Value") +
        theme_bw() +
        theme(plot.title = element_text(size = 10))
}

plot.qq.res = function(res) {
    ggplot(data.frame(res = res), aes(sample = res)) +
        stat_qq() +
        stat_qq_line() +
        theme_bw() +
        theme(plot.title = element_text(size = 10))
}

y.hat = predict(cmp.out)

plot.fit.res(y.hat, res.raw) +
    ggtitle("Fitted Values vs. Raw Residuals")
plot.qq.res(res.raw) +
    ggtitle("Q-Q Plot of Raw Residuals")

plot.fit.res(y.hat, res.pearson) +
    ggtitle("Fitted Values vs. Pearson Residuals")
plot.qq.res(res.pearson) +
    ggtitle("Q-Q Plot of Pearson Residuals")

plot.fit.res(y.hat, res.qtl) +
    ggtitle("Fitted Values vs. Quantile Residuals")
plot.qq.res(res.qtl) +
    ggtitle("Q-Q Plot of Quantile Residuals")

In this example, with only 10 observations, it is difficult to see an advantage of using quantile residuals; the benefit will be more apparent in Section \ref{sec:zicmp-reg}. One benefit of raw residuals is that they may be used to compute a mean-squared error.

mean(res.raw^2)

To access the results of the equidispersion test shown in the output of cmp.out, we may use the equitest accessor function.

equitest(cmp.out)

The deviance function computes the deviance quantities $D_i = -2 [\log L_i(\hat{\btheta}) - \log L_i(\tilde{\btheta}_i)]$ for $i = 1, \ldots, n$, where $L_i(\btheta)$ is the term of the likelihood corresponding to the $i$th observation, $\hat{\btheta}$ is the MLE computed under the full likelihood $L(\btheta) = \prod_{i=1}^n L_i(\btheta)$, and $\tilde{\btheta}_i$ is the maximizer of $L_i(\btheta)$.

deviance(cmp.out)

The parametric.bootstrap function carries out a parametric bootstrap with $R$ repetitions. Using the fitted MLE $\hat{\btheta}$, bootstrap samples $\vec{y}^{(r)} = (y_1^{(r)}, \ldots, y_n^{(r)})$ are drawn from the likelihood $L(\hat{\btheta})$ for $r = 1, \ldots, R$. Estimate $\hat{\btheta}^{(r)}$ is fitted from bootstrap sample $\vec{y}^{(r)}$. An $R \times d$ matrix is returned whose $r$th row is $\hat{\btheta}^{(r)}$.
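Conceptually, each repetition simulates a new response vector at the fitted parameters and refits the model. A single repetition might look like the following sketch; the package's implementation may differ in details.

link.hat = predict(cmp.out, type = "link")
y.boot = rcmp(nrow(freight), link.hat$lambda, link.hat$nu)
coef(glm.cmp(y.boot ~ freight$transfers))  # one bootstrap replicate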

cmp.boot = parametric.bootstrap(cmp.out, reps = 100)
head(cmp.boot)

We used $R = 100$ in the display above to keep vignette computations small, but a larger number may be desired in practice. Bootstrap repetitions can be used, for example, to compute 95% confidence intervals for each of the coefficients.

t(apply(cmp.boot, 2, quantile, c(0.025,0.975)))

Large Covariates

\label{sec:cmp-reg-large-covariates}

Large covariates can present numerical difficulties in fitting CMP regression. We will briefly demonstrate the difficulties and some possible workarounds. First let us generate a new dataset based on a large covariate in the regression for $\lambda_i$.

set.seed(1234)
n = 200
x = rnorm(n, 500, 10)
X = cbind(intercept = 1, slope = x)
S = matrix(1, n, 1)
beta_true = c(-0.05, 0.05)
gamma_true = 2
lambda_true = exp(X %*% beta_true)
nu_true = exp(S %*% gamma_true)
y = rcmp(n, lambda_true, nu_true)

Notice that the generated counts $y_1, \ldots, y_n$ are relatively small compared to the covariate $x_1, \ldots, x_n$.

summary(x)
summary(y)

An initial attempt to fit the true data-generating model fails.

tryCatch({
    glm.cmp(y ~ x, formula.nu = ~ 1)
}, error = print_warning)

Internally, the linked rate parameter $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$ may evaluate to Inf or become very close to zero as the optimizer moves $\beta_1$ away from zero in a positive or negative direction, respectively. With covariate values near 500, this is easy to reproduce directly in double-precision arithmetic:
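exp(1.5 * 500)    # Inf: the linked rate overflows once the slope reaches 1.5
exp(-1.5 * 500)   # 0: the linked rate underflows in the other direction

Some possible ways to address this are as follows.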

Standardize the covariate to have mean zero and variance one.

glm.cmp(y ~ scale(x), formula.nu = ~ 1)

Use a logarithmic transformation on the covariate.

glm.cmp(y ~ log(x), formula.nu = ~ 1)

Change optimization method or other optim arguments.

control = get.control(optim.method = "BFGS", optim.control = list(maxit = 200))
suppressWarnings({
    cmp.out = glm.cmp(y ~ x, formula.nu = ~ 1, control = control)
    print(cmp.out)
})

In this case, standardization and logarithmic transformation produce a usable fit. Changing the optimization method to BFGS allows the optimization to finish, but there are further numerical problems in computing the Hessian for standard errors.

Large Outcomes

\label{sec:cmp-reg-large-outcomes}

Now consider a generated dataset with large outcomes but a relatively small covariate. This situation can also present numerical difficulties.

set.seed(1234)
n = 200
x = runif(n, 1, 2)
X = cbind(intercept = 1, slope = x)
S = matrix(1, n, 1)
beta_true = c(1, 1)
gamma_true = -0.95
lambda_true = exp(X %*% beta_true)
nu_true = exp(S %*% gamma_true)
y = rcmp(n, lambda_true, nu_true)
summary(x)
summary(y)

An initial attempt to fit the data-generating model fails.

tryCatch({
    glm.cmp(y ~ x, formula.nu = ~ 1)
}, error = print_warning)

Informative starting values help the optimizer to initially make progress. True data-generating parameters will not be available in a real data analysis situation but help to illustrate the idea.

init = get.init(beta = beta_true, gamma = gamma_true)
glm.cmp(y ~ x, formula.nu = ~ 1, init = init)

Without choosing an initial value, changing the optimization method to Nelder-Mead and increasing the maximum number of iterations also helps the optimizer find a solution.

control = get.control(optim.method = "Nelder-Mead", optim.control = list(maxit = 1000))
glm.cmp(y ~ x, formula.nu = ~ 1, control = control)

Note that this solution is different from the previous one; the log-likelihood of the previous one is slightly better.

ZICMP Regression

\label{sec:zicmp-reg}

Couple Dataset

\label{sec:zicmp-reg-couple}

The couple dataset [@LoeysEtAl2012] was analyzed with ZICMP regression in @SellersRaim2016 and found to exhibit overdispersion. The data concern separation trajectories of $n = 387$ couples. The variable UPB records the number of unwanted pursuit behavior perpetrations and is considered the outcome of interest. Included covariates are the binary variable EDUCATION, which is 1 if at least a bachelor's degree was attained, and a continuous variable ANXIETY which measures anxious attachment. A zero-inflated count model is considered for these data because 246 of the 387 records have an outcome of zero.

Let us load and view the first few records in the dataset.

data(couple)
head(couple)

As a preliminary model, let us fit a standard Poisson model $Y_i \indep \text{Poisson}(\lambda_i)$ with \begin{align} \log \lambda_i = \beta_0 + \beta_1 \cdot \text{EDUCATION}_i + \beta_2 \cdot \text{ANXIETY}_i. \end{align} We may use the standard glm function.

glm.out = glm(UPB ~ EDUCATION + ANXIETY, data = couple, family = poisson)
summary(glm.out)

ZICMP Regression

\label{sec:zicmp-reg-initial}

Now consider a ZICMP regression with \begin{align} &\log \lambda_i = \beta_0 + \beta_1 \cdot \text{EDUCATION}_i + \beta_2 \cdot \text{ANXIETY}_i, \\ &\log \nu_i = \gamma_0, \\ &\logit p_i = \zeta_0 + \zeta_1 \cdot \text{EDUCATION}_i + \zeta_2 \cdot \text{ANXIETY}_i. \end{align} We use the glm.cmp function as follows.

zicmp0.out = glm.cmp(UPB ~ EDUCATION + ANXIETY,
    formula.nu = ~ 1,
    formula.p = ~ EDUCATION + ANXIETY,
    data = couple)
print(zicmp0.out)

There are now three sets of coefficients reported in the output: the X:, S:, and W: prefixes label estimates for the $\lambda_i$, $\nu_i$, and $p_i$ formulas respectively.

Comments about Results

The AIC of the ZICMP model is drastically smaller than that of the Poisson model, indicating a greatly improved fit. However, there are signs of possible numerical issues. The estimate for $\gamma_0$ is a large negative number with an extremely large associated SE, which suggests that the coefficient may not be statistically significant. On the other hand, the estimate of $\nu$ is nearly zero with a small SE, which suggests that the dispersion parameter differs significantly from the equidispersed case. On the surface, this seems to be a contradiction.

The issue is that the Hessian of the log-likelihood becomes insensitive to small changes in $\gamma_0$ when $\lambda_i < 1$ and $\gamma_0$ is a large negative number. Let us first verify that the estimates for $\lambda_i$ are indeed smaller than 1.

pred.out = predict(zicmp0.out, type = "link")
summary(pred.out$lambda)

To show the insensitivity of the Hessian, let us consider a simpler setting with $Y \sim \text{CMP}(\lambda, \nu)$, $\lambda = e^{-0.25} \approx 0.7788$ fixed, and $\nu = e^{\gamma_0}$. We then have log-density \begin{align} \log f(y \mid \gamma_0) = y \log \lambda - e^{\gamma_0} \log(y!) - \log Z(\lambda, e^{\gamma_0}), \end{align} with first and second derivatives, respectively, \begin{align} &\frac{\partial}{\partial \gamma_0} \log f(y \mid \gamma_0) = -e^{\gamma_0} \log(y!) - \frac{\partial}{\partial \gamma_0} \log Z(\lambda, e^{\gamma_0}), \\ &\frac{\partial^2}{\partial \gamma_0^2} \log f(y \mid \gamma_0) = -e^{\gamma_0} \log(y!) - \frac{\partial^2}{\partial \gamma_0^2} \log Z(\lambda, e^{\gamma_0}). \end{align} For a given value of $y$, $-e^{\gamma_0} \log(y!)$ approaches zero as $\gamma_0$ decreases. Therefore, let us focus on the function $g(\gamma_0) = -\log Z(\lambda, e^{\gamma_0})$ and its first and second derivatives. The following code illustrates their behavior.

library(numDeriv)
g = function(gamma0) {
    -ncmp(lambda = exp(-0.25), nu = exp(gamma0), log = TRUE)
}
dat = data.frame(gamma0 = seq(0, -13), g = NA, d_g = NA, d2_g = NA)
for (j in 1:nrow(dat)) {
    gamma0 = dat$gamma0[j]
    dat$g[j] = g(gamma0)
    dat$d_g[j] = numDeriv::grad(func = g, x = gamma0)
    dat$d2_g[j] = numDeriv::hessian(func = g, x = gamma0)
}

Here is the result.

print(dat)

Notice that $g(\gamma_0)$ approaches a limit as $\gamma_0 \rightarrow -\infty$, which coincides with the CMP distribution approaching a Geometric distribution. It may not be surprising that the first and second derivatives approach zero accordingly. This explains the large SE for $\gamma_0$ in the model zicmp0.out. With estimates tending to this region of the parameter space, it may be preferable to fix $\gamma_0$ at a value such as $-\infty$, which will be done in the next section.

Fixed Coefficients

\label{sec:zicmp-reg-fixed}

Our attempt to fit the previous model strongly tended to the Zero-Inflated Geometric special case of ZICMP, but SEs computed via the Hessian become large in this region. In this section, we fix $\gamma_0$ at the extreme $-\infty$ and fit the remaining coefficients. Let us use the raw interface to do this.

init = coef(zicmp0.out, type = "list")
y = couple$UPB
X = model.matrix(~ EDUCATION + ANXIETY, data = couple)
S = model.matrix(~ 1, data = couple)
W = model.matrix(~ EDUCATION + ANXIETY, data = couple)
control = get.control(optim.method = "BFGS")
zicmp.out = glm.zicmp.raw(y, X, S, W,
    init = get.init(beta = c(-1,0,0), gamma = -Inf, zeta = c(-1,0,0)),
    fixed = get.fixed(gamma = 1L), control = control)
print(zicmp.out)

Notice that an additional Fixed column has been added to the display, indicating that the coefficient gamma is fixed. Furthermore, its Estimate column is set to the initial value and the columns SE, z-value, and p-value are set to NA. This model achieves a similar log-likelihood value as our first attempt using L-BFGS-B but does not exhibit signs of numerical issues.

Accessor Functions

\label{sec:zicmp-reg-accessors}

Here are several of the accessors provided to extract model outputs.

logLik(zicmp.out)               ## Log-likelihood evaluated at MLE
AIC(zicmp.out)                  ## AIC evaluated at MLE
BIC(zicmp.out)                  ## BIC evaluated at MLE
coef(zicmp.out)                 ## Estimates of theta as a flat vector
coef(zicmp.out, type = "list")  ## Estimates of theta as a named list
vcov(zicmp.out)                 ## Estimated covariance matrix of theta hat
sdev(zicmp.out)                 ## Standard deviations from vcov(...) diagonals
sdev(zicmp.out, type = "list")  ## Standard deviations as a named list
equitest(zicmp0.out)            ## Likelihood ratio test for H_0: gamma = 0
tryCatch({                      ## An error is thrown for model with fixed gamma
    equitest(zicmp.out)
}, error = print_warning)

Because we fixed $\gamma_0 = -\infty$ to obtain zicmp.out, the equitest function throws an error instead of proceeding with an equidispersion test.

The predict function behaves similarly as in CMP regression; however, the link type here also includes a column with the estimated $p_i$.

y.hat = predict(zicmp.out)  ## Fitted values based on ecmp
link.hat = predict(zicmp.out, type = "link")
head(y.hat)
head(link.hat)

In this example, we can see the benefit of using quantile residuals rather than raw residuals for diagnostic plots. The functions plot.fit.res and plot.qq.res have been defined in Section \ref{sec:cmp-reg}.

res.raw = residuals(zicmp.out, type = "raw")
res.qtl = residuals(zicmp.out, type = "quantile")

plot.fit.res(y.hat, res.raw) +
    ggtitle("Fitted Values vs. Raw Residuals")
plot.qq.res(res.raw) +
    ggtitle("Q-Q Plot of Raw Residuals")

plot.fit.res(y.hat, res.qtl) +
    ggtitle("Fitted Values vs. Quantile Residuals")
plot.qq.res(res.qtl) +
    ggtitle("Q-Q Plot of Quantile Residuals")

Here is an example of computing fitted values for new covariate data.

new.df = data.frame(EDUCATION = round(1:20 / 20), ANXIETY = seq(-3,3, length.out = 20))
X.new = model.matrix(~ EDUCATION + ANXIETY, data = new.df)
S.new = model.matrix(~ 1, data = new.df)
W.new = model.matrix(~ EDUCATION + ANXIETY, data = new.df)
new.data = get.modelmatrix(X.new, S.new, W.new)

# For model fit using raw interface, use get.modelmatrix to prepare new design
# matrices, offsets, etc
y.hat.new = predict(zicmp.out, newdata = new.data)

# For models fit with the formula interface, pass a data.frame with the same
# structure as used in the fit.
y.hat.new = predict(zicmp0.out, newdata = new.df)
print(y.hat.new)

As with CMP regression, a parametric.bootstrap function is provided for convenience to obtain a bootstrap sample $\hat{\btheta}^{(r)}$ of $\btheta$ based on the estimate $\hat{\btheta}$. Because it is too time consuming to run this example within the vignette, we show the code without output below. As in Section \ref{sec:cmp-reg}, we consider using the bootstrap samples to construct a 95% confidence interval for each of the coefficients.

zicmp.boot = parametric.bootstrap(zicmp.out, reps = 100)
head(zicmp.boot)
apply(zicmp.boot, 2, quantile, c(0.025,0.975))

Acknowledgements {.unlisted .unnumbered}

We acknowledge Thomas Lotze for significant contributions to the initial development of the COMPoissonReg package and for service as initial maintainer on CRAN. We thank Darcy Steeg Morris, Eric Slud, and Tommy Wright for reviewing the manuscript. We are grateful to the users of COMPoissonReg for their interest and for bringing to light some of the issues which have been considered in this work.

References


