knitr::opts_chunk$set( collapse = TRUE, prompt = TRUE, comment = "" )
library(COMPoissonReg) set.seed(1235)
\newpage
\label{sec:intro}
The R package COMPoissonReg
supports Conway-Maxwell Poisson (CMP) and Zero-Inflated Conway-Maxwell Poisson (ZICMP) models for analysis of count data in a flexible manner, to account for data dispersion relative to a Poisson model. The package provides regression functionality in addition to basic distribution functions. Interested users can refer to @SellersShmueli2010 and @SellersRaim2016 regarding the underlying theoretical developments for the CMP and ZICMP regressions, respectively. A full specification of the public COMPoissonReg
interface can be found in the manual. In addition to package prerequisites Rcpp
[@Eddelbuettel2013] and numDeriv
[@numDeriv], ggplot2
[@Wickham2016] is also used in this vignette.
One of the challenges of working with CMP and ZICMP lies in computing the normalizing constant and related quantities. The normalizing constant does not have a simple closed form in general and can quickly increase or decrease magnitude as parameters are varied. COMPoissonReg
takes a hybrid approach of either truncating the infinite series or making use of an approximation, depending on parameter values.
The remainder of the vignette proceeds as follows. Section \ref{sec:cmp} describes functions to support the CMP distribution, including numerical handling of the normalizing constant. Section \ref{sec:zicmp} describes functions for ZICMP. Finally, Section \ref{sec:reg} demonstrates regression functions; Sections \ref{sec:cmp-reg} and \ref{sec:zicmp-reg} give specific examples based on CMP and ZICMP outcomes, respectively. The COMPoissonReg
package is on CRAN at https://cran.r-project.org/package=COMPoissonReg and the source code is on Github at https://github.com/lotze/COMPoissonReg.
\label{sec:cmp}
Let $Y \sim \text{CMP}(\lambda, \nu)$ be a Conway-Maxwell Poisson (CMP) random variable with density \begin{align} f(y \mid \lambda, \nu) = \frac{\lambda^y}{(y!)^\nu Z(\lambda, \nu)}, \quad y \in \mathbb{N}, \quad Z(\lambda, \nu) = \sum_{r=0}^\infty \frac{\lambda^r}{(r!)^\nu}, \end{align} where $\lambda > 0$, $\nu > 0$, and $\mathbb{N}$ represents the nonnegative integers ${ 0, 1, 2, \ldots}$. Three notable special cases of $\text{CMP}(\lambda, \nu)$ help to demonstrate its flexibility in count modeling.
a. The case $\nu = 1$ corresponds to $\text{Poisson}(\lambda)$.
a. When $\lambda \in (0,1)$ and $\nu = 0$, the $\text{CMP}(\lambda, \nu)$ distribution is $\text{Geometric}(1-\lambda)$ with density $f(y \mid \lambda) = (1 - \lambda) \lambda^y$ for $y \in \mathbb{N}$, which is overdispersed relative to Poisson.
a. When $\nu \rightarrow \infty$, $\text{CMP}(\lambda, \nu)$ converges to a $\text{Bernoulli}(\lambda / (1 + \lambda))$ distribution which is underdispersed relative to Poisson.
\label{sec:cmp-normconst} The normalizing constant $Z(\lambda, \nu)$ presents some challenges in the practical use of CMP models and has been a topic of interest in the CMP literature. In general, there is no simple closed form expression for the series $Z(\lambda, \nu)$. @ShmueliEtAl2005 give the approximation \begin{align} Z(\lambda, \nu) &= \frac{ \exp(\nu \lambda^{1/\nu}) }{ \lambda^{(\nu-1)/2\nu} (2\pi)^{(\nu-1)/2} \nu^{1/2} } \left{ 1 + O(\lambda^{-1/\nu}) \right}, \label{eqn:approx} \end{align} where the $O(\cdot)$ term vanishes as $\lambda^{-1/\nu}$ becomes small. Approximations have been further studied and refined in subsequent literature; see for example @GillispieGreen2015, @DalyGaunt2016, and @GauntEtAl2019. The expression in \eqref{eqn:approx} emphasizes that the magnitude of $Z(\lambda, \nu)$ explodes as $\nu \rightarrow 0$ when $\lambda > 1$. For example, $Z(2, 0.075) \approx e^{780.515}$ is too large to store as a double-precision floating point number, and may evaluate to infinity if care is not taken. In contrast, $Z(\lambda, \nu) \rightarrow 1/(1 - \lambda)$ when $\lambda < 1$ and $\nu \rightarrow 0$.
In practice, the COMPoissonReg
package does not place constraints on $\lambda$ and $\nu$, except to ensure that they are positive, so that their values are driven by the data or the user's selection. A hybrid strategy motivated by \eqref{eqn:approx} is taken by COMPoissonReg
.
To compute $Z(\lambda, \nu)$, suppose we are given a small tolerance $\delta > 0$. If \begin{align} \lambda^{-1/\nu} < \delta, \label{eqn:can_approx} \end{align} the first term of \eqref{eqn:approx} dominates the second term, and we take \begin{align} Z(\lambda, \nu) &\approx \frac{ \exp(\nu \lambda^{1/\nu}) }{ \lambda^{(\nu-1)/2\nu} (2\pi)^{(\nu-1)/2} \nu^{1/2} } \nonumber \ &=\exp\left{ \nu \lambda^{1/\nu} - \frac{\nu-1}{2\nu} \log \lambda - \frac{\nu-1}{2} \log(2\pi) - \frac{1}{2} \log \nu \right}. \label{eqn:z-approx} \end{align} as an approximation. Otherwise, the series is computed by truncating to a finite number of terms, which is described next. In either case, computations are kept on the log-scale as much as possible to accommodate numbers with potentially very large and very small magnitudes.
We approximate $Z(\lambda, \nu)$ by a finite summation $Z^{(M)}(\lambda, \nu) = \sum_{r=0}^M \lambda^r / (r!)^\nu$ if condition \eqref{eqn:can_approx} fails, so that the remainder is smaller than a given tolerance. The general approach is described in Appendix B of @ShmueliEtAl2005. @Robbins1955 gives bounds for Stirling's approximation as \begin{align} \sqrt{2\pi} n^{n + 1/2} e^{-n} e^{1 / (12n + 1)} < n! < \sqrt{2\pi} n^{n + 1/2} e^{-n} e^{1 / (12n)}. \end{align} Noting that $e^{1 / (12n + 1)} \geq 1$ for $n \geq 1$ and $\sqrt{2\pi} e^{1 / (12n)} \leq e$ for $n \geq 2$, we obtain simpler bounds \begin{align} \sqrt{2\pi} n^{n + 1/2} e^{-n} \leq n! \leq e n^{n + 1/2} e^{-n}, \end{align} which will be convenient in the following calculations.^[These bounds are also stated at https://en.wikipedia.org/wiki/Stirling\%27s_approximation, last accessed 2022-10-09.] We may then bound the truncation error for $Z^{(M)}(\lambda, \nu)$ using \begin{align} \lvert Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu) \rvert &= Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu) \nonumber \ &= \sum_{r=M+1}^\infty \frac{\lambda^r}{(r!)^\nu} \nonumber \ &\leq \sum_{r=M+1}^\infty \frac{\lambda^r}{(2\pi)^{\nu/2} r^{\nu r + \nu/2} e^{-r \nu}} \nonumber \ &\leq \sum_{r=M+1}^\infty \frac{\lambda^r}{(2\pi)^{\nu/2} (M+1)^{\nu r + \nu/2} e^{-r \nu}} \nonumber \ &= (2\pi)^{-\nu/2} (M+1)^{-\nu/2} \sum_{r=M+1}^\infty \left( \frac{\lambda e^{\nu}}{(M+1)^{\nu}} \right)^r \label{eqn:geom} \ &= (2\pi)^{-\nu/2} (M+1)^{-\nu/2} \sum_{r=0}^\infty \left( \frac{\lambda e^{\nu}}{(M+1)^{\nu}} \right)^{r+M+1} \nonumber \ &= (2\pi)^{-\nu/2} (M+1)^{-\nu/2} \left( \frac{\lambda e^{\nu}}{(M+1)^{\nu}} \right)^{M+1} \frac{1}{1 - \frac{\lambda e^{\nu}}{(M+1)^{\nu}}} \nonumber \ &=: \Delta_M, \nonumber \end{align} assuming that $|\lambda e^\nu / (M+1)^\nu| < 1$ so that the geometric series in \eqref{eqn:geom} converges. To ensure this convergence we choose $M$ at least large enough so that \begin{align} \lambda e^\nu / (M+1)^\nu < 1 \iff M > \lambda^{1/\nu} e - 1. \end{align} For a small given number $\epsilon > 0$, we may consider bounding the relative error by \begin{align} &\frac{\lvert Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu) \rvert}{Z^{(M)}(\lambda, \nu)} \leq \frac{\Delta_M}{Z^{(M)}(\lambda, \nu)} < \epsilon. \label{eqn:rel_error_bound} \end{align} The second inequality of \eqref{eqn:rel_error_bound} can be expressed on the log-scale using \begin{align} \log \Delta_M - \log Z^{(M)}(\lambda, \nu) < \log \epsilon, \label{eqn:adaptive-log-scale} \end{align} where \begin{align} \log \Delta_M = -\frac{\nu}{2} \log(2\pi) - \nu \left( M+\frac{3}{2} \right) \log (M+1) + (M+1) (\nu + \log \lambda) - \log\left(1 - \frac{\lambda e^\nu}{(M+1)^\nu} \right). \end{align} Therefore, we compute $Z^{(M)}(\lambda, \nu)$ until at least $M > \lambda^{1/\nu} e - 1$, increasing $M$ and updating $Z^{(M)}(\lambda, \nu)$ until \eqref{eqn:adaptive-log-scale} is satisfied. This is summarized as Algorithm \ref{alg:normconst-trunc}.
\begin{algorithm} \caption{Compute the CMP normalizing constant using truncation approach.} \label{alg:normconst-trunc} \hspace{\algorithmicindent}\textbf{Input}: $\lambda > 0$ rate parameter. \ \hspace{\algorithmicindent}\textbf{Input}: $\nu > 0$ dispersion parameter. \ \hspace{\algorithmicindent}\textbf{Input}: $\epsilon > 0$ tolerance. \ \hspace{\algorithmicindent}\textbf{Input}: $y_\text{max} \in \mathbb{N}$ upper limit for $M$ \begin{algorithmic}[1] \Function{Truncated-$Z$}{$\lambda, \nu, \epsilon, y_\text{max}$} \State $M = 0$, $Z^{(0)} = 1$
\While{$M \leq \lambda^{1/\nu} e - 1$ and $M \leq y_\text{max}$} \State $Z^{(M+1)} \leftarrow Z^{(M)} + \lambda^{M} / (M!)^\nu$ \State $M \leftarrow M + 1$ \EndWhile
\While{$\log \Delta_M - \log Z^{(M)} \geq \log \epsilon$ and $M \leq y_\text{max}$} \State $Z^{(M+1)} \leftarrow Z^{(M)} + \lambda^{M} / (M!)^\nu$ \State $M \leftarrow M + 1$ \EndWhile
\State \Return ${ Z^{(M)}, M }$ \EndFunction \end{algorithmic} \end{algorithm}
The individual terms $\lambda^r / (r!)^\nu$ in the summation may be too large to store at their original scale. Therefore, summation is carried out at the log-scale, wherever possible, using the identity
\begin{align}
\log(x + y) = \log x + \log(1 + \exp{ \log y - \log x });
\label{eqn:logadd}
\end{align}
this is especially helpful when $0 < y \ll x$, as $\log x$ may be kept on the log-scale by the first term of the right-hand side of \eqref{eqn:logadd}, and the standard library function log1p
may be used to accurately compute $\log(1 + \phi)$ for very small $\phi > 0$.
Many of the functions in the user interface of COMPoissonReg
take an optional control
argument, which can be constructed as follows.
control = get.control( ymax = 100000, hybrid.tol = 1e-2, truncate.tol = 1e-6 )
The tolerances $\delta$ and $\epsilon$ are specified as hybrid.tol
and truncate.tol
respectively. Taking hybrid.tol
to be a very small positive number results in use of the truncated sum $Z^{(M)}(\lambda, \nu)$, while hybrid.tol = Inf
uses the approximation method \eqref{eqn:z-approx}, except in extreme cases where $\lambda^{-1/\nu}$ evaluates to zero or $\infty$ numerically. The argument ymax
specifies upper limit $M$; this is a safety measure which prevents very large computations unless the user opts to allow them. When no control object is specified, a global default (via option COMPoissonReg.control
) is used.
control = getOption("COMPoissonReg.control") control$ymax control$hybrid.tol control$truncate.tol
The default may be replaced in your current session if desired.
options(COMPoissonReg.control = control)
The control object contains several other useful arguments to be discussed later in the vignette.
The ncmp
function computes the normalizing constant $Z(\lambda, \nu)$ and returns its value either on the original scale or the log-scale.
ncmp(lambda = 1.5, nu = 1.2) ncmp(lambda = 1.5, nu = 1.2, log = TRUE) ncmp(lambda = 1.5, nu = 1.2, log = TRUE, control = get.control(hybrid.tol = 1e10)) ncmp(lambda = 1.5, nu = 1.2, log = TRUE, control = get.control(hybrid.tol = 1e-10))
Before proceeding, let us define a function to display errors and warnings which are intentionally triggered in the remainder of the vignette.
print_warning = function(x) { print(strwrap(x), quote = FALSE) }
The function tcmp
returns the truncation value $M$ obtained from Algorithm \ref{alg:normconst-trunc}.
nu_seq = c(1, 0.5, 0.2, 0.1, 0.05, 0.03) tryCatch({ tcmp(lambda = 1.5, nu = nu_seq) }, warning = print_warning)
Note that tcmp
returns 1e6
and produces a warning for the smallest nu
value r min(nu_seq)
because Algorithm \ref{alg:normconst-trunc} has reached ymax = 1e6
before the series could be bounded by a geometric series. Here, it is likely that support values with non-negligible mass are being left out. Let us increase ymax
to avoid this problem.
tcmp(lambda = 1.5, nu = nu_seq, control = get.control(ymax = 3e6))
It is also possible to reach the second loop of Algorithm \ref{alg:normconst-trunc} where the geometric series can be used, but ymax
is not large enough to satisfy \eqref{eqn:adaptive-log-scale}. Here is an example where this occurs.
tcmp(lambda = 1.2, nu = 0.03, control = get.control(ymax = 1200))
Now that we have a somewhat robust computation for the normalizing constant, let us create a plot of the interesting behavior when $\lambda > 1$ and $\nu$ is decreasing.
library(ggplot2) nu_seq = seq(0.03, 1.5, length.out = 20) nc1 = ncmp(lambda = 0.5, nu = nu_seq, log = TRUE) nc2 = ncmp(lambda = 1.05, nu = nu_seq, log = TRUE) nc3 = ncmp(lambda = 1.20, nu = nu_seq, log = TRUE)
ggplot() + geom_point(data = data.frame(x = nu_seq, y = nc1), aes(x = x, y = y), pch = 1) + geom_point(data = data.frame(x = nu_seq, y = nc2), aes(x = x, y = y), pch = 2) + geom_point(data = data.frame(x = nu_seq, y = nc3), aes(x = x, y = y), pch = 3) + xlab("nu") + ylab("log of normalizing constant") + theme_bw()
We see that with $\lambda = 1.2$ a value of $Z(\lambda, r round(nu_seq[1], 4)
) \approx e^{r round(nc3[1], 2)
}$ is obtained, which is an extremely large jump from the next value in the series $Z(\lambda, r round(nu_seq[2], 4)
) \approx e^{r round(nc3[2], 2)
}$.
\label{sec:cmp-dist}
The respective functions for CMP density, variate generation, CDF, and quantile functions are dcmp
, rcmp
, pcmp
, and qcmp
. Their usage is similar to distribution functions provided by the R stats
package.
dcmp(0, lambda = 10, nu = 0.9) dcmp(0:17, lambda = 10, nu = 0.9, log = TRUE) dcmp(c(0, 1, 2), lambda = c(10, 11, 12), nu = c(0.9, 1.0, 1.1), log = TRUE)
rcmp(50, lambda = 10, nu = 0.9)
pcmp(0:17, lambda = 10, nu = 0.9)
qq = seq(0, 0.95, length.out = 10) qcmp(qq, lambda = 10, nu = 0.9)
The CMP distribution functions compute the normalizing constant via Section \ref{sec:cmp-normconst}. The density dcmp
uses the hybrid method, while rcmp
, pcmp
, and qcmp
use the truncation method regardless of condition \eqref{eqn:can_approx}. Variate generation, CDF, and quantiles are then computed on CMP as if it were a discrete distribution on the sample space ${ 0, \ldots, M }$. As with tcmp
and ncmp
, a warning is emitted in cases where they may not produce reliable results.
tryCatch({ rcmp(1, lambda = 2, nu = 0.01) }, warning = print_warning)
The truncation method for qcmp
can result in unreliable quantiles when the requested probability is very close to 1. For example, the actual quantile for probability $1$ is $\infty$, which can be expressed with no computation, but the computed quantity will be the truncation value $M$. More generally, \eqref{eqn:rel_error_bound} implies that
\begin{align}
\frac{Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu)}{Z(\lambda, \nu)} < \epsilon
&\iff \Prob(Y > M) < \epsilon \nonumber \
&\iff 1 - F(M \mid \lambda, \nu) < \epsilon \nonumber \
&\iff F^{-}(1 - \epsilon \mid \lambda, \nu) < M.
\label{eqn:cmp-quantile-limit}
\end{align}
Therefore, it is possible that quantiles larger than $1 - \epsilon$ may be inaccurately computed with the truncated support. COMPoissonReg
gives a warning when these are requested.
tryCatch({ qcmp(0.9999999, lambda = 1.5, nu = 0.5) }, warning = print_warning)
As a sanity check, let us ensure that the empirical density values, cumulative probabilities, and quantiles of draws from rcmp
align with respective results computed via dcmp
, pcmp
, and qcmp
.
library(ggplot2) n = 100000 lambda = 0.5 nu = 0.1 x = rcmp(n, lambda, nu) xx = seq(-1, max(x)) ## Include -1 to ensure it gets probability zero qq = seq(0, 0.99, length.out = 100) fx = dcmp(xx, lambda, nu) px = pcmp(xx, lambda, nu) qx = qcmp(qq, lambda, nu) qx_emp = quantile(x, probs = qq)
ggplot() + geom_bar(data = data.frame(x = x), aes(x = x, y = ..prop..), fill = "NA", col = "black") + geom_point(data = data.frame(x = xx[-1], fx = fx[-1]), aes(x, fx)) + ylab("Density") + theme_bw()
ggplot() + stat_ecdf(data = data.frame(x = x), aes(x), geom = "step") + geom_point(data = data.frame(x = xx, px = px), aes(x, px)) + ylab("Probability") + theme_bw()
ggplot() + geom_point(data = data.frame(x = qq, qx_emp = qx_emp), aes(qq, qx_emp), pch = 1) + geom_point(data = data.frame(x = qq, qx = qx), aes(qq, qx), pch = 3) + xlab("Probability") + ylab("Quantile") + theme_bw()
\label{sec:cmp-ev}
For $Y \sim \text{CMP}(\lambda, \nu)$, we can consider computing the expectation and variance of $Y$ in two ways. First, if there is a moderately-sized $M$ where ${ 0, \ldots, M }$ contains all but a negligible amount of the mass of $Y$, we can compute the moments using truncated summations
\begin{align}
\E(Y) = \sum_{y=0}^M y \frac{\lambda^y}{(y!)^\nu Z(\lambda, \nu)}, \quad
\E(Y^2) = \sum_{y=0}^M y^2 \frac{\lambda^y}{(y!)^\nu Z(\lambda, \nu)}, \quad
\Var(Y) = \E(Y^2) - [\E(Y)]^2.
\end{align}
Otherwise, a different approach is taken. Notice that the expected value is related to the first derivative of the log-normalizing constant via
\begin{align}
&\frac{\partial}{\partial \lambda} \log Z(\lambda, \nu)
= \frac{ \frac{\partial}{\partial \lambda} Z(\lambda, \nu) }{ Z(\lambda, \nu) }
= \frac{1}{Z(\lambda, \nu)} \sum_{y=0}^\infty y \frac{\lambda^{y-1}}{(y!)^\nu} \
&\iff \E(Y) = \lambda \frac{\partial}{\partial \lambda} \log Z(\lambda, \nu).
\end{align}
For the second derivative,
\begin{align}
&\frac{\partial^2}{\partial \lambda^2} \log Z(\lambda, \nu)
= \frac{ Z(\lambda, \nu) \frac{\partial^2}{\partial \lambda^2} Z(\lambda, \nu) - [\frac{\partial}{\partial \lambda} Z(\lambda, \nu)]^2}{ Z(\lambda, \nu)^2 }
= \frac{1}{Z(\lambda, \nu)} \sum_{y=0}^\infty y(y-1) \frac{\lambda^{y-2}}{(y!)^\nu} - \left[ \frac{\E(Y)}{\lambda} \right]^2 \
&\iff \lambda^2 \frac{\partial^2}{\partial \lambda^2} \log Z(\lambda, \nu) = \E[Y(Y-1)] - [\E(Y)]^2 = \Var(Y) - \E(Y) \
&\iff \Var(Y) = \lambda^2 \frac{\partial^2}{\partial \lambda^2} \log Z(\lambda, \nu) + \E(Y).
\end{align}
Therefore, we may use first and second derivatives of $\log Z(\lambda, \nu)$, taken with respect to $\lambda$, to compute $\E(Y)$ and $\Var(Y)$. The ecmp
and vcmp
functions implement this computation of the expectation and variance, respectively. The control object is used to determine whether to use the truncation or differentiation approach. Condition \eqref{eqn:can_approx} is used to determine whether to use the truncation approach (if false) or differentiation approach (if true).
ecmp(lambda = 10, nu = 1.2) ecmp(lambda = 1.5, nu = 0.5) ecmp(lambda = 1.5, nu = 0.05) ecmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e-10)) ecmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e10))
vcmp(lambda = 10, nu = 1.2) vcmp(lambda = 1.5, nu = 0.5) vcmp(lambda = 1.5, nu = 0.05) vcmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e-10)) vcmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e10))
Provided that an enormously large truncation value $M$ is not required, we may compute other moments by truncated sums using tcmp
.
M = tcmp(lambda = 1.5, nu = 0.05) print(M) xx = seq(0, M) sum(xx^3 * dcmp(xx, lambda, nu)) # E(X^3) sum(xx^4 * dcmp(xx, lambda, nu)) # E(X^4)
\label{sec:zicmp}
Let $S \sim \text{Bernoulli}(p)$ and $T \sim \text{CMP}(\lambda, \nu)$ be independent random variables. Then $Y = (1-S) T$ follows a Zero-Inflated Conway-Maxwell Poisson distribution $\text{ZICMP}(\lambda, \nu, p)$ with density \begin{align} f(y \mid \lambda, \nu, p) = (1 - p) \frac{\lambda^{y}}{(y!)^{\nu} Z(\lambda, \nu)} + p \cdot \ind(y = 0), \quad y \in \mathbb{N}. \end{align} Like CMP, several interesting special cases are obtained.
a. Taking $\nu = 1$ corresponds to Zero-Inflated Poisson $\text{ZIP}(\lambda, p)$ with density $f(y \mid \lambda, p) = (1 - p) e^{-\lambda} \lambda^{y} / y! + p \cdot \ind(y = 0)$.
a. When $\lambda \in (0,1)$ and $\nu \rightarrow 0$, $\text{ZICMP}(\lambda, \nu)$ converges to a Zero-Inflated Geometric distribution with density $f(y \mid \lambda, p) = (1 - p) (1 - \lambda) \lambda^y + p \cdot \ind(y = 0)$ for $y \in \mathbb{N}$.
a. When $\nu \rightarrow \infty$, $\text{ZICMP}(\lambda, \nu, p)$ converges to a "Zero-Inflated Bernoulli" distribution with density \begin{math} f(y \mid \lambda, p) = (1 - p) \left[ \lambda/(1+\lambda) \right]^y \left[ 1/(1+\lambda) \right]^{1-y} + p \cdot \ind(y = 0). \end{math} which remains a Bernoulli distribution with an adjusted success probability. Here the $\lambda$ and $p$ parameters are not individually identifiable. Therefore, users may want to avoid zero-inflation in CMP data analyses with extreme underdispersion.
a. Finally, $p = 0$ yields $\text{CMP}(\lambda, \nu)$.
There is a close relationship between the CMP and ZICMP distributions, as we have seen from construction of ZICMP. The relationship between the CMP densities and variate generation mechanisms was given earlier in this section. Denote $F(x \mid \lambda, \nu)$ and $F(x \mid \lambda, \nu, p)$ as the CDFs of $\text{CMP}(\lambda, \nu)$ and $\text{ZICMP}(\lambda, \nu, p)$, respectively. We have \begin{align} F(x \mid \lambda, \nu, p) = (1-p) F(x \mid \lambda, \nu) + p \cdot \ind(x \geq 0). \end{align} For a given probability $\phi \in [0,1]$, the associated CMP and ZICMP quantile functions are related via \begin{align} F^{-}(\phi \mid \lambda, \nu, p) &= \inf{ x \in \mathbb{N} : F(x \mid \lambda, \nu, p) \geq \phi } \nonumber \ &= \inf{ x \in \mathbb{N} : (1-p) F(x \mid \lambda, \nu) + p \cdot \ind(x \geq 0) \geq \phi } \nonumber \ &= \inf{ x \in \mathbb{N} : F(x \mid \lambda, \nu) \geq (\phi - p) / (1 - p) } \nonumber \ &= F^{-}\left( \frac{\phi - p}{1 - p} \mid \lambda, \nu \right). \label{eqn:zicmp-quantiles} \end{align}
The respective functions for ZICMP density, variate generation, CDF, and quantile functions are dzicmp
, rzicmp
, pzicmp
, and qzicmp
. They make use of the CMP implementation described in Section \ref{sec:cmp} such as the criteria to either truncate or approximate the normalizing constant.
qq = seq(0, 0.95, length.out = 20) rzicmp(20, lambda = 1.5, nu = 0.2, p = 0.25) dzicmp(c(0, 1, 2), lambda = 1.5, nu = 0.2, p = 0.25) pzicmp(c(0, 1, 2), lambda = 1.5, nu = 0.2, p = 0.25) qzicmp(qq, lambda = 1.5, nu = 0.2, p = 0.25)
As with qcmp
, the qzicmp
function is computed by the truncation method, and cannot accurately compute quantiles very close to 1. More specifically, from the CMP distribution, \eqref{eqn:cmp-quantile-limit} gives
\begin{align}
M &> F^{-}(1 - \epsilon \mid \lambda, \nu) \
&= F^{-}\left( \frac{\phi_\epsilon - p}{1 - p} \mid \lambda, \nu \right) \
&= F^{-}\left( \phi_\epsilon \mid \lambda, \nu, p \right)
\end{align}
where $\phi_\epsilon = (1 - \epsilon)(1-p) + p$ and the last equality uses \eqref{eqn:zicmp-quantiles}. This motivates a warning from qzicmp
when the argument is larger than $\phi_\epsilon$.
tryCatch({ qzicmp(0.9999999, lambda = 1.5, nu = 0.5, p = 0.5) }, warning = print_warning)
Let us repeat the sanity check from Section \ref{sec:cmp-dist} to ensure that the empirical density values, cumulative probabilities, and quantiles line up with the ones computed via dzicmp
, pzicmp
, and qzicmp
.
library(ggplot2) n = 100000 lambda = 0.5 nu = 0.1 p = 0.5 x = rzicmp(n, lambda, nu, p) xx = seq(-1, max(x)) ## Include -1 to ensure it gets probability zero qq = seq(0, 0.99, length.out = 100) fx = dzicmp(xx, lambda, nu, p) px = pzicmp(xx, lambda, nu, p) qx = qzicmp(qq, lambda, nu, p) qx_emp = quantile(x, probs = qq)
ggplot() + geom_bar(data = data.frame(x = x), aes(x = x, y = ..prop..), fill = "NA", col = "black") + geom_point(data = data.frame(x = xx[-1], fx = fx[-1]), aes(x, fx)) + ylab("Density") + theme_bw()
ggplot() + stat_ecdf(data = data.frame(x = x), aes(x), geom = "step") + geom_point(data = data.frame(x = xx, px = px), aes(x, px)) + ylab("Probability") + theme_bw()
ggplot() + geom_point(data = data.frame(x = qq, qx_emp = qx_emp), aes(qq, qx_emp), pch = 1) + geom_point(data = data.frame(x = qq, qx = qx), aes(qq, qx), pch = 3) + xlab("Probability") + ylab("Quantile") + theme_bw()
The expected value and variance of $Y \sim \text{ZICMP}(\lambda, \nu, p)$ are apparent from the construction $Y = (1-S) T$ given earlier in this section. Namely,
\begin{align}
\E(Y) = (1-p) \E(T)
\quad \text{and} \quad
\Var(Y) = (1-p) \left{ \Var(T) + p[\E(T)]^2 \right}
\end{align}
may be obtained using formulas for iterated conditional expectations and variances. They are evaluated in COMPoissonReg
using the ezicmp
and vzicmp
functions respectively. These functions make use of the ecmp
and vcmp
functions described in Section \ref{sec:cmp-ev} to compute $\E(T)$ and $\Var(T)$.
ezicmp(lambda = 1.5, nu = 0.5, p = 0.1) ezicmp(lambda = 1.5, nu = 0.5, p = c(0.1, 0.2, 0.5))
vzicmp(lambda = 1.5, nu = 0.5, p = 0.1) vzicmp(lambda = 1.5, nu = 0.5, p = c(0.1, 0.2, 0.5))
\label{sec:reg}
Suppose there are $n$ subjects with outcomes $y_1, \ldots, y_n \in \mathbb{N}$ and covariates $\vec{x}_i \in \mathbb{R}^{d_1}$, $\vec{s}_i \in
\mathbb{R}^{d_2}$, and $\vec{w}_i \in \mathbb{R}^{d_3}$ for $i = 1, \ldots, n$. The COMPoissonReg
package fits both CMP and ZICMP regression models.
The CMP regression model assumes that
\begin{align}
Y_i \indep \text{CMP}(\lambda_i, \nu_i), \quad i = 1, \ldots, n,
\end{align}
where $\log \lambda_i = \vec{x}i^\top \vec{\beta}$ and $\log \nu_i = \vec{s}_i^\top \vec{\gamma}$. Writing $\btheta = (\vec{\beta}, \vec{\gamma})$, the likelihood is
\begin{align}
L(\btheta) =
\prod{i=1}^n \left[
\frac{\lambda_i^{y_i}}{(y_i!)^{\nu_i} Z(\lambda_i, \nu_i)}
\right].
\label{eqn:likelihood-cmp}
\end{align}
The ZICMP regression model assumes that
\begin{align}
Y_i \indep \text{ZICMP}(\lambda_i, \nu_i, p_i), \quad i = 1, \ldots, n,
\end{align}
where $\log \lambda_i = \vec{x}i^\top \vec{\beta}$, $\log \nu_i = \vec{s}_i^\top \vec{\gamma}$, and $\logit p_i = \vec{w}_i^\top \vec{\zeta}$. Writing $\btheta = (\vec{\beta}, \vec{\gamma}, \vec{\zeta})$, the likelihood is
\begin{align}
L(\btheta) =
\prod{i=1}^n \left[(1 - p_i)
\frac{\lambda_i^{y_i}}{(y_i!)^{\nu_i} Z(\lambda_i, \nu_i)}
+ p_i \ind(y_i = 0)
\right].
\label{eqn:likelihood-zicmp}
\end{align}
We will write $d = d_1 + d_2 + d_3$ for the total dimension of $\btheta$. The $n \times d_1$ design matrix whose rows consist of $\vec{x}_i$ will be denoted $\vec{X}$. Similarly, we will write $\vec{S}$ and $\vec{W}$ as the $n \times d_2$ and $n \times d_3$ design matrices constructed from $\vec{s}_i$ and $\vec{w}_i$, respectively. The glm.cmp
function provides a formula interface to fit models of both types: \eqref{eqn:likelihood-cmp} and \eqref{eqn:likelihood-zicmp}.
out = glm.cmp(formula.lambda, formula.nu = ~ 1, formula.p = NULL, data = NULL, init = NULL, fixed = NULL, control = NULL, ...)
The interface contains three formulas: formula.lambda
specifies the regression $\vec{x}_i^\top \vec{\beta}$ used for $\lambda_i$, while formula.nu
and formula.p
correspond to $\vec{s}_i^\top \vec{\gamma}$ for $\nu_i$ and $\vec{w}_i^\top \vec{\zeta}$ for $p_i$, respectively. ZICMP regression is utilized when formula.p
is set to something other than its default NULL
value; otherwise, CMP regression is assumed. The data
argument is used to pass a data.frame
explicitly rather than having the data be read from the local environment. The init
, fixed
, and control
arguments and associated helper functions are described below.
The init
argument represents an initial value for the optimizer. The following functions can be used to construct it.
get.init(beta = c(1, 2, 3), gamma = c(-1, 1), zeta = c(-2, -1)) get.init.zero(d1 = 3, d2 = 2, d3 = 2)
The fixed
argument is used to specify indices of the three coefficients which will remain fixed at their initial value during optimization.
get.fixed(beta = c(1L, 2L), gamma = c(1L))
The specification above requests the first two elements of beta
and the first element of gamma
to be fixed. Notice that indices must be integers and that the default value is an empty integer vector which is interpreted as "no elements are fixed". The fixed
argument can usually be disregarded but may be useful in some circumstances; an example is given in Section \ref{sec:zicmp-reg}.
Specifying the elements of init
and fixed
may be somewhat awkward with the formula interface, as they require knowledge of how formulas will be expanded into design matrices and coefficients. It can be helpful to produce the design matrices using R's model.matrix
function.
model.matrix(formula.lambda, data = data) model.matrix(formula.nu, data = data) model.matrix(formula.p, data = data)
The control
argument has been introduced in Section \ref{sec:cmp}; regression modeling makes use of several additional arguments. COMPoissonReg
uses optim
to compute the maximum likelihood estimate (MLE) $\hat{\btheta}$ for $\btheta$ under the specified model. Several controls are provided to influence how COMPoissonReg
invokes optim
; here are their default values.
control = getOption("COMPoissonReg.control") control$optim.method control$optim.control
The element optim.method
is a string which is passed as the method
argument to optim
, while optim.control
is a list passed as the control
argument of optim
. Note that, for the latter, if an entry is given for fnscale
, it will be ignored and overwritten internally by COMPoissonReg
.
The covariance of $\hat{\btheta}$ is estimated by $\hat{\vec{V}}(\hat{\btheta}) = -[\vec{H}(\hat{\btheta})]^{-1}$, where $\vec{H}(\btheta) = \partial^2 \log L(\btheta) / [\partial \btheta \partial \btheta^\top]$ is the Hessian of the log-likelihood computed by optim
. The standard error for coefficient $\theta_j$ in $\btheta$ is then obtained as the square root of the $j$th diagonal of $\hat{\vec{V}}(\hat{\btheta})$.
We will now illustrate use of the regression tools using two examples whose data are included in the package. Note that these demonstrations are not intended to be complete regression analyses, and results may be slightly different than previously published analyses due to differences in the computations.
\label{sec:cmp-reg}
\label{sec:cmp-reg-freight}
The freight
dataset [@KutnerEtAl2003] was analyzed using CMP regression by @SellersShmueli2010 and found to exhibit underdispersion. The data describe $n = 10$ instances where 1,000 ampules were transported via air shipment. The outcome of interest is the variable broken
which describes the number of broken ampules in each shipment. The covariate transfers
describes the number of times the carton was transferred from one aircraft to another during the shipment.
Let us load and view the dataset.
data(freight) print(freight)
Before fitting a CMP regression, let us fit a Poisson regression model $Y_i \indep \text{Poisson}(\lambda_i)$ with
\begin{align}
\log \lambda_i = \beta_0 + \beta_1 \cdot \text{transfers}_i.
\end{align}
This can be carried out with the standard glm
function.
glm.out = glm(broken ~ transfers, data = freight, family = poisson) summary(glm.out)
\label{sec:cmp-reg-initial}
Next, let us fit a similar CMP regression model with \begin{align} &\log \lambda_i = \beta_0 + \beta_1 \cdot \text{transfers}_i, \ &\log \nu_i = \gamma_0, \end{align} using only an intercept for $\nu_i$.
cmp.out = glm.cmp(broken ~ transfers, data = freight) print(cmp.out)
The coefficients used in the $\lambda_i$ formula are prefixed with an X:
label, while an S:
label is used for coefficients of the $\nu_i$ formula. Notice that estimates for X:
coefficients from the CMP fit are dissimilar to those from the Poisson fit; this may occur when the estimate of $\nu$ deviates from the value of 1. Similarly to the glm
output, the output of glm.cmp
displays several quantities for each coefficient $\theta_j$, $j = 1, \ldots, d$: a point estimate $\hat{\theta}_j$, an associated standard error $\widehat{\text{SE}}(\hat{\theta}_j)$, a z-value $z_j = \hat{\theta}_j / \widehat{\text{SE}}(\hat{\theta}_j)$, and a p-value $2\Phi(-|z_j|)$ for the test $H_0: \theta_j = 0$ versus $H_1: \theta_j \neq 0$. Here, $\Phi$ is the CDF of the standard normal distribution. Because an intercept-only formula was specified for $\nu_i$, $\hat{\nu} = \exp(\hat{\gamma})$ does not vary with $i$ and its estimate and associated standard error are added to the display. Here we see evidence of underdispersion with $\hat{\nu} > 1$. A test for equidispersion is displayed to determine whether there is a significant amount of over or underdispersion in the data. In particular, a likelihood ratio test is used to decide whether $H_0: \vec{\gamma} = \vec{0}$ versus $H_0: \vec{\gamma} \neq \vec{0}$. The test statistic is displayed along with the degrees of freedom and associated p-value. Here we have fairly strong evidence to reject the null hypothesis of equidispersion.
The AIC for the CMP model cmp.out
shows some improvement over that of glm.out
. Let us also consider a slope coefficient for the $\nu$ component using
\begin{align}
\log \nu_i = \gamma_0 + \gamma_1 \cdot \text{transfers}_i.
\end{align}
via the following call to glm.cmp
.
cmp2.out = glm.cmp(broken ~ transfers, formula.nu = ~ transfers, data = freight) print(cmp2.out)
Model cmp2.out
provides a slight improvement over cmp.out
in AIC and BIC, but we may prefer cmp.out
in the interest of simplicity.
\label{sec:cmp-reg-optim}
To gain some insight into the optimization, we may wish to increase the trace level, which can be done as follows.
control = get.control(optim.control = list(maxit = 5, trace = 3, REPORT = 1)) cmp3.out = glm.cmp(broken ~ transfers, data = freight, control = control)
Data from the local environment may be passed to the glm.cmp
function without explicitly using the data
argument.
y = freight$broken x = freight$transfers glm.cmp(y ~ x)
\label{sec:cmp-reg-offset}
In a count regression model, it may be desirable to include offset terms such as
\begin{align}
\log \lambda_i = \vec{x}_i^\top \vec{\beta} + \text{offx}_i, \quad
\log \nu_i = \vec{s}_i^\top \vec{\beta} + \text{offs}_i.
\end{align}
An offset
term may be used in the formula interface to accomplish this.
freight$offx = 13 freight$offs = 1 glm.cmp(broken ~ transfers + offset(offx), data = freight) glm.cmp(broken ~ transfers + offset(offx), formula.nu = ~1 + offset(offs), data = freight)
For users who wish to bypass the formula interface and prepare the $\vec{X}$ and $\vec{S}$ design matrices manually, a "raw" interface to the regression functionality is also provided.
y = freight$broken X = model.matrix(~ transfers, data = freight) S = model.matrix(~ 1, data = freight) offs = get.offset(x = rep(13, nrow(freight)), s = rep(1, nrow(freight))) cmp.raw.out = glm.cmp.raw(y, X, S, offset = offs)
\label{sec:cmp-reg-accessors}
Several accessors are provided to extract results from the output object.
logLik(cmp.out) ## Log-likelihood evaluated at MLE. AIC(cmp.out) ## AIC evaluated at MLE. BIC(cmp.out) ## BIC evaluated at MLE. coef(cmp.out) ## Estimates of theta as a flat vector coef(cmp.out, type = "list") ## Estimates of theta as a named list vcov(cmp.out) ## Estimated covariance matrix of theta hat sdev(cmp.out) ## Standard deviations from vcov(...) diagonals sdev(cmp.out, type = "list") ## Standard deviations as a named list
The predict
function computes several useful quantities evaluated at the estimate $\hat{\btheta}$. The argument default type = "response"
produces a vector of estimates of the response $\hat{y}_i = \E(Y_i)$ for $i = 1, \ldots, n$ using the method described in Section \ref{sec:cmp-ev}. The argument type = "link"
produces a data.frame
with columns for the linked parameters $\lambda_i$ and $\nu_i$.
predict(cmp.out) predict(cmp.out, type = "link")
Note that the estimated nu
values are equal for all observations because the model assumed only an intercept term for the dispersion component.
We can also use predict
on new covariate values. Note that models fit with the formula interface expect the new data to be provided as a data.frame
which is interpreted using the formula used to fit the model. If the raw interface was used to fit the model, use the get.modelmatrix
function to specify design matrices to use for prediction.
# Prepare new data to fit by formula interface new.df = data.frame(transfers = 0:10) # Prepare new data to fit by raw interface X = model.matrix(~ transfers, data = new.df) S = model.matrix(~ 1, data = new.df) new.data = get.modelmatrix(X = X, S = S) # Pass new data to model from by formula interface y.hat.new = predict(cmp.out, newdata = new.df) # Pass new data to model from by raw interface y.hat.new = predict(cmp.raw.out, newdata = new.data) # Compute predictions for links predict.out = predict(cmp.out, newdata = new.df, type = "link") # Plot predictions ggplot() + geom_point(data = new.df, aes(transfers, y.hat.new)) + xlab("Number of transfers") + ylab("Predicted number broken") + theme_bw()
print(y.hat.new) print(predict.out)
The leverage
function computes the diagonal entries of a "hat" matrix which can be formulated in CMP regression. These can be used to diagnose influential observations. For details, see Section 3.6 of @SellersShmueli2010.
leverage(cmp.out)
The residuals
function provides either raw (the default) or quantile-based residuals [@DunnSmyth1996]. In a CMP regression setting, raw residuals $y_i - \hat{y}_i$ generally do not work well with traditional regression diagnostics, such as Q-Q plots. Quantile-based residuals often produce interpretable diagnostics; however, a random element is used in the computation of quantile residuals for discrete distributions. This aids interpretability but gives slightly different residual values each time they are computed. See @DunnSmyth1996 for details.
res.raw = residuals(cmp.out) res.qtl = residuals(cmp.out, type = "quantile")
Pearson residuals may be preferred over raw residuals for diagnostics; these can be obtained by standardizing raw residuals using leverage values and variance estimates.
link.hat = predict(cmp.out, type = "link") vv = vcmp(link.hat$lambda, link.hat$nu) hh = leverage(cmp.out) res.pearson = res.raw / sqrt(vv*(1-hh))
For each type of residual---raw, Pearson, and quantile---we now plot fitted values versus residuals and Q-Q plots.
plot.fit.res = function(y.hat, res) { ggplot(data.frame(y = y.hat, res = res)) + geom_point(aes(y, res)) + xlab("Fitted Value") + ylab("Residual Value") + theme_bw() + theme(plot.title = element_text(size = 10)) } plot.qq.res = function(res) { ggplot(data.frame(res = res), aes(sample = res)) + stat_qq() + stat_qq_line() + theme_bw() + theme(plot.title = element_text(size = 10)) } y.hat = predict(cmp.out) plot.fit.res(y.hat, res.raw) + ggtitle("Fitted Values vs. Raw Residuals") plot.qq.res(res.raw) + ggtitle("Q-Q Plot of Raw Residuals") plot.fit.res(y.hat, res.pearson) + ggtitle("Fitted Values vs. Pearson Residuals") plot.qq.res(res.pearson) + ggtitle("Q-Q Plot of Pearson Residuals") plot.fit.res(y.hat, res.qtl) + ggtitle("Fitted Values vs. Quantile Residuals") plot.qq.res(res.qtl) + ggtitle("Q-Q Plot of Quantile Residuals")
In this example, with only r nrow(freight)
observations, it is difficult to see an advantage of using quantile residuals; the benefit will be more apparent in Section \ref{sec:zicmp-reg}. One benefit of raw residuals is that they may be used to compute a mean-squared error.
mean(res.raw^2)
To access the results of the equidispersion test shown in the output of cmp.out
, we may use the equitest
accessor function.
equitest(cmp.out)
The deviance
function computes the deviance quantities $D_i = -2 [\log L_i(\hat{\btheta}) - \log L_i(\tilde{\btheta}i)]$ for $i = 1, \ldots, n$, where $L_i(\btheta)$ is the term of the likelihood corresponding to the $i$th observation, $\hat{\btheta}$ is the MLE computed under the full likelihood $L(\btheta) = \prod{i=1}^n L_i(\btheta)$, and $\tilde{\btheta}_i$ is the maximizer of $L_i(\btheta)$.
deviance(cmp.out)
The parametric.bootstrap
function carries out a parametric bootstrap with $R$ repetitions. Using the fitted MLE $\hat{\btheta}$, bootstrap samples $\vec{y}^{(r)} = (y_1^{(r)}, \ldots, y_n^{(r)})$ are drawn from the likelihood $L(\hat{\btheta})$ for $r = 1, \ldots, R$. Estimate $\hat{\btheta}^{(r)}$ is fitted from bootstrap sample $\vec{y}^{(r)}$. An $R \times d$ matrix is returned whose $r$th row is $\hat{\btheta}^{(r)}$.
cmp.boot = parametric.bootstrap(cmp.out, reps = 100) head(cmp.boot)
We used $R = r nrow(cmp.boot)
$ in the display above to keep vignette computations small, but a larger number may be desired in practice. Bootstrap repetitions can be used, for example, to compute 95% confidence intervals for each of the coefficients.
t(apply(cmp.boot, 2, quantile, c(0.025,0.975)))
\label{sec:cmp-reg-large-covariates}
Large covariates can present numerical difficulties in fitting CMP regression. We will briefly demonstrate the difficulties and some possible workarounds. First let us generate a new dataset based on a large covariate in the regression for $\lambda_i$.
set.seed(1234) n = 200 x = rnorm(n, 500, 10) X = cbind(intercept = 1, slope = x) S = matrix(1, n, 1) beta_true = c(-0.05, 0.05) gamma_true = 2 lambda_true = exp(X %*% beta_true) nu_true = exp(S %*% gamma_true) y = rcmp(n, lambda_true, nu_true)
Notice that the generated counts $y_1, \ldots, y_n$ are relatively small compared to the covariate $x_1, \ldots, x_n$.
summary(x) summary(y)
An initial attempt to fit the true data-generating model fails.
tryCatch({ glm.cmp(y ~ x, formula.nu = ~ 1) }, error = print_warning)
Internally, the linked rate parameter $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$ may evaluate to Inf
or become very close to zero as the optimizer moves $\beta_1$ away from zero in a positive or negative direction, respectively. Some possible ways to address this are as follows.
Standardize the covariate to have mean zero and variance one.
glm.cmp(y ~ scale(x), formula.nu = ~ 1)
Use a logarithmic transformation on the covariate.
glm.cmp(y ~ log(x), formula.nu = ~ 1)
Change optimization method or other optim
arguments.
control = get.control(optim.method = "BFGS", optim.control = list(maxit = 200)) suppressWarnings({ cmp.out = glm.cmp(y ~ x, formula.nu = ~ 1, control = control) print(cmp.out) })
In this case, standardization and logarithmic transformation produce a usable fit. Changing the optimization method to BFGS
allows the optimization to finish, but there are further numerical problems in computing the Hessian for standard errors.
\label{sec:cmp-reg-large-outcomes}
Now consider a generated dataset with large outcomes but a relatively small covariate. This situation can also present numerical difficulties.
set.seed(1234) n = 200 x = runif(n, 1, 2) X = cbind(intercept = 1, slope = x) S = matrix(1, n, 1) beta_true = c(1, 1) gamma_true = -0.95 lambda_true = exp(X %*% beta_true) nu_true = exp(S %*% gamma_true) y = rcmp(n, lambda_true, nu_true)
summary(x) summary(y)
An initial attempt to fit the data-generating model fails.
tryCatch({ glm.cmp(y ~ x, formula.nu = ~ 1) }, error = print_warning)
Informative starting values help the optimizer to initially make progress. True data-generating parameters will not be available in a real data analysis situation but help to illustrate the idea.
init = get.init(beta = beta_true, gamma = gamma_true) glm.cmp(y ~ x, formula.nu = ~ 1, init = init)
Without choosing an initial value, changing the optimization method to Nelder-Mead
and increasing the maximum number of iterations also helps the optimizer find a solution.
control = get.control(optim.method = "Nelder-Mead", optim.control = list(maxit = 1000)) glm.cmp(y ~ x, formula.nu = ~ 1, control = control)
Note that this solution is different from the previous one; the log-likelihood of the previous one is slightly better.
\label{sec:zicmp-reg}
\label{sec:zicmp-reg-couple}
The couple
dataset [@LoeysEtAl2012] was analyzed with ZICMP regression in @SellersRaim2016 and found to exhibit overdispersion. The data concern separation trajectories of $n = 387$ couples. The variable UPB
records the number of unwanted pursuit behavior perpetrations and is considered the outcome of interest. Included covariates are the binary variable EDUCATION
, which is 1 if at least a bachelor's degree was attained, and a continuous variable ANXIETY
which measures anxious attachment. A zero-inflated count model is considered for these data because 246 of the 387 records have an outcome of zero.
Let us load and view the first few records in the dataset.
data(couple) head(couple)
As a preliminary model, let us fit a standard Poisson model $Y_i \indep \text{Poisson}(\lambda_i)$ with
\begin{align}
\log \lambda_i = \beta_0 + \beta_1 \cdot \text{EDUCATION}_i + \beta_2 \cdot \text{ANXIETY}_i.
\end{align}
We may use the standard glm
function.
glm.out = glm(UPB ~ EDUCATION + ANXIETY, data = couple, family = poisson) summary(glm.out)
\label{sec:zicmp-reg-initial}
Now consider a ZICMP regression with
\begin{align}
&\log \lambda_i = \beta_0 + \beta_1 \cdot \text{EDUCATION}_i + \beta_2 \cdot \text{ANXIETY}_i, \
&\log \nu_i = \gamma_0, \
&\logit p_i = \zeta_0 + \zeta_1 \cdot \text{EDUCATION}_i + \zeta_2 \cdot \text{ANXIETY}_i.
\end{align}
We use the glm.cmp
function as follows.
zicmp0.out = glm.cmp(UPB ~ EDUCATION + ANXIETY, formula.nu = ~ 1, formula.p = ~ EDUCATION + ANXIETY, data = couple) print(zicmp0.out)
There are now three sets of coefficients reported in the output: the X:
, S:
, and W:
prefixes label estimates for the $\lambda_i$, $\nu_i$, and $p_i$ formulas respectively.
The AIC of the ZICMP model is drastically smaller than the Poisson model, indicating a greatly improved fit. However, there are signs of possible numerical issues. The estimate for $\gamma_0$ is a large negative number, but with an extremely large associated SE, which suggests that the effect may not be statistically significant. On the other hand, the estimate of $\nu$ is nearly zero with a small SE, which suggests that the dispersion parameter is indeed statistically significant. On the surface, this seems to be a contradiction.
The issue is that the Hessian of the log-likelihood becomes insensitive to small changes in $\gamma_0$ when $\lambda_i < 1$ and $\gamma_0$ is a large negative number. Let us first verify that the estimates for $\lambda_i$ are indeed smaller than 1.
pred.out = predict(zicmp0.out, type = "link") summary(pred.out$lambda)
To show the insensitivity of the Hessian, let us consider a simpler setting with $Y \sim \text{CMP}(\lambda, \nu)$, $\lambda = \exp{-0.25} \approx 0.7788$ fixed, and $\nu = \exp{\gamma_0}$. We then have log-density \begin{align} \log f(y \mid \gamma_0) = y \log \lambda - e^{\gamma_0} \log(y!) - \log Z(\lambda, e^{\gamma_0}), \end{align} with first derivative and second derivatives, respectively, \begin{align} &\frac{\partial}{\partial \gamma_0} \log f(y \mid \gamma_0) = -e^{\gamma_0} \log(y!) - \frac{\partial}{\partial \gamma_0} \log Z(\lambda, e^{\gamma_0}), \ &\frac{\partial^2}{\partial \gamma_0^2} \log f(y \mid \gamma_0) = -e^{\gamma_0} \log(y!) - \frac{\partial^2}{\partial \gamma_0^2} \log Z(\lambda, e^{\gamma_0}). \end{align} For a given value of $y$, $-e^{\gamma_0} \log(y!)$ approaches zero as $\gamma_0$ decreases. Therefore, let us focus on the function ( g(\gamma_0) = -\log Z(\lambda, e^{\gamma_0}) ) and its first and second derivatives. The following code illustrates their behavior.
library(numDeriv) g = function(gamma0) { -ncmp(lambda = exp(-0.25), nu = exp(gamma0), log = TRUE) } dat = data.frame(gamma0 = seq(0, -13), g = NA, d_g = NA, d2_g = NA) for (j in 1:nrow(dat)) { gamma0 = dat$gamma0[j] dat$g[j] = g(gamma0) dat$d_g[j] = numDeriv::grad(func = g, x = gamma0) dat$d2_g[j] = numDeriv::hessian(func = g, x = gamma0) }
Here is the result.
print(dat)
Notice that $g(\gamma_0)$ approaches a limit as $\gamma_0 \rightarrow -\infty$, which coincides with the CMP distribution approaching a Geometric distribution. It may not be surprising that the first and second derivatives approach zero accordingly. This explains the large SE for $\gamma_0$ in the model zicmp0.out
. With estimates tending to this region of the parameter space, it may be preferable to fix fix $\gamma_0$ at a value such as $-\infty$, which will be done in the next section.
\label{sec:zicmp-reg-fixed}
Our attempt to fit the previous model strongly tended to the Zero-Inflated Geometric special case of ZICMP, but SEs computed via the Hessian become large in this region. In this section, we fix $\gamma_0$ at the extreme $-\infty$ and fit the remaining coefficients. Let us use the raw interface to do this.
init = coef(zicmp0.out, type = "list") y = couple$UPB X = model.matrix(~ EDUCATION + ANXIETY, data = couple) S = model.matrix(~ 1, data = couple) W = model.matrix(~ EDUCATION + ANXIETY, data = couple) control = get.control(optim.method = "BFGS") zicmp.out = glm.zicmp.raw(y, X, S, W, init = get.init(beta = c(-1,0,0), gamma = -Inf, zeta = c(-1,0,0)), fixed = get.fixed(gamma = 1L), control = control)
print(zicmp.out)
Notice that an additional Fixed
column has been added to the display, indicating that the coefficient gamma
is fixed. Furthermore, its Estimate
column is set to the initial value and the columns SE
, z-value
, and p-value
are set to NA
. This model achieves a similar log-likelihood value as our first attempt using L-BFGS-B
but does not exhibit signs of numerical issues.
\label{sec:zicmp-reg-accessors}
Here are several of the accessors provided to extract model outputs.
logLik(zicmp.out) ## Log-likelihood evaluated at MLE AIC(zicmp.out) ## AIC evaluated at MLE BIC(zicmp.out) ## BIC evaluated at MLE coef(zicmp.out) ## Estimates of theta as a flat vector coef(zicmp.out, type = "list") ## Estimates of theta as a named list vcov(zicmp.out) ## Estimated covariance matrix of theta hat sdev(zicmp.out) ## Standard deviations from vcov(...) diagonals sdev(zicmp.out, type = "list") ## Standard deviations as a named list equitest(zicmp0.out) ## Likelihood ratio test for H_0: gamma = 0 tryCatch({ ## An error is thrown for model with fixed gamma equitest(zicmp.out) }, error = print_warning)
Because we fixed $\gamma = -\infty$ to obtain zicmp0.out
, the equitest
function throws an error instead of proceeding with an equidispersion test.
The predict
function behaves similarly as in CMP regression; however, the link
type here also includes a column with the estimated $p_i$.
y.hat = predict(zicmp.out) ## Fitted values based on ecmp link.hat = predict(zicmp.out, type = "link") head(y.hat) head(link.hat)
In this example, we can see the benefit of using quantile residuals rather than raw residuals for diagnostic plots. The functions plot.fit.res
and plot.qq.res
have been defined in Section \ref{sec:cmp-reg}.
res.raw = residuals(zicmp.out, type = "raw") res.qtl = residuals(zicmp.out, type = "quantile") plot.fit.res(y.hat, res.raw) + ggtitle("Fitted Values vs. Raw Residuals") plot.qq.res(res.raw) + ggtitle("Q-Q Plot of Raw Residuals") plot.fit.res(y.hat, res.qtl) + ggtitle("Fitted Values vs. Quantile Residuals") plot.qq.res(res.qtl) + ggtitle("Q-Q Plot of Quantile Residuals")
Here is an example of computing fitted values for new covariate data.
new.df = data.frame(EDUCATION = round(1:20 / 20), ANXIETY = seq(-3,3, length.out = 20)) X.new = model.matrix(~ EDUCATION + ANXIETY, data = new.df) S.new = model.matrix(~ 1, data = new.df) W.new = model.matrix(~ EDUCATION + ANXIETY, data = new.df) new.data = get.modelmatrix(X.new, S.new, W.new) # For model fit using raw interface, use get.modelmatrix to prepare new design # matrices, offsets, etc y.hat.new = predict(zicmp.out, newdata = new.data) # For models fit with the formula interface, pass a data.frame with the same # structure as used in the fit. y.hat.new = predict(zicmp0.out, newdata = new.df)
print(y.hat.new)
As with CMP regression, a parametric.bootstrap
function is provided for convenience to obtain a bootstrap sample $\hat{\btheta}^{(r)}$ of $\btheta$ based on the estimate $\hat{\btheta}$. Because it is too time consuming to run this example within the vignette, we show the code without output below. As in Section \ref{sec:cmp-reg}, we consider using the bootstrap samples to construct a 95% confidence interval for each of the coefficients.
zicmp.boot = parametric.bootstrap(zicmp.out, reps = 100) head(zicmp.boot) apply(zicmp.boot, 2, quantile, c(0.025,0.975))
We acknowledge Thomas Lotze for significant contributions to the initial development of the COMPoissonReg
package and for service as initial maintainer on CRAN. We thank Darcy Steeg Morris, Eric Slud, and Tommy Wright for reviewing the manuscript. We are grateful to the users of COMPoissonReg
for their interest and for bringing to light some of the issues which have been considered in this work.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.