The moultmcmc package implements the regression models outlined in @underhill1988model and @underhill1990model. In their notation^[Note that in @underhill1988model the variable $t$ is doubly defined. It is both a generic variable of time in the model derivation, and denotes the sample dates of pre-moult birds in the data likelihoods], samples consist of $I$ pre-moult birds, $J$ birds in active moult, and $K$ post-moult birds. Birds in each category are observed on days $t = t_1,\ldots,t_I$; $u = u_1,\ldots,u_J$; $v = v_1,\ldots,v_K$, respectively. Moult scores for actively moulting birds, where available, are encoded as $y = y_1,\ldots,y_J$.

Each moult state has a probability of occurrence $$ \begin{aligned} P(t) &= \Pr{Y(t)=0}&= &1-F_T(t)\ Q(t) &= \Pr{0<Y(t)<1}& =&F_T(t)-F_T(t-\tau)\ R(t) &= \Pr{Y(t)=1}&= &1-F_T(t-\tau)\ \end{aligned} $$ Further, assuming a linear progression of the moult indices over time, the probability density of a particular moult score at time $t$ is $$ f_Y(t)(y)=\tau f_T(t-y\tau),\quad0 < y < 1, $$ In moultmcmc the unobserved start date of the study population is assumed to follow a normal distribution with mean $\mu$ and standard deviation $\sigma$, such that $$F_T(t)=\Phi\left(\frac{t-\mu}{\sigma}\right)$$ where $\Phi$ is the standard normal distribution function and $$f_T(t) = \phi(t) = \frac{1}{\sqrt{2\pi}}\exp\frac{-t^2}{2}$$.

We further assume that $F_T(t)$ has $p$ parameters $\mathbf{\theta} = \theta_1, \theta_2, \ldots, \theta_p$ and for convenience the start date $\mu$, duration $\tau$, and population standard deviation of moult $\sigma$ will be elements of $\mathbf{\theta}$.

Likelihoods

Type 1

Type 1 data consist of observations of categorical moult state (pre-moult, active moult, post-moult) and sampling is representative in all three categories. The likelihood of these observations is $$ \mathcal{L}(\boldsymbol\theta,t,u,v) = \prod_{i=1}^IP(t_i)\prod_{j = 1}^JQ(u_j)\prod_{k=1}^KR(v_k). $$

Type 2

Type 2 data consist of observations of birds in all three moult states (pre-moult, active moult, post-moult). Sampling is representative for all three categories, and for actively moulting birds a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known. The likelihood of these observations is $$ \mathcal{L}(\boldsymbol\theta,t,u,y,v) = \prod_{i=1}^IP(t_i)\prod_{j = 1}^Jq(u_j,y_j)\prod_{k=1}^KR(v_k), $$ where $q(u_j,y_j) = \tau f_T(u_j-y_j\tau).$

Lumped Type 2

Lumped type 2 data consist of observations of birds in all three moult states (pre-moult, active moult, post-moult), but where the pre-moult and post-moult states cannot be distinguished from each other, yielding $I$ observations of dates $t_i$ on which non-moulting birds were observed. Sampling is representative for all three categories, and for actively moulting birds a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known. The likelihood of these observations is $$ \mathcal{L}(\boldsymbol\theta,t,u,y) = \prod_{i=1}^IPR(t_i)\prod_{j = 1}^Jq(u_j,y_j), $$ where $q(u_j,y_j) = \tau f_T(u_j-y_j\tau)$ and $PR(t_i)=P(t_i)+R(t_i).$

Type 3

Type 3 data consist of observations of actively moulting birds only, and a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known for each individual. The likelihood of these observations is $$ \mathcal{L}(\boldsymbol\theta,u,y) = \prod_{j = 1}^J\frac{q(u_j,y_j)}{Q(u_j)}. $$

Type 4

Type 4 data consist of observations of birds in active moult and post-moult only. Sampling is representative for these two categories, and for actively moulting birds a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known. The likelihood of these observations is $$ \mathcal{L}(\boldsymbol\theta,u,y,v) = \prod_{j = 1}^J\frac{q(u_j,y_j)}{1-P(u_j)}\prod_{k=1}^K\frac{R(v_k)}{1-P(v_k)}. $$

Type 5

Type 5 data consist of observations of birds in pre-moult and active moult. Sampling is representative for these two categories, and for actively moulting birds a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known. The likelihood of these observations is $$ \mathcal{L}(\boldsymbol\theta,t,u,y) = \prod_{i=1}^I\frac{P(t_i)}{1-R(t_i)}\prod_{j = 1}^J\frac{q(u_j,y_j)}{1-R(u_j)}. $$

Type 1 + 2

As outlined in @underhill1988model estimates can also be derived from mixtures of data types. Type 1 + 2 data consist of observations of birds in all three moult states (pre-moult, active moult, post-moult). Sampling is representative for all three categories, but a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known only for some of the actively moulting birds. This means the sample consist of $I$ pre-moult birds, $J$ birds in active moult with known indices, $L$ birds in active moult without known indices but known capture dates $u'=u'l,\ldots,u'_L$, and $K$ post-moult birds. The likelihood of these observations is $$ \mathcal{L}(\boldsymbol\theta,t,u,y,u',v) = \prod{i=1}^IP(t_i)\prod_{j = 1}^Jq(u_j,y_j)\prod_{l = 1}^{L}Q(u'{l})\prod{k=1}^KR(v_k), $$

Recapture models

moultmcmc currently implements a recaptures model which allows for heterogeneity in start dates $\mu$ but assumes a common moult duration $\tau$. When repeat observations are available an individual's start date $\mu_n$ then becomes

\begin{equation} \mu_n = \mu_0 + \mu'n + \mathbf{x}\mu\boldsymbol{\beta}_\mu \end{equation}

where $\boldsymbol{x}\mu$ is a row vector containing the values of individual-specific predictors (in the same format as $\boldsymbol{X}\mu$), and $\mu'_n$ is an individual-level random effect intercept

\begin{equation} \mu'n \sim \mathrm{Normal}(0,\sigma_n) \end{equation} where $\sigma_n$ is the individual-specific standard deviation. We can then exploit the linearity assumption and treat observed moult scores as \begin{equation} y{ni} \sim \mathrm{Normal}(\mu_0 + \mu'n + \tau * u{ni}, \sigma_\tau) \end{equation} where $\sigma_\tau$ captures any unmodelled variance in $\tau$ as well as any measurement error in $y$.

The likelihood for the Type 3-like model for a sample of $J$ birds in active moult without repeated observations, and $N$ birds in active moult with a total of $M$ repeated observations $u'=u'_m,\ldots,u'_M$ and $y'=y'_m,\ldots,y'_M$ then is

$$ \mathcal{L}(\boldsymbol\theta,u,y,u',y') = \prod_{j = 1}^J\frac{q(u_j,y_j)}{Q(u_j)}\prod_{m = 1}^{M}f(u'm,y'_m)\prod{n=1}^N\phi(\mu'_n|0,\sigma_n), $$ where $f(u'_m,y'_m)$ follows from above.

Priors

Users have a choice between two set of priors for the intercept terms of the linear predictors on the start date $\mu$, the duration $\tau$, and the population standard deviation of the start date $\sigma$, respectively. By default flat priors are used for $\mu_0$ and $\tau_0$ and a vaguely informative normal prior on $\ln(\sigma_0)$

$\mu_0 \sim \mathrm{Uniform(0,366)}$
$\tau_0 \sim \mathrm{Uniform(0,366)}$
$\ln(\sigma_0) \sim \mathrm{Normal(0,5)}$

In some cases the models sample poorly with these priors, and better convergence can be achieved by setting the argument flat_prior = FALSE. In this case vaguely informative truncated normal priors are used for $\mu_0$ and $\tau_0$:

$\mu_0 \sim \mathrm{TruncNormal(150,50,0,366)}$
$\tau_0 \sim \mathrm{TruncNormal(100,30,0,366)}$

These priors work well for data from passerines in seasonal environments, i.e. when the sampling occasion data is encoded as days from mid-winter.

For any additional regression coefficients an improper flat prior is used as a default.

References



pboesu/moultmcmc documentation built on Feb. 19, 2023, 6:03 a.m.