This section describes distributions and likelihoods that I have used or found useful to have as a reference. These are the less frequently used ones, compared with the Normal, Multinomial, etc.
The density for the Gamma is parameterised either by the shape ($k$) and scale ($\theta$), or by the shape ($\alpha$) and rate ($\beta$). Many of the models I apply that assume a Gamma (and general model frameworks such as the GLM) have coefficients that define the moments, i.e., mean $E[Y] = \mu = k\theta$ and variance $Var[Y] = \phi\mu^2$, where $\phi = 1/k$ is the dispersion coefficient (estimable). These moments can be reparameterised back to the original Gamma parameters. The likelihood for the Gamma (shape-scale parameterisation) follows \begin{equation} L = \prod_i \frac{1}{\Gamma(k) \theta^{k}} x_i^{k - 1} e^{-x_i/\theta} \end{equation} and the log-likelihood \begin{equation} ll = \sum_i -\log\Gamma(k) - k\log(\theta) + (k - 1)\log(x_i) - x_i/\theta \end{equation}
We can calculate $\theta$ and $k$ from $\mu$ and $Var[Y]$ as follows:
\[ k = \frac{\mu}{\theta} \]
Substituting $k$ into the variance definition gives
\[ Var[Y] = \phi\mu^2 = \mu\theta \]
Given $\phi$ is an estimable parameter, we rearrange the above to make $\theta$ a function of $\mu$ and $\phi$,
\[ \theta = \phi\mu \]
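To check the reparameterisation, here is a minimal sketch (with arbitrary assumed values for $\mu$ and $\phi$) that converts the moments back to shape and scale, and compares the hand-written log-likelihood above with R's `dgamma()`:

```r
set.seed(42)
mu  <- 6     # assumed mean
phi <- 0.5   # assumed dispersion (phi = 1/k)

## convert moments back to shape (k) and scale (theta)
theta <- phi * mu     # theta = phi * mu
k     <- mu / theta   # k = mu / theta = 1 / phi

x <- rgamma(1000, shape = k, scale = theta)
c(sample_mean = mean(x), expected_mean = k * theta)
c(sample_var = var(x), expected_var = phi * mu^2)

## hand-written log-likelihood vs dgamma()
ll_manual <- sum(-lgamma(k) - k * log(theta) + (k - 1) * log(x) - x / theta)
ll_dgamma <- sum(dgamma(x, shape = k, scale = theta, log = TRUE))
all.equal(ll_manual, ll_dgamma)
```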
```r
# credit for this code https://stat.ethz.ch/pipermail/r-help/2011-July/283736.html
set.seed(1001)
shape = 2
scale = 3
z <- rgamma(10000, shape = shape, scale = scale)
var(z) # should be close to shape * scale^2
## calculate CV, theoretical value is 1/sqrt(shape)
1 / sqrt(shape)
sd(z) / mean(z)  # compare to the theoretical CV

## inverse link GLM
g1 <- glm(z ~ 1, family = Gamma(link = "inverse"))
# Here the intercept estimate is 0.167, which is very
# nearly 1/6 = 1/(shape*scale) -- i.e. the Gamma GLM is
# parameterised in terms of the mean (on the inverse scale).
# If you want to recover the scale parameter for the intercept case, then
summary(g1)$dispersion / coef(g1)[1]

## to get the scale parameter from the log-link gamma model
g2 <- glm(z ~ 1, family = Gamma(link = "log"))
## scale param
summary(g2)$dispersion * exp(coef(g2)[1])
## shape param (the reciprocal of the dispersion parameter)
1 / summary(g2)$dispersion
```
The negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of iid Bernoulli trials before a specified (non-random) number of successes (denoted $r$) occurs, with probability of success denoted by $p$.
\begin{equation} y \sim NB(\mu, \phi) \end{equation}
where $\mathbb{E}\left[y\right] = \mu$, $Var\left[y\right] = \mu \left(1 +\frac{\mu}{\phi}\right)$, and $\phi$ is constrained to be $\phi > 0$. These parameters can be linked back to the traditional parameterisation \begin{equation} y \sim NB(r, p) \end{equation}
where $r$ is the target number of successes at which the experiment is stopped, and $p$ is the success probability in each trial.
\[ r = \frac{\mu^2}{\sigma^2 - \mu} \]
\[ p = 1 - \frac{\sigma^2 - \mu}{\sigma^2} = \frac{\phi}{\phi + \mu} \]
where $\sigma^2 = Var\left[y\right]$.
```r
set.seed(123)
## generate negative binomial draws
n_sim = 100
phi = 100  # also = r
mu = rlnorm(n_sim, log(5), 0.4)
var = mu + (mu^2 / phi)
p_success = phi / (phi + mu)  ## probability of success in each trial
p_fail = (var - mu) / var     ## probability of failure in each trial
head(cbind(p_success, p_fail, 1 - p_success))
## r = mu^2 / (var - mu) = phi
r = mu * (p_success / p_fail)
summary(r)
summary(p_fail)
## simulate the number of failures which occur in a sequence of Bernoulli
## trials before a target number of successes is reached
y_sim = rnbinom(n_sim, size = phi, mu = mu)
## calculate the densities the two ways; they should agree
sum(dnbinom(y_sim, size = phi, mu = mu, log = TRUE))
sum(dnbinom(y_sim, size = r, prob = p_success, log = TRUE))
```
If $(p_1,\dots,p_k) \sim \mathrm{Dirichlet}(\alpha_1,\dots,\alpha_k)$ and $(x_1,\dots,x_k) \mid (p_1,\dots,p_k) \sim \mathrm{Multinomial}(n, p_1,\dots,p_k)$, then marginally $(x_1,\dots,x_k) \sim \mathrm{DirichletMultinomial}(n, \alpha_1,\dots,\alpha_k)$.
The probability mass function is \begin{equation} f(\boldsymbol{x}) = \frac{\left(n!\right)\Gamma\left(\sum_k \alpha_k\right)}{\Gamma\left(n+\sum_k \alpha_k\right)}\prod_{k=1}^K\frac{\Gamma(x_{k}+\alpha_{k})} {\left(x_{k}!\right)\Gamma(\alpha_{k})} \end{equation}
with dispersion parameter $\beta = \sum_k \alpha_k$, so that $\alpha_k = \beta \pi_k$ for expected proportions $\pi_k$.
When simulating from this distribution you first simulate the Dirichlet variable, using independent Gamma draws (normalised to sum to one) with shape parameters set from the fitted proportions ($\alpha_k = \beta \pi_k$). Then draw from the multinomial with cell probabilities given by the Dirichlet draw, as sketched below.
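A minimal sketch of that simulation procedure (the expected proportions and $\beta$ below are arbitrary assumptions for illustration):

```r
set.seed(101)
n    <- 200                         # multinomial sample size
beta <- 20                          # assumed dispersion parameter
pi_fitted <- c(0.1, 0.3, 0.4, 0.2)  # assumed expected proportions

rdirichmult <- function(n, beta, pi_fitted) {
  ## Dirichlet draw via normalised independent Gamma draws
  g <- rgamma(length(pi_fitted), shape = beta * pi_fitted, rate = 1)
  p <- g / sum(g)
  ## multinomial draw conditional on the Dirichlet probabilities
  as.vector(rmultinom(1, size = n, prob = p))
}

## simulate many realisations and compare mean proportions with pi_fitted
sims <- replicate(5000, rdirichmult(n, beta, pi_fitted))
rowMeans(sims) / n
pi_fitted
```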
The Dirichlet-multinomial can also be written using the linear re-parameterised approach from @thorson2017model. For observed proportions $\tilde{\pi}_b$ in composition bin $b$ (age or length), with sample size $n$, expected proportions $\pi_b$, and estimable overdispersion parameter $\theta$, the negative log-likelihood is:
\begin{align}
-\log \left(L \right) &= -\log \Gamma \left(n + 1 \right) + \sum\limits_b \log \Gamma \left(n\tilde{\pi}_b + 1\right) \\
& \quad - \log \Gamma \left(\theta n\right) + \log \Gamma \left(n + \theta n\right) - \sum\limits_b \left[ \log \Gamma \left(n\tilde{\pi}_b + \theta n \pi_b\right) - \log \Gamma \left(\theta n \pi_b\right) \right]
\end{align}
with $\beta = n \theta$. This parameterisation has an effective sample size $n_{eff}$
\[ n_{eff} = \frac{1 + \theta n}{1 + \theta} = \frac{1}{1 + \theta} + n\frac{\theta}{1 + \theta} \]
so the effective sample size is a linear function of the input sample size, with intercept $(1 + \theta)^{-1}$ and slope $\frac{\theta}{1 + \theta}$. Interpreting $\theta$: if $\theta$ is large then $n_{eff} \rightarrow n$, and if $\theta$ is small (with $n$ large) then $\theta$ is approximately the ratio of effective sample size to input sample size.
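For intuition, a quick look at how the effective sample size scales with the input sample size for a few assumed values of $\theta$:

```r
## effective sample size under the linear Dirichlet-multinomial parameterisation
n_eff <- function(n, theta) (1 + theta * n) / (1 + theta)

n <- c(10, 50, 100, 500, 1000)
## columns correspond to theta = 0.1, 1, 10
sapply(c(0.1, 1, 10), function(theta) round(n_eff(n, theta), 1))
```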
```r
#' Returns the pdf for the Dirichlet-multinomial, following the Thorson paper.
#' He deviates from the classic formulation by pulling (x!) out of the
#' denominator of the right-hand product and moving it to the left denominator.
#' @param obs vector of observed compositions, assumes sum(obs) = 1
#' @param fitted vector of fitted compositions, assumes sum(fitted) = 1
#' @param beta variance inflation coefficient
#' @param n the total number of samples in the available data (any non-negative real number)
#' @return the pdf (or log pdf if log = TRUE)
ddirichmult = function(obs, beta, n, fitted, log = FALSE) {
  if(abs(sum(obs) - 1) > 1e-8) stop("obs needs to sum to 1")
  if(abs(sum(fitted) - 1) > 1e-8) stop("fitted needs to sum to 1")
  val = (lgamma(n + 1) + lgamma(beta)) -
        (lgamma(n + beta) + sum(lgamma(n * obs + 1))) +
        sum(lgamma(n * obs + beta * fitted) - lgamma(beta * fitted))
  if(!log)
    val = exp(val)
  return(val)
}

#' Returns the log-likelihood for the Dirichlet-multinomial,
#' parameterised by theta, as used in the TMB code.
#' @param obs vector of observed counts (not proportions)
ddirichmult_alt = function(obs, theta, fitted) {
  N_eff = sum(obs)
  obs_prop = obs / N_eff
  sum1 = sum2 = 0
  for(i in 1:length(obs)) {
    sum1 = sum1 + lgamma(N_eff * obs_prop[i] + 1)
    sum2 = sum2 + lgamma(N_eff * obs_prop[i] + theta * N_eff * fitted[i]) -
                  lgamma(theta * N_eff * fitted[i])
  }
  ll = lgamma(N_eff + 1) - sum1 + lgamma(theta * N_eff) -
       lgamma(N_eff + theta * N_eff) + sum2
  return(ll)
}

## run an example
set.seed(123)
n = 1000  ## sample size
pi = rnorm(10, 50, 10)
pi_fitted = rnorm(10, 50, 10)
pi = pi / sum(pi)
pi_fitted = pi_fitted / sum(pi_fitted)
x = pi * n
beta = 7
theta = beta / n
## evaluate the two functions; they should agree
ddirichmult_alt(obs = x, theta = beta / sum(x), fitted = pi_fitted)
ddirichmult(obs = x / sum(x), n = sum(x), fitted = pi_fitted, beta = beta, log = TRUE)
```
A key data type in age- and length-structured stock assessment models is composition data. The default error structure for normalised (sum to 1) composition data sets is the multinomial [@francis2014replacing]. However, there are often complex correlations in the residuals when applying the multinomial distribution [@berg2016accounting; @francis2014replacing]. This makes the additive logistic normal and centred log-ratio transformations attractive, because the transformed compositions are assumed to be multivariate normal random variables, which allows complex correlation structures to be explored (see the next section for details on correlation structures).
Normalised composition data can be viewed as lying on a simplex. Consider a specific simplex, the so-called $(D - 1)$ standard simplex, a subset of $\mathbb{R}^D$, which is defined by
\begin{equation}\label{eq:simplex} \tilde{S}^D = \left\{ \boldsymbol{x} = (x_1, x_2, \dots, x_D) \in \mathbb{R}^D \,\middle|\, x_i \geq 0, \ \sum\limits_{i = 1}^D x_i = 1 \right\} \end{equation}
The additive log ratio (ALR) transformation maps a simplex $\tilde{S}^D$ to $\mathbb{R}^{D-1}$, and the result for an observation $\boldsymbol{x} \in \tilde{S}^D$ are coordinates $\boldsymbol{y} \in \mathbb{R}^{D - 1}$ with
\begin{equation}\label{eq:alr_transform} \boldsymbol{y} = (y_1,y_2,\dots,y_{D - 1})' = alr(\boldsymbol{x}) = \left(\ln\frac{x_1}{x_j}, \dots, \ln\frac{x_{j-1}}{x_j}, \ln\frac{x_{j + 1}}{x_j}, \dots, \ln\frac{x_D}{x_j}\right)' \end{equation}
The index $j \in \{1,\dots, D\}$ refers to the variable that is chosen as the ratioing (reference) variable in the coordinates. This choice usually depends on the context, but also on the suitability of the results for data exploration. The main disadvantage of the ALR is the subjective choice of the ratioing variable.
The centred log ratio (CLR) transformation maps a simplex $\tilde{S}^D$ to $\mathbb{R}^{D}$, and the result for an observation $\boldsymbol{x} \in \tilde{S}^D$ are coordinates $\boldsymbol{y} \in \mathbb{R}^{D}$ with
\begin{equation}\label{eq:clr_transform} \boldsymbol{y} = clr(\boldsymbol{x}) = (y_1, \dots,y_D)'= \left(\ln\frac{x_1}{\sqrt[D]{\prod\limits_{k = 1}^Dx_k}} , \dots, \ln\frac{x_D}{\sqrt[D]{\prod\limits_{k = 1}^Dx_k}}\right)' \end{equation}
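A small sketch of both transformations in R (using the last bin as the ALR ratioing variable, an arbitrary choice for illustration):

```r
## additive log-ratio transform, using bin j as the ratioing variable
alr <- function(x, j = length(x)) log(x[-j] / x[j])

## centred log-ratio transform (geometric mean in the denominator)
clr <- function(x) log(x / exp(mean(log(x))))

## example composition (sums to 1)
x <- c(0.1, 0.25, 0.4, 0.25)
alr(x)
clr(x)
sum(clr(x))  # clr coordinates always sum to zero
```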
@francis2014replacing (Appendix) suggests, following @schnute2007compositional, defining a multivariate normal variable $\boldsymbol{X} \sim \mathcal{MVN} \left(\boldsymbol{\mu}, \boldsymbol{\Sigma}\right)$, where $\boldsymbol{\mu} = \log(\boldsymbol{E})$ ($\boldsymbol{E}$ being the expected composition) and $\boldsymbol{\Sigma}$ is the covariance. The relationship to the observed composition $\boldsymbol{O} = (O_1,\dots, O_D)'$ is
\begin{equation}\label{eq:schnute} O_i = \frac{e^{X_i}}{\sum_k e^{X_k}} \end{equation}
Francis suggests using two transformations. The first is the ALR (Equation~\ref{eq:alr_transform}), which defines a new variable $\boldsymbol{Y} = alr(\boldsymbol{X})$, with $\boldsymbol{Y} \sim \mathcal{MVN}\left(alr(\boldsymbol{E}), \boldsymbol{V}\right)$ and
\[ \boldsymbol{V} = \boldsymbol{K}\boldsymbol{\Sigma}\boldsymbol{K}' \]
where $\boldsymbol{K}$ has dimension $(D - 1) \times D$ and is the identity matrix $\boldsymbol{I}_{D-1}$ with an additional column of $-1$'s appended on the right.
The other transformation is the centred log ratio transformation (Equation~\ref{eq:clr_transform}), with variable $\boldsymbol{Z} \sim \mathcal{MVN}\left(clr(\boldsymbol{E}), \boldsymbol{\Gamma}\right)$, where
\[ \boldsymbol{\Gamma} = \boldsymbol{F}'\boldsymbol{H}^{-1}\boldsymbol{V}\boldsymbol{H}^{-1} \boldsymbol{F} \]
where $\boldsymbol{F}$ has the same structure as $\boldsymbol{K}$ but with the additional column set to $+1$ instead of $-1$, and $\boldsymbol{H} = \boldsymbol{I}_{D-1} + \boldsymbol{1}$ (the $(D-1) \times (D-1)$ identity matrix with 1 added to every element).
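A quick numerical sketch of these matrix constructions for a small $D$; the covariance $\boldsymbol{\Sigma}$ below is an arbitrary assumed example, purely for illustration:

```r
D <- 4

## K: identity of dimension D-1 with a column of -1 appended on the right
K <- cbind(diag(D - 1), -1)
## F: same structure but with +1 in the appended column (named F_mat to avoid masking F/FALSE)
F_mat <- cbind(diag(D - 1), 1)
## H: identity of dimension D-1 with 1 added to every element
H <- diag(D - 1) + 1

## an assumed covariance for X (AR(1)-style, purely for illustration)
Sigma <- 0.2 * 0.5^abs(outer(1:D, 1:D, "-"))

## covariance of the ALR-transformed variable
V <- K %*% Sigma %*% t(K)
## covariance of the CLR-transformed variable
Gamma <- t(F_mat) %*% solve(H) %*% V %*% solve(H) %*% F_mat

V
Gamma
round(eigen(Gamma)$values, 10)  # Gamma is singular (one zero eigenvalue)
```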
The benefit of using the ALN is that you can estimate the covariance with respect to the composition; the downside is that the transformed random variable $\boldsymbol{Y}$ has $D - 1$ bins, so relating residuals back to the original bins can be confusing to interpret. In contrast, the CLR-transformed variable $\boldsymbol{Z}$ has $D$ bins, but the covariance $\boldsymbol{\Gamma}$ is singular when constructed using the above formulas. The other option is to construct $\boldsymbol{\Gamma}$ directly from estimated parameters, but then it is hard to interpret the variance and correlation parameters back on the original composition scale.
We are interested in estimating correlation and covariance structures for a variable $X(t)$ which is a function of time $t$. The aim is to estimate the covariance over multiple time horizons, $cov(X(t),X(s))$, denoted $\boldsymbol{\Sigma}_X$. We look at four correlation structures, AR(1), compound symmetry, unstructured, and independence, and how the parameters are parameterised and estimated. The standard deviation is estimated separately from the correlation structure. A good resource for TMB, specifically for glmmTMB, is here. Another good resource regarding different correlation structures can be found here.

\subsection{Independence}

The simplest structure, with a single estimable parameter; observations at different times are uncorrelated,
\begin{equation}
cov(X(t),X(s)) = \sigma^2 \delta_{ts}
\end{equation}
where $\delta_{ts} = 1$ if $t = s$ and 0 otherwise.
\subsection{AR(1)}

This is a first-order autoregressive structure with homogeneous variances,
\begin{equation}
cov(X(t),X(s)) = \sigma^2 \rho^{\lvert t - s\rvert}
\end{equation}
so the correlation structure (shown here for four time points) is
\begin{equation}
\boldsymbol{\Sigma}_{\rho} =
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 \\
  & 1 & \rho & \rho^2 \\
  &   & 1 & \rho \\
  &   &   & 1
\end{pmatrix}
\end{equation}
where $\rho$ is bounded between -1 and 1. To enforce this bound, an unbounded transformed parameter $\phi \in \mathbb{R}$ is estimated, with the relationship to $\rho$ being
\begin{equation}\label{eq:rho_transform} \rho = \frac{\phi}{\sqrt{1 + \phi^2}} \end{equation}
\begin{figure}[H] \centering \includegraphics[scale=0.6]{../Figures/rho_transform.jpeg} \caption{Example of the relationship of transformed and untransformed parameter from Equation~\ref{eq:rho_transform}} \label{fig:rho_transform} \end{figure}
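A small sketch of building the AR(1) covariance from an unbounded parameter using the mapping in Equation~\ref{eq:rho_transform} (the parameter values below are arbitrary assumptions):

```r
## map an unbounded parameter to (-1, 1)
rho_transform <- function(phi) phi / sqrt(1 + phi^2)

## build an AR(1) covariance matrix for n time points
ar1_cov <- function(n, phi, sigma = 1) {
  rho <- rho_transform(phi)
  sigma^2 * rho^abs(outer(1:n, 1:n, "-"))
}

phi <- 1.5          # assumed unbounded parameter
rho_transform(phi)  # implied correlation
round(ar1_cov(n = 4, phi = phi, sigma = 0.5), 3)
```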
\subsection{Compound symmetry}

This covariance structure has heterogeneous variances and constant correlation between elements. The correlation parameter is estimated as an unbounded parameter $\phi \in \mathbb{R}$ and mapped back to the valid range $\left(-\frac{1}{n-1}, 1\right)$ using the inverse logistic function (Equation~\ref{eq:invlogit}), where $n$ is the dimension of the covariance matrix and $\rho$ is generated as
\begin{align}
a &= \frac{1}{n - 1}\\
\rho &= logit^{-1}\left(\phi\right) \left(1 + a\right) - a
\end{align}
\begin{equation}\label{eq:invlogit} logit^{-1}\left(x\right) = \frac{1}{1 + e^{-x}} \end{equation}
so the correlation structure looks like
\begin{equation}
\boldsymbol{\Sigma}_{\rho} =
\begin{pmatrix}
1 & \rho & \rho & \rho \\
  & 1 & \rho & \rho \\
  &   & 1 & \rho \\
  &   &   & 1
\end{pmatrix}
\end{equation}
and with the heterogeneous variances the covariance is
\begin{equation}
\boldsymbol{\Sigma}_{X} =
\begin{pmatrix}
\sigma^2_1 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho & \sigma_1\sigma_4\rho \\
  & \sigma^2_2 & \sigma_2\sigma_3\rho & \sigma_2\sigma_4\rho \\
  &   & \sigma^2_3 & \sigma_3\sigma_4\rho \\
  &   &   & \sigma^2_4
\end{pmatrix}
\end{equation}
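A sketch of the compound-symmetry construction from unbounded parameters (the parameter values and dimension are assumed for illustration):

```r
invlogit <- function(x) 1 / (1 + exp(-x))

## build a compound-symmetry covariance matrix with heterogeneous variances
cs_cov <- function(phi, sigmas) {
  n   <- length(sigmas)
  a   <- 1 / (n - 1)
  rho <- invlogit(phi) * (1 + a) - a   # maps phi to (-1/(n-1), 1)
  R   <- matrix(rho, n, n)
  diag(R) <- 1
  diag(sigmas) %*% R %*% diag(sigmas)
}

phi     <- 0.8                    # assumed unbounded correlation parameter
sigmas  <- c(0.5, 0.7, 1.0, 1.2)  # assumed standard deviations
Sigma_X <- cs_cov(phi, sigmas)
round(Sigma_X, 3)
round(cov2cor(Sigma_X), 3)        # recovers the constant correlation
min(eigen(Sigma_X)$values) > 0    # positive definite
```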
\subsection{Unstructured}

This structure has $n(n - 1)/2$ parameters, which represent the strict lower triangle of the Cholesky factor of the correlation matrix.
\begin{equation}\label{eq:us_corre} \boldsymbol{\Sigma}_{\rho} = D^{-0.5}LL'D^{-0.5} \end{equation}
where
\begin{equation}
L = \begin{pmatrix}
1 & & & \\
\theta_{1,1} & 1 & & \\
\theta_{2,1} & \theta_{2,2} & 1 & \\
\theta_{3,1} & \theta_{3,2} & \theta_{3,3} & 1
\end{pmatrix}
\end{equation}
with \begin{equation}\label{eq:D_dag} D = diag(LL') \end{equation}
In TMB, $\boldsymbol{\theta}$ is parameterised as unbounded parameters. For a single off-diagonal parameter $\theta_i$ of $L$ (e.g., the $2 \times 2$ case), this works out to
\begin{equation}\label{eq:us_rho} \rho = \frac{\theta_i}{\sqrt{1 + \theta_i^2}} \end{equation}
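A sketch of recovering the correlation matrix from unconstrained parameters via the Cholesky parameterisation (the $\theta$ values are arbitrary assumptions):

```r
## build a correlation matrix from unconstrained lower-triangle parameters
unstructured_corr <- function(theta, n) {
  L <- diag(n)
  L[lower.tri(L)] <- theta        # fill the strict lower triangle (column-major order)
  S <- L %*% t(L)
  D_inv_sqrt <- diag(1 / sqrt(diag(S)))
  D_inv_sqrt %*% S %*% D_inv_sqrt
}

n <- 4
theta <- c(0.5, -0.3, 0.8, 0.2, -0.6, 1.1)  # n(n-1)/2 = 6 assumed parameters
round(unstructured_corr(theta, n), 3)

## 2x2 case: the single theta maps to theta / sqrt(1 + theta^2)
round(unstructured_corr(0.5, 2)[2, 1], 6)
0.5 / sqrt(1 + 0.5^2)
```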
The next natural step would be to reduce the number of parameters by pooling correlation parameters within the same off-diagonal (a Toeplitz structure). This amounts to $n - 1$ correlation parameters and $n$ variance parameters.
This follows work by @cadigan2015state, who used a censored likelihood to account for under- or over-reported catch. This is in the context of fitting to catch when estimating fishing mortalities ($F$) from the Baranov catch equation.
```r
#' Follows Cadigan (2016) "A state-space stock assessment model for northern cod,
#' including under-reported catches and variable natural mortality rates", Equation 7.
#' This code shows the log-likelihood surface between the bounds for a range of
#' parameters and catches.

#' Differentiable lower bound: returns x if x > eps, otherwise a smooth
#' approximation plus a penalty.
posfun = function(x, eps) {
  denom = 2
  denom = denom - x / eps
  ans = ifelse(x > eps, x, eps / denom)
  penalty = ifelse(x > eps, 0.0, 0.01 * (x - eps) * (x - eps))
  return(list(new_x = ans, pen = penalty))
}

#' @param catches <vector> catches to profile over
#' @param upper <scalar> upper bound for the censored likelihood
#' @param lower <scalar> lower bound for the censored likelihood
#' @param sigma <scalar> standard deviation for the censored likelihood
#' @param delta <scalar> minimum value passed to posfun
catch_ll = function(catches, upper, lower, sigma, delta) {
  ll = vector()
  for(i in 1:length(catches)) {
    diff = pnorm(log(upper / catches[i]) / sigma) - pnorm(log(lower / catches[i]) / sigma)
    ad_buffer = posfun(diff, delta)
    ll[i] = log(ad_buffer$new_x) + ad_buffer$pen
  }
  return(ll)
}

## run through an example
L_y = 3000
U_y = 4000
sigma = 0.01
delta = 0.025
catches = seq(2000, 5000, by = 1)
ll = catch_ll(catches, upper = U_y, lower = L_y, sigma, delta)
plot(catches, ll, type = "l", lwd = 3, lty = 1, xlab = "Catches", ylab = "Log-likelihood")

#' Issue with the above: "The log-likelihood can be very small even for values of Cy
#' that are slightly outside of the bounds, and this can lead to problems when
#' optimising the likelihood." So add a mixture component with a larger standard
#' deviation (sigma_astrix) and mixing probability (prob).
catch_ll_mixture = function(catches, upper, lower, sigma, prob, sigma_astrix, delta) {
  ll = vector()
  for(i in 1:length(catches)) {
    diff = (prob * pnorm(log(upper / catches[i]) / sigma) +
              (1 - prob) * pnorm(log(upper / catches[i]) / sigma_astrix)) -
           (prob * pnorm(log(lower / catches[i]) / sigma) +
              (1 - prob) * pnorm(log(lower / catches[i]) / sigma_astrix))
    ad_buffer = posfun(diff, delta)
    ll[i] = log(ad_buffer$new_x) + ad_buffer$pen
  }
  return(ll)
}

sigma_astrix = 0.075
prob = 0.9
ll_mix = catch_ll_mixture(catches, upper = U_y, lower = L_y, sigma, prob, sigma_astrix, delta)

par(mfrow = c(2, 1), oma = c(3, 3, 1, 1), mar = c(2, 1, 1, 1), cex.lab = 2, cex.axis = 1.5)
plot(catches, ll, type = "l", lwd = 5, lty = 1, xlab = "", ylab = "",
     ylim = c(-5, 0.5), xlim = c(2250, 4800))
lines(catches, ll_mix, lty = 2, lwd = 5, col = "red")
legend('topright', cex = 1.2,
       legend = c(expression(paste(sigma[c], " = ", 0.01)),
                  expression(paste(sigma[c], "* = ", 0.075)),
                  expression(paste(L[c], " = ", 3000)),
                  expression(paste(U[c], " = ", 4000)),
                  expression(paste(p, " = ", 0.9)),
                  expression(paste(epsilon, " = ", 0.025))))
## zoom in to check whether the top is flat
catch_ndx = catches > (L_y + 200) & catches < (U_y - 300)
polygon(x = c(min(catches[catch_ndx]), min(catches[catch_ndx]),
              max(catches[catch_ndx]), max(catches[catch_ndx])),
        y = c(-0.4, 0.4, 0.4, -0.4), lwd = 3, lty = 2)
plot(catches[catch_ndx], ll[catch_ndx], type = "l", lwd = 5, ylim = c(-0.04, 0.01),
     ylab = "", xlab = "", xlim = c(min(catches[catch_ndx]), max(catches[catch_ndx])))
lines(catches[catch_ndx], ll_mix[catch_ndx], lwd = 5, col = "red", lty = 2)
legend("bottomright", legend = c("Original", "Mixture"), col = c("black", "red"),
       lwd = 5, lty = c(1, 2))
mtext(side = 2, outer = TRUE, adj = 0.5, las = 3, text = "Log-likelihood", line = 1.5, cex = 2)
mtext(side = 1, outer = TRUE, adj = 0.5, las = 1, text = "Catches", line = 1.5, cex = 2)
```