Bayesian inference methods

knitr::opts_chunk$set(
  fig.align = "center",
  fig.width = 6, fig.height = 6,
  cache = FALSE,
  collapse = TRUE,
  comment = "#>",
  highlight = TRUE
)

Bernoulli distribution with probit link function

Model definition

According to the article @Albert1993, a possible model is to assume the existence of an underlying latent variable related to our observed binary variable using the following proposition :

$$ \begin{aligned} &z_{ij} = \alpha_i X_i'\beta_j+ W_i'\lambda_j + \epsilon_{ij},\ &\text{ with } \epsilon_{ij} \sim \mathcal{N}(0,1) \ \forall ij \text{ and such as : } \ &y_{ij}= \begin{cases} 1 & \text{ if } z_{ij} > 0 \ 0 & \text{ otherwise.} \end{cases} \end{aligned} \Rightarrow
\begin{cases} y_{ij}| z_{ij} \sim \mathcal{B}ernoulli(\theta_{ij}) \text{ with } \ \theta_{ij} = \Phi(\alpha_i + X_i'\beta_j+ W_i'\lambda_j) \ \text{where } \Phi \text{ correspond to the repartition function} \ \text{of the reduced centred normal distribution.} \end{cases} $$

$$\begin{aligned} \mathbb{P}(y_{ij}=1) & = \mathbb{P}(z_{ij} > 0)\ & = \mathbb{P}(\alpha_i + X_i'\beta_j + W_i'\lambda_j + \epsilon_{ij} > 0)\ & = \mathbb{P}(\epsilon_{ij} > - (\alpha_i + X_i'\beta_j + W_i'\lambda_j) \ ) \ & = \mathbb{P}(\epsilon_{ij} \leq \alpha_i + X_i'\beta_j + W_i'\lambda_j) \ & = \Phi( \alpha_i + X_i'\beta_j + W_i'\lambda_j) \ \end{aligned}$$

In the same way:

$$\begin{aligned} \mathbb{P}(y_{ij}=0) & = \mathbb{P}(z_{ij} \leq 0)\ & = \mathbb{P}(\epsilon_{ij} \leq - (\alpha_i + X_i'\beta_j + W_i'\lambda_j) \ ) \ & = \mathbb{P}(\epsilon_{ij} > \alpha_i + X_i'\beta_j + W_i'\lambda_j) \ & = 1 - \Phi( \alpha_i + X_i'\beta_j + W_i'\lambda_j) \ \end{aligned}$$

with the following parameters and priors :

Conjugate priors

Fixed species effects

We go back to a model of the form: $Z' = X\beta + \epsilon$ to estimate the posterior distributions of betas, lambdas and latent variables $W_i$ of the model. For example concerning $\lambda_j$, we define $Z'{ij} = Z{ij} - \alpha_i - X_i'\beta_j$ such as $Z'{ij} = W_i'\lambda_j + \epsilon{ij}$ so $Z'_{ij} \ | \ W_i \ , \ \lambda_j \ \sim \mathcal{N}( W_i'\lambda_j, 1)$.

In this case we can use the following proposition:

$$\begin{cases} Y \ | \ \beta &\sim \mathcal{N}_n ( X\beta, I_n) \ \beta &\sim \mathcal{N}_p (m,V) \end{cases} \Rightarrow \begin{cases} \beta|Y &\sim \mathcal{N}_p (m^,V^) \text{ with } \ m^ &= (V^{-1} + X'X)^{-1}(V^{-1}m + X'Y)\ V^&=(V^{-1} + X'X)^{-1} \end{cases}$$.

$$\begin{aligned} p(\beta \ | \ Y) & \propto p(Y \ | \ \beta) \ p(\beta) \ & \propto \frac{1}{(2\pi)^{\frac{n}{2}}}\exp\left(-\frac{1}{2}(Y-X\beta)'(Y-X\beta)\right)\frac{1}{(2\pi)^{\frac{p}{2}}|V|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(\beta-m)'V^{-1}(\beta-m)\right) \ & \propto \exp\left(-\frac{1}{2}\left((\beta-m)'V^{-1}(\beta-m) + (Y-X\beta)'(Y-X\beta)\right)\right) \ & \propto \exp\left(-\frac{1}{2}\left(\beta'V^{-1}\beta + m'V^{-1}m - m'V^{-1}\beta -\beta'V^{-1}m + Y'Y + \beta'X'X\beta - Y'X\beta - \beta'X'Y\right)\right) \ & \propto \exp\left(-\frac{1}{2}\left(\beta'(V^{-1}+X'X)\beta -\beta'(V^{-1}m + X'Y) - (Y'X + m'V^{-1})\beta + m'V^{-1}m + Y'Y \right)\right) \ & \propto \exp\left(-\frac{1}{2}\left(\beta'(V^{-1}+X'X)\beta -\beta'(V^{-1}m + X'Y) - (X'Y + V^{-1}m)'\beta + m'V^{-1}m + Y'Y \right)\right) \ & \propto \exp(-\frac{1}{2}\left(\beta - (V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)\right)'(V^{-1}+X'X)\left(\beta - (V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)\right)\ & \quad -(V^{-1}m + X'Y)'(V^{-1}+X'X)^{-1}(V^{-1}m + X'Y) +m'V^{-1}m + Y'Y)\ & \propto \exp\left(-\frac{1}{2}\left(\beta - \underbrace{(V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)}_{m^}\right)'\underbrace{(V^{-1}+X'X)}_{{V^}^{-1}}\left(\beta - (V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)\right)\right) \end{aligned}$$

Actually, we use that proposition to estimate betas, lambdas and gammas if species traits data are provided.

Random site effects

About the posterior distribution of the random site effects $(\alpha_i){i=1,\dots,nsite}$, we can use a transformation of the form $Z'{ij} = \alpha_i + \epsilon_{ij}$, with $Z'{ij} = Z{ij} - W_i'\lambda_j - X_i'\beta_j$ so $Z'_{ij} \ | \ W_i, \ \lambda_j, \ \beta_j, \ \alpha_i \ \sim \mathcal{N}(\alpha_i,1)$. We then use the following proposition:

$$\begin{cases} x \ | \ \theta & \sim \mathcal{N}(\theta, \ \sigma^2) \ \theta & \sim \mathcal{N}(\mu_0,{\tau_0}^2) \ \sigma^2 & \text{ known} \end{cases} \Rightarrow \begin{cases} \theta | \ x &\sim \mathcal{N}(\mu_1,{\tau_1}^2) \text{ with }\ \mu_1 &= \dfrac{{\tau_0}^2\mu_0 + x\sigma^2}{{\tau_0}^{-2}+\sigma^{-2}} \ {\tau_1}^{-2} &={\tau_0}^{-2}+\sigma^{-2} \end{cases}$$.

$$\begin{aligned} p(\theta \ | \ x) & \propto p(x \ | \ \theta) \ p(\theta) \ & \propto \frac{1}{(2\pi\sigma^2)^{\frac{1}{2}}}\exp\left(-\frac{1}{2\sigma^2}(x-\theta)^2\right)\frac{1}{(2\pi{\tau_0}^2)^{\frac{1}{2}}}\exp\left(-\frac{1}{2{\tau_0}^2}(\theta-\mu_0)^2\right) \ & \propto \exp\left(-\frac{1}{2{\tau_0}^2}(\theta-\mu_0)^2-\frac{1}{2\sigma^2}(x-\theta)^2\right) \ & \propto \exp\left(-\frac{1}{2{\tau_0}^2}(\theta^2-2\mu_0\theta)-\frac{1}{2\sigma^2}(\theta^2-2x\theta)\right)\ & \propto \exp\left(-\frac{1}{2}\left(\theta^2 ({\tau_0}^{-2}+\sigma^{-2})-2\mu_0\theta{\tau_0}^{-2}-2x\theta\sigma^{-2}\right)\right)\ & \propto \exp\left(-\frac{1}{2({\tau_0}^{-2}+\sigma^{-2})^{-1}}\left(\theta^2 -2\theta \frac{\mu_0{\tau_0}^{-2}+ x\sigma^{-2}}{{\tau_0}^{-2}+\sigma^{-2}}\right)\right)\ \end{aligned}$$

Random site effect variance

Concerning posterior distribution of $V_{\alpha}$, the variance of random site effects $(\alpha_i){i=1,\dots,nsite}$, we use the following proposition :
If $$\begin{cases} x \ | \ \sigma^2 & \sim \mathcal{N}_n (\theta, \ \sigma^2I_n) \ \sigma^2 & \sim \mathcal{IG} (a,b) \ \theta & \text{ known} \end{cases} \Rightarrow \begin{cases} \sigma^2|x \sim \mathcal{IG}(a',b') \text{ with } \ a' = a + \frac{n}{2} \ b' = \frac{1}{2}\sum\limits
{i=1}^n(x_i-\theta)^2 + b. \end{cases}$$

$$\begin{aligned} p(\sigma^2 \ | \ x) & \propto p(x \ | \ \sigma^2) \ p(\sigma^2) \ & \propto \frac{1}{(2\pi\sigma^2)^{\frac{n}{2}}}\exp\left(-\frac{1}{2\sigma^2}(x-\theta)'(x-\theta)\right)\frac{b^a}{\Gamma(a)}{(\sigma^2)}^{-(a+1)}\exp\left(-\frac{b}{\sigma^2}\right) \ & \propto {(\sigma^2)}^{-\left(\underbrace{\frac{n}{2}+a}{a'}+1\right)}\exp\left(-\frac{1}{\sigma^2}\underbrace{\left(b+\frac{1}{2}\sum\limits{i=1}^n(x_i-\theta)^2\right)}_{b'}\right) \end{aligned}$$

Gibbs sampler principle

In the Bayesian framework, Gibbs' algorithm produces a realization of the parameter $\theta=(\theta_1,\ldots,\theta_m)$ according to the a posteriori law $\Pi(\theta \ | \ x)$ as soon as we are able to express the conditional laws: $\Pi(\theta_i | \theta_1,\dots,\theta_{i-1},\theta_{i+1},\ldots,\theta_m, x)$ for $i =1,\ldots,m$.

Gibbs sampling consists of:

Successive iterations of this algorithm generate the states of a Markov chain ${\theta^{(t)}, t > 0}$ to values in $\mathbb{R}^{m}$, we show that this chain admits an invariant measure which is the a posteriori law.

For a sufficiently large number of iterations, the vector $\theta$ obtained can thus be considered as a realization of the joint a posteriori law $\Pi(\theta \ | \ x)$.

Consequently, the implementation of a Gibbs sampler requires the knowledge of the a posteriori distributions of each of the parameters conditionally to the other parameters of the model, which can be deduced from the conjugated priors formulas in the case of the probit model but are not explicitly expressible in the case where a logit or log link function is used.

Gibbs sampler using conjuate priors

The algorithm used in jSDM_binomial_probit() function to estimate the parameters of the probit model is therefore as follows:

Initialize all parameters to $0$ for example, except the diagonal values of $\Lambda$ initialized at $1$ and $V_{\alpha}^{(0)}=1$.

Binomial distribution with logit link function

Model definition

In the same way as for the probit model, the logit model can be defined by means of a latent variable: $Z_{ij}= \alpha_i + X_i\beta_j + W_i\lambda_j + \epsilon_{ij}$ for $i=1,\ldots,I$ et $j=1,\ldots,J$, with $\epsilon_{ij} \sim \mathrm{logistique}(0,1)$ iid and such as: $$y_{ij}= \begin{cases} 1 & \text{ if } Z_{ij} > 0 \ 0 & \text{ else } \end{cases}$$ However in this case the a priori distributions of the latent variable and the parameters are not conjugated, we are not able to use the properties of the conjugated priors, so modelling using a latent variable is irrelevant.
In this case it is assumed that $$y_{ij} \ | \theta_{ij} \sim \mathcal{B}inomial(n_i,\theta_{ij})$$, with $\mathrm{probit(\theta_{ij})} = \alpha_i + X_i\beta_j+ W_i\lambda_j$ and $n_i$ the number of visits to the site $i$.
Therefore, the parameters of this model will be sampled by estimating their conditional a posteriori distributions using an adaptive Metropolis algorithm.

Priors used

An a priori distribution is determined for each parameter of the model :
$$\begin{array}{lll} V_{\alpha} & \sim & \mathcal {IG}(\text{shape}=0.5, \text{rate}=0.005) \text{ with } \mathrm{rate}=\frac{1}{\mathrm{scale}}, \ \beta_{jk} & \sim & \begin{cases} \mathcal{N}(\mu_{\beta_{jk}},V_{\beta_{k}}) \text{ for } j=1,\ldots,J \text{ and } k=0,\ldots,p, & \text{if species traits data are provided} \ \text{ where } \mu_{\beta_{jk}} = \sum_{r=0}^{nt} t_{jr}.\gamma_{rk} \text{ and } \gamma_{rk} \sim \mathcal{N}(\mu_{\gamma_{rk}},V_{\gamma_{rk}}) & \ \text{ for } r=0,\ldots,nt \text{ and } k=0,\ldots,p. & \ \mathcal{N}(\mu_{\beta_{k}},V_{\beta_{k}}) \text{ for } j=1,\ldots,J \text{ and } k=0,\ldots,p, & \text{if species traits data are not provided} \ \end{cases} \ \lambda_{jl} & \sim & \begin{cases} \mathcal{N}(\mu_{\lambda_{l}},V_{\lambda_{l}}) & \text{if } l < j \ \mathcal{N}(\mu_{\lambda_{l}},V_{\lambda_{l}}) \text{ left truncated by } 0 & \text{if } l=j \ P \text{ such as } \mathbb{P}(\lambda_{jl} = 0)=1 & \text{if } l>j \end{cases} \ \quad & \quad & \text{ for } j=1,\ldots,J \text{ and } l=1,\ldots,q. \end{array}$$

Adaptive Metropolis algorithm principle

This algorithm belongs to the MCMC methods and allows to obtain a realization of the parameter $\theta=(\theta_1,\ldots,,\theta_m)$ according to their conditional a posteriori distributions $\Pi(\theta_i | \theta_1,\dots,\theta_{i-1},\theta_{i+1},\ldots,\theta_m, x)$, for $i =1,\ldots,m$ known to within a multiplicative constant.
It is called adaptive because the variance of the conditional instrumental density used is adapted according to the number of acceptances in the last iterations.

Gibbs sampler using adaptative Metropolis algorithm

An adaptive Metropolis algorithm is used to sample the model parameters according to their conditional a posteriori distributions estimated to within one multiplicative constant.

First we define the $f$ function that calculates the likelihood of the model as a function of the estimated parameters:
$$ f : \lambda_j,\beta_j,\alpha_i, W_i, X_i, y_{ij},n_i \rightarrow f(\lambda_j,\beta_j,\alpha_i, W_i, X_i, y_{ij},n_i)=\mathrm{L}(\theta_{ij})$$ - Compute $\mathrm{logit}(\theta_{ij})= \alpha_i + X_i\beta_j + W_i\lambda_j$.

We repeat those steps for $i=1,\ldots,I$ et $j=1,\ldots,J$, and then we define $\theta = \left(\theta{ij}\right){i=1,\ldots I}^{j= 1,\ldots,J}$.
This allows us to calculate the likelihood of the model: $\mathrm{L}(\theta)= \prod\limits
{\substack{1\leq i\leq I \ 1 \leq j\leq I}}\mathrm{L}(\theta_{ij})$.

According to Bayes' formula we have $$\mathrm{p}(\theta \ | \ Y) \propto \Pi(\theta) \mathrm{L}(\theta).$$ We thus use the following relations to approach the conditional a posteriori densities of each of the parameters with $\Pi(.)$ the densities corresponding to their a priori laws. $$\begin{aligned} & p(\beta_{jk} \ | \ \beta_{j0},\beta_{j1},\ldots,\beta_{jk-1},\beta_{jk+1},\ldots,\beta_{jp}, \lambda_j,\alpha_1,\ldots,\alpha_I, W_1,\ldots,W_I,Y) \propto \Pi(\beta_{jk})\prod\limits_{1\leq i\leq I} \mathrm{L}(\theta_{ij})\ &p(\lambda_{jl} \ | \ \lambda_{j1},\ldots,\lambda_{jl-1},\lambda_{jl+1},\ldots,\lambda_{jq}, \beta_j,\alpha_1,\ldots,\alpha_I, W_1,\ldots,W_I,Y) \propto \Pi(\lambda_{jl}) \prod\limits_{1\leq i \leq I}\mathrm{L}(\theta_{ij})\ &p(W_{il} \ | \ W_{i1},\ldots,W_{il-1},W_{il+1},\ldots,W_{iq},\alpha_i,\beta_1,\ldots,\beta_J,\lambda_1,\ldots, \lambda_J,Y) \propto \Pi(W_{il}) \prod\limits_{1\leq j\leq J}\mathrm{L}(\theta_{ij})\ &p(\alpha_i \ | \ W_i,\beta_1,\ldots,\beta_J,\lambda_1,\ldots, \lambda_j,V_{\alpha},Y) \propto \Pi(\alpha_i \ | \ V_{\alpha}) \prod\limits_{1\leq j\leq J}\mathrm{L}(\theta_{ij})\ & \text{, for $i=1,\ldots,I$, $j=1,\ldots,J$, $k=1,\ldots,p$ and $l=1,\ldots,q$. } \end{aligned}$$

The algorithm implemented in jSDM_binomial_logit() on the basis of @Rosenthal2009 and @Roberts2001 articles, to estimate the parameters of the logit model is the following :

$$\gamma = min\left(1,\ \dfrac{\Pi\left(W_{il}^\star\right)\prod\limits_{1\leq j\leq J}f\left(W_{il}^\star, \alpha_i^{(t)},\beta_j^{(t-1)}, \lambda_j^{(t-1)},X_i,y_{ij},n_i\right)} {\Pi\left(W_{il}^{(t-1)}\right)\prod\limits_{1\leq j\leq J}f\left(W_{il}^{(t-1)}, \alpha_i^{(t)},\beta_j^{(t-1)}, \lambda_j^{(t-1)}, X_i,y_{ij},n_i\right)}\right).$$ * If species traits data are provided, generate the effects of species-specific traits on species' responses $\gamma^{(t)}=\left(\gamma_{rk}^{(t)}\right)^{r=0,\ldots,nt}{k=0,\ldots,p}$ such as : $$\gamma{rk}^{(t)} \ | \beta_{1k}^{(t-1)}, \ldots, \beta_{Jk}^{(t-1)} \sim \mathcal{N}(m^\star,V^\star) \text{, with }$$ $$m^\star = (V_{\gamma_{rk}}^{-1} + T_r'T_r)^{-1}(V_{\gamma_{rk}}^{-1}\mu_{\gamma_{rk}} + T_r\left(\beta_k^{(t-1)} - \sum\limits_{r' \neq r} T_{r'} \gamma_{r'k}^{(t-1)} \right) \text{ and } V^\star = \left(V_{\gamma_{rk}}^{-1}+ T_r'T_r\right)^{-1}.$$

$$\gamma = min\left(1,\dfrac{\Pi\left(\beta_{jk}^\star\right)\prod\limits_{1\leq i\leq I}f\left(\beta_{j0}^{(t)},\small{\ldots},\beta_{jk-1}^{(t)},\beta_{jk}^\star,\beta_{jk+1}^{(t-1)},\small{\ldots}, \beta_{jp}^{(t-1)},\lambda_j^{(t-1)}, \alpha_1^{(t)},W_1^{(t)},\small{\ldots},\alpha_I^{(t)}, W_I^{(t)},X_i,y_{ij},n_i\right)} {\Pi\left(\beta_{jk}^{(t-1)}\right)\prod\limits_{1\leq i\leq I}f\left(\beta_{j0}^{(t)},\small{\ldots},\beta_{jk-1}^{(t)},\beta_{jk}^{(t-1)},\beta_{jk+1}^{(t-1)},\small{\ldots}, \beta_{jp}^{(t-1)},\lambda_j^{(t-1)}, \alpha_1^{(t)},W_1^{(t)}, \small{\ldots},\alpha_I^{(t)}, W_I^{(t)},X_i,y_{ij},n_i\right)}\right).$$

Poisson distribution with log link function

Model definition

According to the article @Hui2016, we can use the Poisson distribution for the analysis of multivariate abundance data, with estimation performed using Bayesian Markov chain Monte Carlo methods.

In this case, it is assumed that $$y_{ij} \sim \mathcal{P}oisson(\theta_{ij})$$, with $\mathrm{log}(\theta_{ij}) = \alpha_i + X_i\beta_j+ W_i\lambda_j$.

We therefore consider abundance data with a response variable noted : $Y=(y_{ij})^{i=1,\ldots,nsite}_{j=1,\ldots,nsp}$ such as :

$$y_{ij}=\begin{cases} 0 & \text{if species $j$ has been observed as absent at site $i$}\ n & \text{if $n$ individuals of the species $j$ have been observed at the site $i$}. \end{cases}$$

Gibbs sampler using adaptative Metropolis algorithm

In this case, we cannot use the properties of the conjugate priors, therefore, the parameters of this model will be sampled by estimating their conditional a posteriori distributions using an adaptive Metropolis algorithm in the Gibbs sampler, in the same way as for the logit model.

We use the same algorithm as before by replacing the logit link function by a log link function and the binomial distribution by a poisson's law to calculate the likelihood of the model in the function jSDM_poisson_log().

References



Try the jSDM package in your browser

Any scripts or data that you put into this service are public.

jSDM documentation built on July 26, 2023, 5:21 p.m.