knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(escrocR)

Introduction

This packages aims at fitting the model ESCROC [@ballutaud_estimating_nodate]. ESCROC is a model that, similarly to clasical mixing models such as MIXSIAR [@stock_mixsiar_2013] estimates diet composition but works at the whole trophic network and simultaneously estimates contaminants biomagnification and isotopic enrichment in trophic network. ESCROC is a Bayesian model and as such allow for a rigorous consideration of sources of uncertainties such as measurement erros, missing data, interspecific variability, data censoring etc. To carry out the estimatation, ESCROC using measurements of concentration of contaminants and/or isotopic ratios made on a set of individuals from different species. Here, concentration of contaminants and isotopic ratios are considered to be tracers that can be used to inform on the diet of species: the signature of an individual (defined as the measured values for all tracers) is assumed to be a mixture of the signatures of its prey.

Prerequisites

The model is fitted using the JAGS sampler [@plummer_jags_2003]. Consequently, JAGS should be installed before fitting the model. Moreover, the package uses different R packages to facilitate R JAGS and interractions and to explore results. Therefore, following packages should be installed: runjags [@denwood_runjags_nodate], ggplot2 [@wickham_ggplot2_2016], coda [@plummer_coda_2010], reshape2 [@wickham_reshaping_2007] and corrplot [@wei_r_2017]. This can be done by running the following code in an R console.

install.packages(c("reshape2","coda","runjags","ggplot2","corrplot"))

Data preparation

Different types of data are needed to fit ESCROC:
- signature_data: a data frame that containes the signatures of all sampled individuals
- prior_diet_matrix: a matrix that species which species can eat which species
- LOQ: this matrix is used to specify the limit of detections or limit of quantifications to address censored data

Moreover, prior information can optionnally be used:
- prior_signature_data: a table to specify prior information on the signature of the species
- prior_delta: a table to specify prior information on isotopic enrichment or contaminant biomagnification.

The format of these different tables are described below.

Mandatory data

prior_diet_matrix {#prior_data_matrix}

prior_diet_matrix should be specified as a squared matrix with one row and one column per species in the studied trophic network. The names of the rows and the columns should correspond to the names of the species; please note that the names of the species should be written exactly in the same way in row names, column names, and in the signature_data and prior_signature_data tables.
In the matrix, each row corresponds to a predator. If the value is set to zero, this means that the predator (in row) can't eat the prey (in column). If the value is strictly superior to zero, the prey can potentially eat the predator. These positive values will be latter used to build priors on the diet compositions (in the absence of prior information, a good solution is to put a one when a predation is possible and a zero otherwise). You should also be aware that ESCROC can not handle tophic networks including loops (for example a prey should not eat its predator, or a predator of its predator...). The presence of the loop will be detected when preparing the data and an error will be thrown.

An example of matrix is provided in the package, coming from [@ballutaud_estimating_nodate]

data(prior_diet_matrix)
print(prior_diet_matrix)

In this example, Anchovy is thought to predate Mysids, Gammarids and Copepods; since they eat multiple species, we call this type of species "multiple consumer". Copepods do not eat any species, we call this type of species "source species". Mysids only eat Copepods, consequently their diet is known (no inference is required) and we call this type of species "single consumer". Source species, multiple consumers and single consumers will be automatically detected by the package when preparing the data.

signature_data {#signature_data}

This data.frame contains signatures measurements for all individuals. Each row corresponds to an indivual. The first column should be named "Species" and identify the species of all individuals. Species should match species from the prior_diet_matrix. The other columns correspond to the different tracers (as many as possible) and their names will be used to identify tracers in prior_delta and LOQ. Missing data or censored data should be specified as NA. Tracers should be transformed if required, to ensure that tracer value increase linearly with trophic level in average. The slope of this linear regression (isotopic enrichment for isotopic ratio, contaminant biomagnification for contaminants) is called delta in the rest of the document, consistently with Ballutaud et al. [-@ballutaud_estimating_nodate].

An example from Munoz et al. [-@munoz_evidence_2017] is provided in the package:

data(signature_data)
summary(signature_data)

In this example, 138 individuals were sampled from 16 species. You can check that the names of the species match the row and column names from prior_diet_matrix. Concentrations in 5 contaminants and isotopic ratio in carbon and nitrogen were measured on each indivudal. Some measurements were left censored, i.e. below the limit of quantification: 51 for PFOA, 10 for PFNA and 11 for NA. Concentrations in contaminants were log10 transformed since log concentration of these contaminants is assumed to increase linearly with trophic level.

LOQ {#LOQ}

This data frame should have the same numbers of row as signature_data (one per individual), and one column per contaminant. Names of the columns should match the names of the tracers in signature_data. The values correspond for the limit of the quantification / detection (transformed if required, see previous section) for each measurement. If a value is not missing in signature_data, then the corresponding value in LOQ should be smaller (this means that the measurement is above the limit of detection), an error will be thrown otherwise when preparing the data. For NA in signature_data, if a value is provided in LOQ then the model considers that the exact value is not known but is below the LOQ (a so called left censored data). If the value from signature_data is indeed a missing data, put a very large value as a LOQ to tell the model that you have no idea at all on the real value.

An example from Munoz et al. [-@munoz_evidence_2017] is provided in the package:

data(LOQ)
head(LOQ)

You can see that column names match column names from signature_data, and that contaminants LOQ were log10 transformed. Carbon and Nitrogen LOQ were not known but set to very small values since all measurements correponds to exact values.

Optionnal data

prior_delta {#prior_delta}

If information on the isotopic enrichement or on the contaminant biomagnification pre-exits (i.e. based on independant data sources), they can be used as priors in the analysis. It should be specified as a data.frame with the first column, named tracer, which indicated on which tracer we want to provide information (the values should be consistent with column names from signature_data), a column mean and a column standard deviation that will parametrize the normal prior use for the correponding delta.

prior_delta <- data.frame(tracer=c("X15N","X13C"),mean=c(3,0),sd=c(1,1))
prior_delta

In this example, based on existing literature, we decided to set two normal priors for nitrogen and carbon isotopic ratios with mean respecively 3 and 0, and standard deviations 1 for both. Since no priors are provided for contaminants, the model will automatically used uninformative priors.

prior_signature_data {#prior_signature_data}

For source species (i.e. species that do not eat any other species, such as Copepods, Gammarids, Nereis and Crabs in the current example), we can also provide prior information on their mean signatures (i.e. information coming from independent data: litterature, former experiments etc.). This is done by providing a data.frame, with a first column names "Species" (species names should once again match species names from diet_matrix), a second column names "tracer" (which should match column names from signature_data), then a column "mean" (mean value in pre-existing samples), "sd" (standard deviation in prexisting sample), and a last column "n" (number of individual in the pre-existing samples). The form of the corresponding prior is detailed in Ballutaud et al. [-@ballutaud_estimating_nodate]

Here we provide an example coming from David [-@david_dynamique_2006]

data(prior_signature_data)
prior_signature_data

Here we defined average signatures in carbon and nitrogen isotopic ratios for Copepods and Gammarids, based on 41 indviduals for Copepods (here we had the same values for nitrogen and carbon, but this is not compulsory) and 7 for Gammarids. For other tracers and source species, a non informative priors will be used.

Running the model and exploring results

Running the model

Once all data are available and formatted as detailed, the first step is to convert them into a format that can be understood by JAGS and that facilitate model fitting. This is done by using the function prepare_data, that returns a list containing data ready to be used by JAGS. The function also make some check (species and tracers names are consistent among column, absence of loops in the prior_diet_matrix) and returns informative error message if an error is detected.

Here is the code to prepare the data:

mydata <- prepare_data(prior_diet_matrix,signature_data,LOQ,prior_signature_data,prior_delta)

Note that the two last arguments are optionnal and should be provided only if prior information exists. mydata is a list that contains all the information on the observation and on the structure of the trophic network and which can then be used to build the model by itself, using the function building_model:

mymodel <- building_model(mydata)

The function returns a string that describes the model in JAGS language. The model is adapted to handle prior data if available. Once data is ready and the model built, the model can be fitted using the function fit_escroc which is the core function of the package.

myresults <- fit_escroc(mydata, mymodel, burnin=1000, adapt=500, sample=1000)

This function fits the model. Here we run the Markov Chain for 500 iteration as an adaptive phase, then 1000 iterations as burnin to ensure the chains converged. We put a too small number of iterations to have a fast running example (that's why a warning is thrown about the adaptive phase), and the number of iterations should be increased in a real case study to achieve a correct adaptation phase (adapt argument) and ensure chains convergence (burnin argument) before sampling.
The function returns an mcmc.list with varnames formatted to be self-explanatory. The result of a gelman test is also displayed, and values should be smaller than 1.05 if the chains have indeed converged. You can see that the chains have not converged because of our very small burnin numbers. You can also see that we got a NaN for the diet of Mysids: this is because Mysids is a single consumer and consequently, their diet is not estimated (but directly known).
To provide a more appropriate results, we included in the package a sample of results that was obtained with the following line of code:

myresults <- fit_escroc(mydata, mymodel,burnin=1e6,adapt=10000,sample=50000,thin=150)

The large thinning rate was chosen to limit the size of the package.

It can be retrieved that way:

data(myresults)

library(coda)
#running a gelman diagnostic
gelman.diag(myresults,multivariate=FALSE)

You can see that most gelman tests were well below 1.05 so that the chains have indeed converged.

Since the returned object is a mcmc.list, usual function from package coda can be used to explore the results, however we provide some functions to draw usual diagrams and extracts standard statistics.

Data exploration

Exploring diet

Diet species by species

A summary of the posterior distribution of a species, or a set of species is provided by the functions plot_diet_species and export_diet_species. For example, here we plot a diagram displaying the posterior diet composition of the Sole

print(plot_diet_species(mydata, myresults, "Sole")[[1]])

Here you see that Sole diet is dominated by Gammarids, and to a lesser extent by Brown shrimp. Results can also be displayed as a table of the quantiles of the posterior distribution:

#we extract the posterior samples of the Sole diet
sole_posterior <- export_diet_species(myresults,mydata,"Sole")[[1]]
head(sole_posterior)

#then we compute quantiles 2.5%, 50% and 97.5%
apply(sole_posterior,2,quantile,probs=c(0.025,.5,.975))

Diet of all species

A plot is proposed to the summarize the whole posterior diet matrix:

plot_full_diet_matrix(myresults,mydata)

In this matrix, predators are displayed in rows and preys is column. The color of the symbol stands for the median of posterior distribution of the contribution of a prey in the diet of a predator. For example, the blue color of the circle indicate that the median contribution of Mysids in Goby diet is about 0.6. The radius of the large circle stands for the 97.5%: for example the largest circle in the Anchovy x Copepods fills the whole cell, this means that the 97.5% quantile of the contribution of Copepods in Anchovy diet is close to 1. The radius ot the white circle inside the the coloured circle indicate the 2.5% quantile. Looking at Sprat x Copepods: the white circle is small so the quantile 2.5% is close to 30%, the colour is an rather dark blue so the quantile is about 0.7 and the largest circle is very big so the quantile 97.5%: this indicates that the posterior distribution is quite very large.

Plotting delta

It is also worthwile to get information on delta, i.e. biomagnification and isotopic enrichment. Summary of the posterior distribution can be obtained by doing:

library(coda)
summary(myresults[,grep("delta",varnames(myresults)),drop=FALSE])

We can also plot the posterior distribution using the function plot_delta_tracers. Remember that some tracers are sometimes transformed to comply with a linear regression between tracer and trophic level. It is sometimes interesting to back transform the delta in the original scale of contaminants. This is possible with the plot_delta_tracers. For example, for our contaminants, we wanted to backtransform them to have an estimate of the TMF index (Trophic Magnification Factor). So we needed to plot 10^delta instead of delta

back_tranfo <- function(x) 10^x
print(plot_delta_tracers(myresults,mydata,"FOSA",list(back_tranfo)))

references



Irstea/escroc documentation built on June 18, 2022, 11:22 p.m.