Introduction - How to use EcoDiet"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

EcoDiet is a new tool for assimilating data in food-web studies. The goal of the package is to quantify trophic interactions between food-web components by combining stomach content analyses and biotracers in a Bayesian hierarchical model. To do so, the model simultaneously estimates a probabilistic topology matrix of the trophic network and the diet matrix. The topology matrix is a probabilistic description of "who eats whom" in the food web nad is composed of the probabilities that trophic links exist between food-web components, i.e. the probabilities $\eta$ that any prey is eaten by a given predator. The diet matrix is estimated conditionally to this topology and contains all diet proportions $\Pi$, hence it expresses the contribution (in %) of any prey to a given predator's diet.

The full EcoDiet model and its application to real and simulated datasets are described in Hernvann et al. 2022, Ecol Appl. Use citation("EcoDiet") to get the full reference.

Several options are available within the present package:

The example showcased here is an artificial dataset, created to be simple to visualize and understand. In this example as in Hernvann et al. 2022, Ecol Appl the biotracers used are stable isotopes but EcoDiet could be used to treat other analyses (e.g., fatty acids, specific-compounds stable isotopes).

Before running the following code, please ensure that EcoDiet and the required tools are correctly installed and set up on your computer. Instructions can be found in the README's instructions. If you're all set, let's load the EcoDiet package:

library(EcoDiet)

1. Load and check your data

To apply EcoDiet to your own data, your stomach content, biotracer and literature data should be in a specific format, similar to those of following example. This data can for instance be imported into R using .csv files:

example_stomach_data <- read.csv("./data/my_stomach_data.csv")
example_biotracer_data <- read.csv("./data/my_biotracer_data.csv")

Here we will demonstrate how EcoDiet works on a simulated dataset stored within the EcoDiet package.

Stomach content data

The stomach content table gathers the sum of occurrences of each prey trophic group in the analyzed stomachs of each consumer trophic group. The first column of the table contains the names of the prey trophic groups and the headers of the following columns contain the names of all the predator trophic groups. The last row of the table should be named "full", and indicates how many (non-empty) stomachs have been analyzed for each trophic group.

example_stomach_data <- read.csv(system.file("extdata", "example_stomach_data.csv",
                                    package = "EcoDiet"))
knitr::kable(example_stomach_data)

In this example, for the "huge" animals, 19 stomachs were analyzed and contained remainings. Among these stomachs, 17 contained "large" animal remainings and 12 contained "medium" animal remainings.

If some trophic groups of the studied food web don't have associated stomach content analyses, you should add a line in this table indicating the name of this group and 0 in the other columns (it is the case here for "small" animals that are at the base of the trophic network).

Biotracer data

The Biotracer table contains the analyses of different tracer concentrations. Each line of the table represents one individual on which were conducted biotracer analyses for various elements (here, stable isotope analyses for carbon and nitrogen). The first column of the table should be called "group" and contain name of the trophic group the individual belongs. The rest of the table contains the measures, one column corresponding to one biotracer.

example_biotracer_data <- read.csv(system.file("extdata", "example_biotracer_data.csv",
                                    package = "EcoDiet"))
knitr::kable(example_biotracer_data)

All the trophic groups of the studied food web should be present in this biotracer table. Thus, if biotracer analyses are missing for one given trophic group you still have to enter one line indicating the name of the trophic group and NA in the following columnds (as it is the case for the "huge" group).

The biotracers like stable isotope analyses are integrated in the model based on the trophic enrichment concept. Thus, for each biotracer informed, you also need to define the corresponding trophic discrimination factors corresponding to your the biotracers. In this example, we use common trophic discrimination factors for carbon and nitrogen:

trophic_discrimination_factor = c(0.8, 3.4)

2. Preprocess your data without literature data (Default option)

If you have data extracted from the literature, skip this and go to section 3.

Read this section if you don't want to integrate information from the literature / expert knowledge. This is specified using the literature_configuration argument.

literature_configuration <- FALSE

Preprocess the data

The preprocess_data function checks and rearranges the input data into a specific format that is read when running the EcoDiet model:

data <- preprocess_data(stomach_data = example_stomach_data,
                        biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration
                        )

Error messages will appear if the input data hasn't been provided in the expected shape. Please read carefully the error message and rearrange your data in the correct format.

Stomach content dataset dominated by numerous small values of occurrence might be problematical as to few groups could be identified as potential prey. To avoid such situations, you can choose to upscale the stomach content data so that occurrences are artificially increased but their relative importance are conserved. This is performed by specifying rescale_stomach = TRUE when preprocessing the data.

data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = example_stomach_data,
                        rescale_stomach = TRUE)

Check the trophic links

By default, the links that will be investigated by the model will be those derived from at least one observation in the stomach content analyses. This is a conservative assumption to avoid investigating all the possible trophic links in the food web, since it would make the model too complex and require too long runs, while also reduce the discrimination power of the model (see Hernvann et al. 2022). The trophic links that will be investigated can be summarized by a fixed topology computed by the preprocess_data based on stomach contents (i.e., data$o):

topology <- 1 * (data$o > 0)
print(topology)

If you want to add other important link that may not be observed in scat contents (or, on the contrary, remove false positive trophic links identified in the stomach contents), you can modify directly this binary topology matrix. This operation can be useful, for instance, if your stomach sampling size is too low and you missed a prey you are definitely sure that the predator feeds on, or if you know that the identification of a prey will be reduced due to very small size or high digestibility. To specify that the "huge" animal can also eat "small" animals, you can do:

topology["small", "huge"] <- 1
print(topology)

To run EcoDiet on this updated binary topology, you will have to append the previous one by re-running the preprocess_data as follows:

data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        topology = topology,
                        stomach_data = example_stomach_data)

3. Preprocess your data with literature data (Literature option)

If you don't have data extracted from the literature, skip this and go to section 4.

Read this section if you do want to integrate information from the literature / expert knowledge. This is specified using the literature_configuration argument.

literature_configuration <- TRUE

Define the priors

A literature diet table is used to set priors on the trophic link probabilities $\eta$ and the diet proportions $\Pi$. This table is similar to the stomach contents table, as all trophic groups must be included in the columns and rows. The numbers are the average diet proportions found in the literature. Here, the selected studies have identified that "huge" animals eat equally "large" and "medium" animals (thus the 0.5 and 0.5 numbers in the first column). The proportions for a given predator (i.e., within a given column) must sum to 1. The "small" animals are at the base of the ecosystem, so the column is filled with zeros.

The last row of the table corresponds to the literature pedigree score. This score (a number from 0 to 1) quantifies the literature reliability on each predator's diet. Here the dietary proportions from the literature are used to produce reliable estimates for the "huge" animals, e.g., the pedigree score associated is high (0.9). On the contrary, the diet proportions for the "medium" animals come from an older article focusing on a very different ecosystem so estimates produced are less reliable, e.g, the pedigree score is low (0.2). The pedigree score for the "small" animals is set at 1, because this group eats nothing. For more details please read the reference article.

example_literature_diets_path <- system.file("extdata", "example_literature_diets.csv",
                                    package = "EcoDiet")
example_literature_diets <- read.csv(example_literature_diets_path)
knitr::kable(example_literature_diets)

This summary of the literature data will be used to formulate:

  1. The priors on the topology matrix's $\eta$s. If a given literature diet proportion is zero, the corresponding prior Beta distribution of $\eta$ will be shifted toward 0. If the proportion is positive, the distribution will be shifted toward 1.

  2. The priors on the diet matrix's $\Pi$s. The literature diet proportions are entered as the hyperparameters of the prior Dirichlet distribution of $\Pi$.

The Pedigree scores are used to determine the priors' precision. Other parameters can be used to adjust the prior distributions:

nb_literature = 10
literature_slope = 0.5

Preprocess the data

The preprocess_data function then checks and rearranges the data in a specific format so that the EcoDiet model can be run:

data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = example_stomach_data,
                        literature_diets = example_literature_diets,
                        nb_literature = 10,
                        literature_slope = 0.5)

If any error appears, it means your data is not in the correct format. Please read the error message and try to rearrange the data in the correct format.

If you have a lot of small values in the stomach occurences, you can choose to upscale the stomach content data:

data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = example_stomach_data,
                        rescale_stomach = TRUE,
                        literature_diets = example_literature_diets,
                        nb_literature = 10,
                        literature_slope = 0.5)

Check the trophic links to investigate

These are the links that will be investigated by the model. It is not wise to assume that all the trophic links are possible. You therefore need to keep only the reasonnable trophic links (e.g., a shrimp cannot eat a whale). The matrix displayed by the preprocess_data function is based by default on the stomach content data (data$o) and on the literature diet matrix (data$alpha_lit):

topology <- 1 * ((data$o > 0) | (data$alpha_lit > 0))
print(topology)

If you want to add another trophic link, you can modify directly the binary topology matrix. It can be useful if you are sure that a prey is consumed by a given predator. However the trophic link is not observed in the stomach content data, and the study extracted from the literature did not identify the prey. To specify that the "huge" animal can also eat "small" animals, you can do:

topology["small", "huge"] <- 1
print(topology)

The new topology matrix can now be entered as an argument of the preprocess_data function:

data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        topology = topology,
                        stomach_data = example_stomach_data,
                        literature_diets = example_literature_diets,
                        nb_literature = 10,
                        literature_slope = 0.5)

4. Plot the data and the priors

You can visualize your data with the plot_data function:

plot_data(biotracer_data = example_biotracer_data,
          stomach_data = example_stomach_data)

You can save the figures as PNG in the current folder using:

plot_data(biotracer_data = example_biotracer_data,
          stomach_data = example_stomach_data,
          save = TRUE, save_path = ".")

Whether the priors are non-informative or informed by the literature, you can plot the mean of the prior distributions for the trophic link probabilities $\eta$ and the diet proportions $\Pi$:

plot_prior(data, literature_configuration)

You can also see the prior distributions for one trophic group (or predator):

plot_prior(data, literature_configuration, pred = "huge")

This way, you can change the prior parameters and see how it affects the prior distributions. Here, we will change the nb_literature parameter from 10 to 2:

data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        topology = topology,
                        stomach_data = example_stomach_data,
                        literature_diets = example_literature_diets,
                        nb_literature = 2,
                        literature_slope = 0.5)

plot_prior(data, literature_configuration, pred = "huge", variable = "eta")

5. Run the model

The write_model function writes the model in the BUGS syntax. Here You need to specify whether you want to use or not priors form the literature since the model structure will be affected by this choice. The model is written as a .txt for which you should specify a path and name through the file.name argument.

To see the written model, you can turn on visualization using print.model=TRUE or opening the saved .txt file

filename <- "mymodel.txt"
write_model(file.name = filename, literature_configuration = literature_configuration, print.model = F)

First, run the model as a test (i.e., low adaption phase and low number of iterations) to check that the model is compiling properly. Specifying run_param="test" will by default run the model with nb_iter=1000 and nb_burnin=500.

mcmc_output <- run_model(filename, data, run_param = "test")

The low numbers won't be enough to achieve a satisfactory model convergence. You should progressively increase the number of adaptation steps nb_adapt and of iterations nb_iter while checking the "Convergence warnings" message. You can specify these model run parameters by specifying run_param=list(nb_iter=iter, nb_burnin=burnin, nb_thin=thin). There are also a couple of default run parameters ("very short","short","normal","long","very long") that you can use in the with run_param and learn about by calling ?run_param

Be aware that, depending on your data and especially the number of trophic groups and investigate trophic interactions, the model can take hours or days to run.

mcmc_output <- run_model(filename, data, run_param=list(nb_iter=10000, nb_burnin=5000, nb_thin=5))
mcmc_output <- run_model(filename, data, run_param=list(nb_iter=50000, nb_burnin=25000, nb_thin=25))
mcmc_output <- run_model(filename, data, run_param=list(nb_iter=100000, nb_burnin=50000, nb_thin=50))

mcmc_output_example <- run_model(filename, data, run_param=list(nb_iter=50000, nb_burnin=25000, nb_thin=25))

Note that for much complex case studies, the dimension of the model may be considerable hence the time required by the runs may be dramatically long. In that context, while analyzing your results carefully, you might be willing to use your model outputs if a only a very limited number of variables don't reach convergence after especially long runs.

Once a model reaches convergence objectives, don't forget to save it on your machine.

save(mcmc_output_example, file = "./data/mcmc_output_example.rda")

Diagnoses

Here we provide the diagnose_model function to perform simple diagnoses on the EcoDiet run. It displays the number of variables for which the Gelman-Rubin test remains higher than different thresholds, highlighting also the 10 worst variables in terms of convergence. The object created contains these values for all the model variables.

Gelman_model <- diagnose_model(mcmc_output_example)
print(Gelman_model)

The function can also be used to produce diagnostic plots and save them as a .pdf file. Complex models will have so many variables that the process can be very long so the user can also specify the variable for which graphs should be produced and stored.

diagnose_model(mcmc_output_example, var.to.diag = "all", save = TRUE)

6. Plot and save the results

The object output by the run_model function contains all MCMC samples, summary statistics and the configuration of the run:

str(mcmc_output_example)

Please read the documentation of the jagsUI package for more information.

Below we provide some functions to for a simple exploration of the results.

The mean results

The model's outputs are the approximated a posteriori distributions for the trophic links probabilities $\eta$ and the diet proportions $\Pi$. You can visualize the mean of these distribitions with the plot_results function:

plot_results(mcmc_output_example, data)

You can access the main statistics by variable directly from the jagsUI object returned by the rrun_model function, including the mean value of the variables:

print(mcmc_output_example$summary[,"mean"])

The probability distributions

The probability distributions can be plotted for one predator:

plot_results(mcmc_output_example, data, pred = "huge")
plot_results(mcmc_output_example, data, pred = "large")

7. Save another variable than $\Pi$ and $\eta$

You actually have the possibility to monitor and output the statistics of all the model parameters. For example you may be interested by the variable $\delta$ that represents the trophic discrimination factor. In the EcoDiet model, a different trophic discrimination factor is used for each trophic group and for each element, allowing some differences between species. We can chose to monitor these parameters when running the model by using the variables_to_save argument:

mcmc_output_delta <- run_model(filename, data, 
          variables_to_save = c("delta"),
          run_param = "test")

And now you can access the mean value using:

print(mcmc_output_delta$summary[,"mean"])


Try the EcoDiet package in your browser

Any scripts or data that you put into this service are public.

EcoDiet documentation built on Jan. 7, 2023, 1:18 a.m.