In healyke/SIDER: Stable Isotope Discrimination Estimation using R

Load the required packages:

library(SIDER)
library(dplyr)
library(ggplot2)

# knitr::opts_chunk$set(eval = FALSE)

In this example we will illustrate how you can add additional data for known TDFs to the dataset and run SIDER. We also compare the results obtained from SIDER should the new species be added with the case when it was omitted entirely. In compiling the dataset for SIDER we applied strict rules on the data, including that the TDFs be from a strictly controlled feeding diet. In many cases this excluded data from studies where the TDFs were inferred indirectly from an experiemental manipulation such as when Derbridge et al (2015) inferred TDFs for wolves (Canis lupus) using a modified Stable Isotope Mixing Model in which the TDFs themselves were estimated. Here we will use SIDER to estimate the TDF for wolf hair when wolves are absent from the dataset, and when we include the Derbridge et al (2015) data.

It may seem like over-kill to run SIDER for a species for which you have data, and the obvious question is "why would you do this?". It comes down to how much you trust the TDFs for your species. If you are confident that they most fairly and acccurately reflect the consumers in your mixing model, then you should use them. If however they are from a different population, or at a different time, or they lack the number of replicates etc... then in all likelihood you are not as certain of them as you might be. By adding the data you do have to SIDER and then running SIDER with that same species and tissue missing, then you are getting an estimate that is more informed by the dataset total. Of course it will be weighted more heavily towards the same species data in the dataset owing to the phylogenetic correlation structure in the models, but you will also get estimates of the uncertainty that might be lacking if your data comprise only the mean estimates.

Throughout this example we run relatively short runs of SIDER which results in estimates that do not pass the convergence tests. This is done to keep the build time for the package vignettes to something reasonable. As per the introduction vignette, we recommend longer runs, which make take up to an hour or even more, to produce the definitive values you would used subsequently in any analysis.

SIDER data without wolves

##save the entire dataset as an object
SIDER_data <- scrumpSider(iso.data = "all")

##all trees
combined_trees <- scrumpSider(tree = "all")


# Checking the data for the species we want to estimate TEF for (Meles meles)
data_with_imputed_species <- recipeSider(species = "Canis_lupus", 
                                         habitat = "terrestrial", 
                                         taxonomic.class = "mammalia", 
                                         tissue = "hair", 
                                         diet.type = "carnivore", 
                                         tree = combined_trees)

# prepare the data for the carbon tdf model
tdf_data_c <- prepareSider(data_with_imputed_species, 
                          isotope_data, 
                          combined_trees, 
                          "carbon")

# prepare the data for the nitrogen tdf model
tdf_data_n <- prepareSider(data_with_imputed_species, 
                          isotope_data, 
                          combined_trees, 
                          "nitrogen")

# Define the formulae for both models
formula_c <- delta13C ~ diet.type + habitat
formula_n <- delta15N ~ diet.type + habitat

# Define the random structure which is common to both models
random_terms <- ( ~ animal + species + tissue)

prior <- list(R = list(V = 1, nu=0.002), 
              G = list(G1=list(V = 1, nu=0.002),
                       G2=list(V = 1, nu=0.002), 
                       G3=list(V = 1, nu=0.002)))

# model run settings
# *Might* want these to be longer runs and more chains.
nitt   <- c(120000)
burnin <- c(20000)
thin   <- c(50)
parameters <- c(nitt, thin, burnin)
n_chains <- c(2)

# convergence settings
convergence =  c(1.1)
ESS = c(1000)

# estimated TDFs for wolves that are not in (i.e. out) of 
# the dataset
TDF_est_out_c <- imputeSider(mulTree.data = tdf_data_c, 
                           formula = formula_c, 
                           random.terms = random_terms,
                           prior = prior, 
                           output = "wolves_out_sider_c_run",
                           parameters = parameters,
                           chains = n_chains, 
                           convergence =  convergence, 
                           ESS = ESS)

TDF_est_out_n <- imputeSider(mulTree.data = tdf_data_n, 
                           formula = formula_n, 
                           random.terms = random_terms,
                           prior = prior, 
                           output = "wolves_out_sider_n_run",
                           parameters = parameters,
                           chains = n_chains, 
                           convergence =  convergence, 
                           ESS = ESS)

Sider data with wolves

Much of the required data has already been read in and objects created in the code above, so all that remains is that we add the wolf data from Derbridge et al (2015) and re-run the model.

# add the wolf data by binding the new data onto the included SIDER dataset
derbridge_wolf_tdf_data <- rbind(SIDER_data, 
                                 data.frame(species = "Canis_lupus", 
                                                habitat = "terrestrial", 
                                                taxonomic.class = "mammalia",
                                                tissue = "hair", 
                                                diet.type =  "carnivore",
                                                source.iso.13C = NA,
                                                source.iso.15N = NA, 
                                                delta13C = 1.97,
                                                delta15N = 3.04,
                                                citation = c("Derbridge et al (2015)")))

# check it worked
tail(derbridge_wolf_tdf_data)


# Check the data for the species we want to estimate TEF for (Canis lupus)
derbridge_data_with_imputed_species <- recipeSider(species = "Canis_lupus", 
                                                   habitat = "terrestrial", 
                                                   taxonomic.class = "mammalia", 
                                                   tissue = "hair", 
                                                   diet.type = "carnivore", 
                                                   tree = combined_trees)

# prepare the carbon model
derbridge_tdf_data_c <- prepareSider(derbridge_data_with_imputed_species, 
                                      isotope_data, 
                                      combined_trees, 
                                      "carbon")

# prepare the nitrogen model
derbridge_tdf_data_n <- prepareSider(derbridge_data_with_imputed_species, 
                                      isotope_data, 
                                      combined_trees, 
                                      "nitrogen")


# estimated TDFs for wolves that are in the dataset
TDF_est_in_c <- imputeSider(mulTree.data = derbridge_tdf_data_c, 
                           formula = formula_c, 
                           random.terms = random_terms,
                           prior = prior, 
                           output = "wolves_in_sider_c_run",
                           parameters = parameters,
                           chains = n_chains, 
                           convergence =  convergence, 
                           ESS = ESS)

TDF_est_in_n <- imputeSider(mulTree.data = derbridge_tdf_data_n, 
                           formula = formula_n, 
                           random.terms = random_terms,
                           prior = prior, 
                           output = "wolves_in_sider_n_run",
                           parameters = parameters,
                           chains = n_chains, 
                           convergence =  convergence, 
                           ESS = ESS)


### Now lets have a look at the files imputeSider has saved to the 
### current working directory
# list.files(pattern = "test_c_run")

Compare the output from both models

In this example, there is very little difference in the two models. We can see this in the summary statistics and in the figure below we focus on the means and standard devations of the estimated TDFs since these are the statistics that would be passed onwards to a stable isotope mixing model (SIMM) analysis in packages such as MixSIAR, simmr or siar. See the accompanying vignette for an example pipeline on including these estimates in such an analysis.

# bundle the global estimates of TDFs from each of the 4 models into a 
# wide format data.frame.
tdf_wide <- data.frame(DC_in  = as.numeric(TDF_est_in_c$tdf_global),
                      DC_out = as.numeric(TDF_est_out_c$tdf_global), 
                      DN_in  = as.numeric(TDF_est_in_n$tdf_global),
                      DN_out = as.numeric(TDF_est_out_n$tdf_global)
                      )

# get some summary statistics of these estimates
summary(tdf_wide)

# tidy to long format for nice plotting
tdf_long <- tidyr::gather(tdf_wide, model, TDF)

# calculate means and standard deviations for plotting in ggplot
smrys <- tdf_long %>% group_by(model) %>% 
  summarise(mean = mean(TDF), sd = sd(TDF))

# a data.frame of the derbridge wolf TDF estimates to add to the figure
# for comparison
der_points = data.frame(x1 = 1, x2 = 2, x3 = 3, x4 = 4,
                        y1 = tail(derbridge_wolf_tdf_data$delta13C,1),
                        y2 = tail(derbridge_wolf_tdf_data$delta13C,1),
                        y3 = tail(derbridge_wolf_tdf_data$delta15N,1),
                        y4 = tail(derbridge_wolf_tdf_data$delta15N,1))

# make a nice ggplot of error bars showing the mean and standard deviations.
ggplot(smrys, aes(model, mean)) + 
  geom_errorbar(data = smrys, 
                mapping = aes(ymin = mean - sd, ymax = mean + sd), 
                size = 1, width = 0.3) + 
  geom_point(size = 3) + 
  scale_y_continuous(name="TDF") + 
  geom_segment(data = der_points, 
               mapping = aes(x = x1, y = y1, xend = x2, yend = y2),
               col = "red", lty = 2) + 
  geom_segment(data = der_points, 
               mapping = aes(x = x3, y = y3, xend = x4, yend = y4),
               col = "red", lty = 2) + 
  theme_classic()

Figure 1. The means and standard deviations for carbon and nitrogen TDFs for two models: one run with the Derbridge data included in the model, and one with it left out of the model. The red horizontal lines indicate the means of the actual Derbridge et al 2015 data for carbon and nitrogen.

healyke/SIDER documentation built on April 19, 2020, 7:15 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com