# knitr::opts_chunk$set(eval = FALSE)

This package estimates Trophic Discrimination Factors (TDFs) based on the imputation function within the MCMCglmm package (http://cran.r-project.org/web/packages/MCMCglmm/index.html) and includes functionality includes the error associated with building phylogenetic trees using the mulTree package (https://github.com/TGuillerme/mulTree).

\section{Installation} To install SIDER you will required the devtools package. If you dont have devtools installed, you will need to run the following:

# Installing devtools
install.packages("devtools")

Following this you can install SIDER directly from GitHub using the following:

# Installing SIDER
devtools::install_github("healyke/SIDER@v1.0.0.0")

However, if you are reading this vignette as part of the help files you should only need to load the package

library(SIDER)

Read in the data

SIDER has a data file which contains discrimination factors for a range of species. This data includes what tissue isotopic value were measured from and the basic ecology of the species including its diet (herbivore, etc) and whether it is a marine or terrestrial organism.

We search this data using the scrumpSider function.

##search the dataset for a single species
scrumpSider(iso.data = "Homo_sapiens")

We can also load the full dataset as an object

##save the entire dataset as an object
isotope_data <- scrumpSider(iso.data = "all")

# View the first rows of the data frame
isotope_data[1,]

SIDER also contains phylogenies for Aves and Mammalia As we may want to include the error associated with building phylogenies into our analysis we take a sample of possible phylogenetic trees (See Healy et al. 2014 (http://rspb.royalsocietypublishing.org/content/281/1784/20140298). SIDER contains a subset of 10 trees built using the distribution of bird phylogeny from the Jetz et al. 2014 (http://birdtree.org/) and the Fritz et al. 2009 mammal phylogeny polytomy generated by Kuhn et al. 2011 (http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/full) which we combined at a rooted age of 250 Mya (million years ago). We can load and look at these phylogenies using the scrumpSider function.

##single phylogeny
single_tree <- scrumpSider(tree = 1)
summary(single_tree)

##all trees
combined_trees <- scrumpSider(tree = "all")

\section{Testing the new data: \texttt{recipeSider}} In order to estimate a trophic enrichment factor for a new species we need to check that the species is already present in our phylogeny and check what data is available for the new species. recipeSider checks for the presence of the following data: tissue (blood, claws, collagen, feather, hair, kidney, liver, milk, muscle); habitat (terrestrial, marine); and diet.type (carnivore, herbivore, omnivore, pellet).

# Checking the data for the species we want to estimate TEF for (Meles meles)
new_data_test <- recipeSider(species = "Meles_meles", 
                             habitat = "terrestrial", 
                             taxonomic.class = "mammalia", 
                             tissue = "blood", 
                             diet.type = "omnivore", 
                             tree = combined_trees)

If the species is not present in the loaded phylogeny (e.g. Komodo dragon - Varanus komodoensis), or that some values are missing (e.g. tissue), we get an error message to indicate what is missing from our data.

N.B, the following code will throw a stop error if evaluated.

# Some incomplete dataset
new_data_test <- recipeSider(species = "Varanus_komodoensis", 
                             habitat = "terrestrial", 
                             taxonomic.class = "mammalia", 
                             tissue = "NA", 
                             diet.type = "omnivore", 
                             tree = combined_trees)

The recipeSider function also formats the data for the new species data so that it can be combined with the data already available within the package using the prepareSider function.

\section{Formatting the new data: prepareSider} We now need to format the data by combining both the isotopic data already available within the package and the data from the new species. We also include what isotope we want to estimate a trophic discrimination value for (either carbon or nitrogen).

tdf_data_c <- prepareSider(new_data_test, 
                          isotope_data, 
                          combined_trees, 
                          "carbon")

N.B. isotope_data is the isotopic dataset already implemented in SIDER.

We now have a mulTree class object, which is required for the imputation analysis. It contains the matched phylogenies, in this case two phylogenies:

tdf_data_c$phy

and a dataset containing the TDF and related data with the new species, for which you want to estimate a trophic enrichment factor, at the top with a NA for either delta13C or delta15N depending on isotope.

head(tdf_data_c$data)

\section{Running the analysis: prepareSider}

With the data formatted as a mulTree object we can decide on a model which will impute the new species estimate. In this case we will run the full model to estimate delta13C with the fixed factors of diet type and habitat type.

formula.c <- delta13C ~ diet.type + habitat

and random terms that includes the animal term which is required to include phylogeny into the analysis:

random.terms <- ( ~ animal + species + tissue)

As we rely on Bayesian imputation to estimate the missing value we also need to specify a prior, in this case we use a non-informative prior

prior <- list(R = list(V = 1, nu=0.002), 
              G = list(G1=list(V = 1, nu=0.002),
                       G2=list(V = 1, nu=0.002), 
                       G3=list(V = 1, nu=0.002)))

along with the number of iterations to run the chain (nitt), the burn-in (burnin), the sampling thinning (thin), the number of chains to run (no.chains) as recommended in the MCMCglmm guidelines (https://cran.r-project.org/web/packages/MCMCglmm/vignettes/CourseNotes.pdf)}.

nitt <- c(10)
burnin <- c(1)
thin <- c(1)
parameters <- c(nitt, thin, burnin)
no.chains <- c(2)

N.B. These settings are only for testing and running a quick example, you can use the following parameters to produce results that pass convergence and avoid autocorrelation

nitt <- c(1200000)
burnin <- c(200000)
thin <- c(500)
parameters <- c(nitt, thin, burnin)
no.chains <- c(2)

We need to check that our MCMC chains are converging so we use the Gelman and Rubin diagnostic to check the convergence and also check that the estimated parameters have an effective sample size >1000.

convergence =  c(1.1)
ESS = c(1000)

As the function exports the model output, to avoid memory issues within R when running over multiple phylogenies, make sure you have set the working directory to somewhere appropriate. Vignettes use a temporary directory, but if you run this code yourself then the files will appear in your working directory (which you may choose to change prior to running this code, and then change back afterwards).

TDF_est.c <- imputeSider(mulTree.data = tdf_data_c, 
                         formula = formula.c, 
                         random.terms = random.terms,
                         prior = prior, 
                         output = "test_c_run",
                         parameters = parameters,
                         chains = no.chains, 
                         convergence =  convergence, 
                         ESS = ESS)



###Now lets have a look at the files imputeSider has saved to the current working directory
list.files(pattern = "test_c_run")

imputeSider now runs the selected amount of chains (no.chains) for each of the sampled phylogenies and exports the resulting MCMC chains to the working directory. Hence in this case we have run two chains for each of the two sampled trees so there should be four files in your working directory ending with something like run-tree1_chain2.rda.

These are the full MCMCglmm model outputs for each of the chains run and can be imported back into the R for full inspection if required using the read.mulTree function. (Note this will require saving your files to a permanent working directory) Notice also that the two other files ending with something similar to run-tree2_conv. These give a description of the convergence diagnostics of the chains for each tree. The results of these diagnostics for each tree are also printed in R after running imputeSider returning whether the Effective sample size exceed ESS for all estimated parameters (with the number for each parameter given as a list), and whether all chains converged for each parameter.

All these models can be separately imported into R using read.mulTree(). However, since we are only interested in the estimated TDF for our species imputeSider only imports the imputed posterior distribution of the estimated TDF for our species.

The contents of the TDF_est.c$tdf_global object are the posterior estimate of the imputed TDF aggregated across all chains in the model run (TDF_est.c$tdf_estimates is a list containing each posterior chains separately). As a mcmc class object, the many functions of the coda package provide options for summarising and plotting the distribution.

# Explore the names of the list created by SIDER::imputeSider
names(TDF_est.c)

# Calculate summary statistics of the posterior. 
# Specifically, the mean and standard deviation would be
# taken from here and used in a mixing model analysis using 
# MixSIAR, MixSIR or SIAR for example.
summary(TDF_est.c$tdf_global)

# Credible intervals and the mode of the posterior are obtained 
# using the hdrcde package
hdrcde::hdr(TDF_est.c$tdf_global, prob = c(50, 95, 99))

# You can also create density plots of the posterior
coda::densplot(TDF_est.c$tdf_global)
file.remove(list.files(pattern = "test_c_run"))


healyke/DEsiR documentation built on April 19, 2020, 7:14 p.m.