# knitr::opts_chunk$set(eval = FALSE)
This package estimates Trophic Discrimination Factors (TDFs) based on the imputation function within the MCMCglmm
package (http://cran.r-project.org/web/packages/MCMCglmm/index.html) and includes
functionality includes the error associated with building phylogenetic trees using the mulTree
package (https://github.com/TGuillerme/mulTree).
\section{Installation}
To install SIDER
you will required the devtools
package. If you dont have devtools
installed, you will need to run the following:
# Installing devtools install.packages("devtools")
Following this you can install SIDER
directly from GitHub using the following:
# Installing SIDER devtools::install_github("healyke/SIDER@v1.0.0.0")
However, if you are reading this vignette as part of the help files you should only need to load the package
library(SIDER)
SIDER
has a data file which contains discrimination factors for a range of species.
This data includes what tissue isotopic value were measured from and the basic ecology of the species including its diet (herbivore, etc) and whether it is a marine or terrestrial organism.
We search this data using the scrumpSider
function.
##search the dataset for a single species scrumpSider(iso.data = "Homo_sapiens")
We can also load the full dataset as an object
##save the entire dataset as an object isotope_data <- scrumpSider(iso.data = "all") # View the first rows of the data frame isotope_data[1,]
SIDER
also contains phylogenies for Aves and Mammalia
As we may want to include the error associated with building phylogenies into our analysis we take a sample of possible phylogenetic trees (See Healy et al. 2014 (http://rspb.royalsocietypublishing.org/content/281/1784/20140298).
SIDER
contains a subset of 10 trees built using the distribution of bird phylogeny from the Jetz et al. 2014 (http://birdtree.org/) and the Fritz et al. 2009 mammal phylogeny polytomy generated by Kuhn et al. 2011 (http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/full) which we combined at a rooted age of 250 Mya (million years ago). We can load and look at these phylogenies using the scrumpSider
function.
##single phylogeny single_tree <- scrumpSider(tree = 1) summary(single_tree) ##all trees combined_trees <- scrumpSider(tree = "all")
\section{Testing the new data: \texttt{recipeSider}}
In order to estimate a trophic enrichment factor for a new species we need to check that the species is already present in our phylogeny and check what data is available for the new species.
recipeSider
checks for the presence of the following data:
tissue
(blood
, claws
, collagen
, feather
, hair
, kidney
, liver
, milk
, muscle
);
habitat
(terrestrial
, marine
);
and diet.type
(carnivore
, herbivore
, omnivore
, pellet
).
# Checking the data for the species we want to estimate TEF for (Meles meles) new_data_test <- recipeSider(species = "Meles_meles", habitat = "terrestrial", taxonomic.class = "mammalia", tissue = "blood", diet.type = "omnivore", tree = combined_trees)
If the species is not present in the loaded phylogeny (e.g. Komodo dragon - Varanus komodoensis), or that some values are missing (e.g. tissue
), we get an error message to indicate what is missing from our data.
N.B, the following code will throw a stop error if evaluated.
# Some incomplete dataset new_data_test <- recipeSider(species = "Varanus_komodoensis", habitat = "terrestrial", taxonomic.class = "mammalia", tissue = "NA", diet.type = "omnivore", tree = combined_trees)
The recipeSider
function also formats the data for the new species data so that it can be combined with the data already available within the package using the prepareSider
function.
\section{Formatting the new data: prepareSider
}
We now need to format the data by combining both the isotopic data already available within the package and the data from the new species. We also include what isotope we want to estimate a trophic discrimination value for (either carbon
or nitrogen
).
tdf_data_c <- prepareSider(new_data_test, isotope_data, combined_trees, "carbon")
N.B.
isotope_data
is the isotopic dataset already implemented inSIDER
.
We now have a mulTree
class object, which is required for the imputation analysis.
It contains the matched phylogenies, in this case two phylogenies:
tdf_data_c$phy
and a dataset containing the TDF and related data with the new species, for which you want to estimate a trophic enrichment factor, at the top with a NA
for either delta13C
or delta15N
depending on isotope.
head(tdf_data_c$data)
\section{Running the analysis: prepareSider
}
With the data formatted as a mulTree
object we can decide on a model which will impute the new species estimate.
In this case we will run the full model to estimate delta13C with the fixed factors of diet type and habitat type.
formula.c <- delta13C ~ diet.type + habitat
and random terms that includes the animal
term which is required to include phylogeny into the analysis:
random.terms <- ( ~ animal + species + tissue)
As we rely on Bayesian imputation to estimate the missing value we also need to specify a prior, in this case we use a non-informative prior
prior <- list(R = list(V = 1, nu=0.002), G = list(G1=list(V = 1, nu=0.002), G2=list(V = 1, nu=0.002), G3=list(V = 1, nu=0.002)))
along with the number of iterations to run the chain (nitt
), the burn-in (burnin
), the sampling thinning (thin
), the number of chains to run (no.chains
) as recommended in the MCMCglmm
guidelines (https://cran.r-project.org/web/packages/MCMCglmm/vignettes/CourseNotes.pdf)}.
nitt <- c(10) burnin <- c(1) thin <- c(1) parameters <- c(nitt, thin, burnin) no.chains <- c(2)
N.B. These settings are only for testing and running a quick example, you can use the following parameters to produce results that pass convergence and avoid autocorrelation
nitt <- c(1200000) burnin <- c(200000) thin <- c(500) parameters <- c(nitt, thin, burnin) no.chains <- c(2)
We need to check that our MCMC chains are converging so we use the Gelman and Rubin diagnostic to check the convergence and also check that the estimated parameters have an effective sample size >1000.
convergence = c(1.1) ESS = c(1000)
As the function exports the model output, to avoid memory issues within R
when running over multiple phylogenies, make sure you have set the working directory to somewhere appropriate. Vignettes use a temporary directory, but if you run this code yourself then the files will appear in your working directory (which you may choose to change prior to running this code, and then change back afterwards).
TDF_est.c <- imputeSider(mulTree.data = tdf_data_c, formula = formula.c, random.terms = random.terms, prior = prior, output = "test_c_run", parameters = parameters, chains = no.chains, convergence = convergence, ESS = ESS) ###Now lets have a look at the files imputeSider has saved to the current working directory list.files(pattern = "test_c_run")
imputeSider
now runs the selected amount of chains (no.chains
) for each of the sampled phylogenies and exports the resulting MCMC chains to the working directory.
Hence in this case we have run two chains for each of the two sampled trees so there should be four files in your working directory ending with something like run-tree1_chain2.rda
.
These are the full MCMCglmm
model outputs for each of the chains run and can be imported back into the R
for full inspection if required using the read.mulTree
function. (Note this will require saving your files to a permanent working directory)
Notice also that the two other files ending with something similar to run-tree2_conv
.
These give a description of the convergence diagnostics of the chains for each tree.
The results of these diagnostics for each tree are also printed in R
after running imputeSider
returning whether the Effective sample size exceed ESS for all estimated parameters (with the number for each parameter given as a list), and whether all chains converged for each parameter.
All these models can be separately imported into R
using read.mulTree()
.
However, since we are only interested in the estimated TDF for our species imputeSider
only imports the imputed posterior distribution of the estimated TDF for our species.
The contents of the TDF_est.c$tdf_global
object are the posterior estimate of the imputed TDF aggregated across all chains in the model run (TDF_est.c$tdf_estimates
is a list containing each posterior chains separately).
As a mcmc
class object, the many functions of the coda
package provide options for summarising and plotting the distribution.
# Explore the names of the list created by SIDER::imputeSider names(TDF_est.c) # Calculate summary statistics of the posterior. # Specifically, the mean and standard deviation would be # taken from here and used in a mixing model analysis using # MixSIAR, MixSIR or SIAR for example. summary(TDF_est.c$tdf_global) # Credible intervals and the mode of the posterior are obtained # using the hdrcde package hdrcde::hdr(TDF_est.c$tdf_global, prob = c(50, 95, 99)) # You can also create density plots of the posterior coda::densplot(TDF_est.c$tdf_global)
file.remove(list.files(pattern = "test_c_run"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.