Evolutionary distinctiveness

IMLS update, 2021-06-23, Andrew Hipp

As we have been waiting for our final sequencing results, which just came in at the end of May [Hooray!], I have been thinking about how to organize and use the phylogenetic data. I'll present today edivColl -- for Evolutionary Diversity in Collections -- which is the start of an R package that will provide tools for us and for others doing similar work.

The package lives at https://github.com/andrew-hipp/edivColl, but it can be migrated to the shared repository if that's better, or we can fork it into there if we like... whatever is needed. At this point, I'm working in a separate repository b/c that's what I'm used to.

Also, please note that most of the code I am showing here is embedded in the examples for the functions in this package. My plan is that this vignette will eventually serve as a worked example and a gateway into the package and questions that can be addressed with it, and each example will provide enough information to get our IMLS data into our hands and others' for analysis.

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
temp <- require('edivColl')
if(!temp) {
  devtools::install_github('andrew-hipp/edivColl', dep = FALSE)
  # note that if you don't have dependencies already installed, set dep = TRUE or
  #   leave out the dep argument. it's in there now b/c updating dependencies is enforced
  #   and sometimes irritating during development.
  library(edivColl)
}

1. First, let's get our data together

1.1 phylogenies

Our phylogenies as they stand now are included as package data. My thought is that we will leave them as such unless there is some concern about doing so; I'd rather have the data all easily accessible to reviewers and users in the package instead of making them scrounge around in other repositories (which I and almost everyone else does; it's not the end of the world, I just don't think it's ideal if you can avoid it).

The phylogenies we are developing have lots of accession information appended, and multiple inviduals per species. So for example:

library(ape) # this is a standard R phylogenetics package
data(malus_tr) # this is one of Lindsey's Malus trees
head(malus$tip.label, 10) # these are the tip labels as we build them
plot(malus, cex = 0.5)

For understanding phylogeny of the group, the accessions information and inclusion of multiple exemplars per species is essential. But for phylogenetic diversity and distinctiveness, this is most distracting. So let's strip off that information. The cleanPhylo function assumes that you have thrown the taxon name somewhere into the tip label (if you haven't done this, there are helper functions on my lab repository, but I am keeping them separate for the time being b/c this is very lab-specific) and delimited. It strips out trailing and leading whitespace by default and a few other niceties, though others are still needed. See ?cleanPhylo for details.

malus2 <- cleanPhylo(malus, delim = '_|_')
head(malus2$tip.label, 20)

Isn't that nice?

Also, for phylogenetic diversity, we generally want the branch lengths to be in time units; this tree has its branches in units of substitutions / nucleotide. Let's make a rather casual assumption about the rate of evolution evolving along a tree and rescale the tree appropriately.

malus2 <- chronos(malus2)
plot(malus2, cex = 0.6)

1.2 collections data

Okay, that's great. Now our questions are obviously tied not to just the phylogeny, but to the phylogeny and what is at the tips, both what we want to obtain and what we have already. So let's bundle up the phylogeny with data about our garden. There's a function make.consInt that creates a consistent object type, bundling together data about what is in the garden and what is desired with a phylogeny, and includes both the full data and tree and then just a tree subsetted to your taxa, as well as a data table that matches that tree. We'll talk more in a bit about why bother keeping the full tree. What it doesn't do yet is timestamp the data import and the file locations, which would be worth doing! I am relegating this information for the time being to the (partial) documentation of the datasets.

require(magrittr) # parce que ceci n'est pas une pipe
data(accessions.mor) # read MOR accessions data
data(desiderata.mor) # read MOR desiderata list
temp <- c(accessions.mor$Taxon, desiderata.mor$taxon_name) %>%
   unique %>% sort
dat.mor <- data.frame(inGarden = temp %in% accessions.mor$Taxon,
                      wanted = temp %in% desiderata.mor$taxon_name,
                      row.names = temp)
rm(temp)
combo.malus <- make.consInt(malus2, dat.mor)
data(quercus_tr) # while we're at it, I'm throwing in Quercus
quercus2 <- cleanPhylo(quercus)
combo.quercus <- make.consInt(quercus2, dat.mor)
str(combo.quercus)

2. Now we can start thinking about phylogenetic weight of species

2.1 What weights might we use?

There are a few ways we could quantify the phylogenetic weight of a species for conservation purposes.

2.2 a little look at eDisGlobal and eDisColl

There are a few functions in the package that calculate these. w.phy calculates evolutionary distinctiveness using a generalized least squares estimator, presented and illustrated in Hipp et al. 2018. I like this b/c it has its roots in an explicit model of trait evolution, and we tend to think of phylogenies as being useful in ecology b/c they track trait histories (though in conservation, we might think of the genomic ghosts of extinct ancestors as having moral standing, irrespective of their trait 'values,' just as we often think of species as having moral standing irrespective of their traits or utility to humans). But it's not widely used. I think only I use it. The more widely used flavor of evolutionary distinctiveness (e.g. as used in the EDGE method) uses summed branch lengths divided by number of descendents.

So eDisCalc calculates these on a consInt object, and returns both the global and collections results.

library(ggplot2)
library(ggrepel)
qcalc <- eDisCalc(combo.quercus)
qcalc$intersect$sp <- row.names(qcalc$intersect)
p <- ggplot(qcalc$intersect,
            aes(y = all.ed.equalSplits,
                x = sub.ed.equalSplits,
                label = sp))
p <- p + geom_point()
p <- p + geom_label_repel()
print(p)


andrew-hipp/edivColl documentation built on Jan. 29, 2023, 11:40 a.m.