Set to 'defrost': How to use defrostR for standardizing amphibian taxonomy

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, tidy = FALSE, size = "small")
options(width=100)

Introduction

This package is designed to simplify the workflow of combining amphibian data sets from sources that use different taxonomic nomenclature, by conforming to the most current system suggested by Amphibian Species of the World: an Online Resource (ASW), Frost (2017). Comparative studies require combining data from a range of sources, such as spatial or conservation data from the IUCN Red List, information on evolutionary history from a published phylogeny such as Pyron 2014, Syst Biol, or life history data from AnAge. These sources have either chosen to follow a specific taxonomic system or, more often, are no longer up to date. It can therefore be complicated to establish a universal or current nomenclature that allows data sets to be combined efficiently. The main objective of this package is thus to harvest the taxonomic classification of every amphibian species listed on the ASW website, harvest all listed synonyms per species and then, using this information, suggest 'Frost' (ASW) names for any given list of taxa.

This package could also be useful for updating museum catalogs, or for updating species names in manuscripts after a taxonomic group has undergone nomenclatural revision. It also includes functions for producing summary statistics on species numbers at various taxonomic levels.

Below are some examples of how to use the package.

Loading the package

defrostR is a package written in the R programming language. The most recent version can be installed from GitHub using the 'devtools' package. The first step is therefore to make sure that 'devtools' is installed and loaded. Successful installation of 'defrostR' might also require the 'XML' package to be installed beforehand.

library(devtools)
install_github("hcliedtke/defrostR", build_vignettes = TRUE)
library(defrostR)
library(knitr)

Compile ASW taxonomy

The starting point of this package is to populate a list of all species currently registered on the ASW website. For ease of use, a version of this is stored in the 'defrostR' package internally, but this may of course be outdated, so to generate your own, up-to-date version run the getTaxonomy() function. The function takes no arguments and essentially iterates through the ASW website, building the URL for each species by working down the taxonomic hierarchy (i.e. first looking for all orders, then superfamilies, then families, subfamilies, genera and finally species). The resulting table contains 7 columns: one for each taxonomic level (Order through to Species) and a final column with the URL of the ASW web page for each species.
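The traversal described above can be illustrated on a toy example. This is only a sketch of the general idea (walking a nested hierarchy and recording one row per terminal unit), not the package's actual implementation: the nested list, the three-level depth and the base URL are all hypothetical placeholders.

```r
# Toy hierarchy (hypothetical): Order -> Family -> Genus only.
toy_tree <- list(
  Anura   = list(Hylidae = c("Hyla", "Dryophytes")),
  Caudata = list(Plethodontidae = c("Plethodon"))
)

# Placeholder base URL, NOT the real ASW address.
base_url <- "https://example.org/amphibia"

# Walk the hierarchy level by level and collect one row per genus.
rows <- list()
for (ord in names(toy_tree)) {
  for (fam in names(toy_tree[[ord]])) {
    for (gen in toy_tree[[ord]][[fam]]) {
      rows[[length(rows) + 1]] <- data.frame(
        Order  = ord,
        Family = fam,
        Genus  = gen,
        URL    = paste(base_url, ord, fam, gen, sep = "/"),
        stringsAsFactors = FALSE
      )
    }
  }
}

# Bind the rows into one table: one column per level plus the URL.
toy_taxonomy <- do.call(rbind, rows)
print(toy_taxonomy)
```

The same pattern extends to more levels by adding further nested loops (or a recursive function), at the cost of one web request per visited page in a real scraper.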

Run the getTaxonomy() function (this may take 10 to 15 minutes), or have a look at the internally stored 'asw_taxonomy' data file:

Build taxonomy

#asw_taxonomy<-getTaxonomy()
head(asw_taxonomy)
kable(head(asw_taxonomy), padding=2)
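Because a fresh harvest is slow, it can be convenient to cache the result on disk and only rebuild it when needed. The following is a minimal sketch of that pattern using base R; `cache_rds()` and `slow_build()` are hypothetical helpers written for this illustration, with `slow_build()` standing in for a call such as getTaxonomy().

```r
# Return the cached object if the file exists; otherwise build, save and return it.
cache_rds <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- build()
  saveRDS(obj, path)
  obj
}

# Stand-in for the slow harvesting call (hypothetical toy output).
slow_build <- function() {
  data.frame(Genus = "Hyla", Species = "arborea", stringsAsFactors = FALSE)
}

cache_file <- file.path(tempdir(), "asw_taxonomy_cache.rds")
unlink(cache_file)                            # start clean for the demonstration

first  <- cache_rds(cache_file, slow_build)   # builds the object and saves it
second <- cache_rds(cache_file, slow_build)   # reads the saved copy from disk
```

Deleting the cache file forces a rebuild the next time `cache_rds()` is called.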

Get some species stats

The first thing one might be interested in is some basic numbers on the taxonomic units nested within a particular level.

the basics:

aswStats(asw_taxonomy)
kable(aswStats(asw_taxonomy))

Additional arguments can be passed to aswStats() to narrow down the search, for example to get information on a specific family only:

information on a specific group of interest:

aswStats(asw_taxonomy, Family="Plethodontidae")
kable(aswStats(asw_taxonomy, Family="Plethodontidae"))

This can be applied to any taxonomic level. To make sure everything is working properly (or to check whether the asw_taxonomy data set is up to date), you could double-check that these numbers match the information on the ASW website.
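Counts of this kind can also be cross-checked by hand with base R. The sketch below uses a small toy table rather than the real asw_taxonomy object; the column names Family, Genus and Species are assumed from the description of the taxonomy table above.

```r
# Toy taxonomy table (illustrative rows only).
toy <- data.frame(
  Family  = c("Hylidae", "Hylidae", "Hylidae", "Plethodontidae"),
  Genus   = c("Hyla", "Hyla", "Dryophytes", "Plethodon"),
  Species = c("arborea", "meridionalis", "japonicus", "cinereus"),
  stringsAsFactors = FALSE
)

# Number of distinct genera nested within each family.
genera_per_family <- tapply(toy$Genus, toy$Family,
                            function(x) length(unique(x)))

# Number of species (rows) per family.
species_per_family <- table(toy$Family)

print(genera_per_family)
print(species_per_family)
```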

taking a closer look:

We can also turn on 'verbose' to get a list of all nested units instead of just a summary table. This is useful for getting a more detailed overview, but also for making graphical representations of this information:

aswStats(asw_taxonomy, Family = "Plethodontidae",verbose = T)
display this information visually:
plethodontid.stats<-aswStats(asw_taxonomy, Family = "Plethodontidae",verbose = T)
par(mar=c(1,1,1,1))
pie(plethodontid.stats$Genera[order(plethodontid.stats$Genera, decreasing = T)], cex=0.5, main = "No. of Species per Genus", col = rainbow(length(plethodontid.stats$Genera),s = 0.5), clockwise = T, border = NA)

Compile Synonym list

Hopefully the most useful data set this package generates is an up-to-date list of all possible synonyms per species, to facilitate matching data sets that might be using different names for the same species. As with the taxonomic table, a version of the current synonym list is stored in the defrostR package internally. This may of course be outdated, so to generate your own, up-to-date version run the getSynonyms() function. The function takes the getTaxonomy() output as a reference and then looks through every species webpage to tabulate all listed synonyms. For current results it is therefore important to supply an "asw_taxonomy" object that is more up to date than the one stored in the package (if none is specified, defrostR will use the internal version). Generating a full list for all amphibian species can take a long time (up to an hour). More realistically, you may only be interested in a specific taxonomic group, in which case you can narrow down the search as in this example:

Harvest synonyms

agalychnis_synonyms<-getSynonyms(Genus="Agalychnis")
head(agalychnis_synonyms,n = 15)
kable(head(agalychnis_synonyms,n = 15))

The output is a table with two columns listing species names and all available synonyms (here showing the first 15 entries). We could summarize this information like so:

how many synonyms per species are there?

summary(agalychnis_synonyms$species)
kable(summary(agalychnis_synonyms$species))
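Note that summary() only returns per-species counts when the species column is a factor; on a character column it reports length and class instead. As a hedged alternative, table() counts entries per species in either case. The sketch below uses a toy two-column table; the column name `species` follows the code above, while `synonyms` is an assumed name for the second column.

```r
# Toy synonym table (illustrative rows only).
toy_synonyms <- data.frame(
  species  = c("Agalychnis callidryas", "Agalychnis callidryas",
               "Agalychnis lemur"),
  synonyms = c("Hyla callidryas", "Phyllomedusa callidryas",
               "Phyllomedusa lemur"),
  stringsAsFactors = FALSE
)

# Count the listed synonyms per species; works for factor and character columns.
n_syn <- table(toy_synonyms$species)

# Species with the most synonyms first.
print(sort(n_syn, decreasing = TRUE))
```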

Let's defrost something!

Having now prepared the data set of amphibian synonyms, we turn to the main objective of this package: facilitating the unification of naming systems, i.e. matching a list of names to the synonym table and suggesting the nomenclature currently advised by Frost (ASW). As an example, let's look at how the names on AmphibiaWeb differ from those on ASW. Their most current taxonomy is available online and can be read into R directly like so:

amphweb_latest<-read.csv("https://amphibiaweb.org/amphib_names.txt", sep="\t")

For this tutorial, however, we will use a version stored internally in defrostR as "amphweb" (again, this might be out of date, so be careful).

load the amphibia web taxonomy:

head(amphweb)
kable(head(amphweb))

Query a list of taxa against the synonym table

For this example, let's look only at the genus Hyla which, due to a recent revision by Duellman et al. 2016, Zootaxa, has undergone name changes that have been implemented in ASW but not in AmphibiaWeb:

amphweb.hyla<-amphweb[amphweb$genus=="Hyla",]

And let's "defrost" that list using the synonym list created with getSynonyms(), here using the version stored internally:

hyla.defrosted<-defrost(query=paste(amphweb.hyla$genus, amphweb.hyla$species),asw = asw_synonyms)
head(hyla.defrosted, n=15)
kable(head(hyla.defrosted, n=15))

The output is a table with the following columns: the user-specified query; a 'stripped' version of that input, which tries to deal with differences such as capitalization or underscores that might cause issues; a status column; a warnings column; and the suggested Frost/ASW names. The status column records whether a name already matches the ASW name and, if not, whether it could be "defrosted" without problems, was not found among the listed synonyms, or is ambiguous and could refer to more than one species. The warnings column, for now, only flags cases where the name changes result in duplicates.
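To give an idea of what such 'stripping' might involve, here is a minimal sketch of name normalization in base R. The function `normalise_name()` is a hypothetical helper written for this illustration; it is not the routine defrostR uses internally, and the exact rules the package applies may differ.

```r
# Normalise a vector of binomial names so that superficial differences
# (underscores, stray whitespace, capitalization) do not prevent a match.
normalise_name <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)   # underscores to spaces
  x <- gsub("\\s+", " ", x)              # collapse runs of whitespace
  x <- trimws(x)                         # drop leading/trailing whitespace
  x <- tolower(x)                        # lower-case everything ...
  # ... then capitalise the first letter (the genus name).
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

raw_names   <- c("hyla_arborea", "  HYLA   Arborea ", "Hyla arborea")
clean_names <- normalise_name(raw_names)
print(clean_names)
```

Keeping the original query alongside the normalized version, as the defrost() output does, makes it easy to trace each suggestion back to the input.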

Various additional arguments are available, for example to resolve ambiguities interactively, with the user deciding on the fly which species is meant, or to pass the original query name through if no match is found in the synonym list.
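The logic behind these options can be sketched with a toy lookup. Everything here is an assumption made for illustration: `resolve()` is a hypothetical helper, the status labels "resolved" and "not_found" are invented (only "ambiguous" appears in the package output used in this tutorial), and the names "Aus bus", "Cus dus", "Eus fus" and "Xus yus" are placeholders rather than real taxa.

```r
# Toy synonym table: 'synonyms' lists a name, 'species' the accepted name.
toy_syn <- data.frame(
  species  = c("Dryophytes japonicus", "Cus dus", "Eus fus"),
  synonyms = c("Hyla japonica", "Aus bus", "Aus bus"),
  stringsAsFactors = FALSE
)

# Look up each query and classify it by the number of accepted names it hits.
resolve <- function(query, syn) {
  hits <- lapply(query, function(q) unique(syn$species[syn$synonyms == q]))
  n    <- lengths(hits)
  status <- ifelse(n == 0, "not_found",
                   ifelse(n == 1, "resolved", "ambiguous"))
  suggested <- vapply(hits,
                      function(h) if (length(h) == 1) h else NA_character_,
                      character(1))
  data.frame(query, status, suggested, stringsAsFactors = FALSE)
}

res <- resolve(c("Hyla japonica", "Xus yus", "Aus bus"), toy_syn)

# Optional pass-through: keep the original name when nothing was found.
# Ambiguous names stay NA so that they are resolved by hand.
res$final <- ifelse(res$status == "not_found", res$query, res$suggested)
print(res)
```

Leaving ambiguous names as NA rather than guessing is a deliberate choice in this sketch: it forces a manual decision before the names are used downstream.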

Are my names up-to-date?

Instead of looking through the output table of defrost(), we can also print a report using the synonymReport() function.

the basics:

synonymReport(hyla.defrosted)
kable(synonymReport(hyla.defrosted))

This tells us that less than half of the names are up to date, and that defrostR could update quite a few on its own without any complications. defrostR found all names on ASW and there were no ambiguities (queries which match synonyms belonging to more than one species), but the automatic defrosting has resulted in 4 cases where names are no longer unique! These need to be checked carefully; most likely they are units that used to be distinct and are now synonyms. We can get a better idea of what has happened here by asking defrostR for a bit more information:

the details:

synonymReport(hyla.defrosted, verbose=T)

At just a glance, we can already see that H. heinzsteinitzi has been sunk into D. japonicus and H. suweonensis into D. immaculatus and so, these two species are no longer valid according to ASW.

Of course all of this can also be done manually from the defrost() output:

check which names are ambiguous (in this case 0):

hyla.defrosted[hyla.defrosted$status=="ambiguous",]

check which names are now duplicated:

hyla.defrosted[hyla.defrosted$warnings %in% "duplicated",]
kable(hyla.defrosted[hyla.defrosted$warnings %in% "duplicated",])
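For readers who want to see how such duplicates can be detected independently, here is a small base R sketch on toy vectors. It does not reproduce how defrostR sets its warnings column; it only shows that duplicated() on its own flags the second and later occurrences, so both directions are combined to flag every member of a duplicated group.

```r
# Toy query names and the accepted names they were updated to.
query <- c("Hyla japonica", "Hyla immaculata",
           "Hyla heinzsteinitzi", "Hyla arborea")
asw   <- c("Dryophytes japonicus", "Dryophytes immaculatus",
           "Dryophytes japonicus", "Hyla arborea")

# duplicated() alone marks only the later occurrence(s) ...
dup_later <- duplicated(asw)

# ... so scan from both ends to mark every member of a duplicated group.
dup_all <- duplicated(asw) | duplicated(asw, fromLast = TRUE)

# Group the original queries by the accepted name they now share.
dup_groups <- split(query[dup_all], asw[dup_all])
print(dup_groups)
```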

A decision can now be made on how to deal with these duplicates (and with any ambiguities, should there be any), after which the ASW names can simply be adopted for use in the original data set, and voilà!

amphweb.hyla$ASW_names<-hyla.defrosted$ASW_names
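The direct assignment above relies on both tables being in the same row order. If the data set has been sorted or filtered in the meantime, matching by name is safer. The following sketch uses toy tables; the column name `query` for the original names in the defrost() output is an assumption, and the `value` column holds arbitrary placeholder numbers.

```r
# Toy original data set and a toy defrost()-style result in a different order.
original <- data.frame(
  name  = c("Hyla arborea", "Hyla japonica"),
  value = c(1, 2),
  stringsAsFactors = FALSE
)
defrosted <- data.frame(
  query     = c("Hyla japonica", "Hyla arborea"),
  ASW_names = c("Dryophytes japonicus", "Hyla arborea"),
  stringsAsFactors = FALSE
)

# Find, for each original name, its row in the defrosted table.
idx <- match(original$name, defrosted$query)

# Fail loudly if any original name has no counterpart.
stopifnot(!anyNA(idx))

# Transfer the accepted names in the order of the original data set.
original$ASW_names <- defrosted$ASW_names[idx]
print(original)
```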



defrostR documentation built on Jan. 20, 2018, 9:01 a.m.