Creating a Taxon-Tree from Taxonomic Data Downloaded from the Paleobiology Database

Description

This function creates a phylogeny-like object of type phylo from the taxonomic information recorded in a taxonomy download from the PBDB for a given group. Two different algorithms are provided: the default is based on parent-child taxon relationships, and the other is based on the nested Linnean hierarchy.

Usage

makePBDBtaxonTree(data, rank, method = "parentChild", solveMissing = NULL,
  tipSet = "nonParents", cleanTree = TRUE, APIversion = "1.1")

Arguments

data

A table of taxonomic data collected from the Paleobiology Database, using the taxa list option with show=phylo. Should work with versions 1.1-1.2 of the API, with either the 'pbdb' or 'com' vocab. However, as 'accepted_name' is not available in API v1.1, the resulting tree will have a taxon's *original* name and not any formally updated name.

rank

The selected taxon rank; must be one of 'species', 'genus', 'family', 'order', 'class' or 'phylum'.

method

Controls which algorithm is used for calculating the taxon-tree: either method = "parentChild" (the default option), which converts the listed binary parent-child taxon relationships from the input PBDB data, or method = "Linnean", which builds a taxon-tree by creating a table of the Linnean taxonomic assignments (family, order, etc.), which are provided when option 'show=phylo' is used in PBDB API calls.
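To make the difference between the two kinds of input concrete, the following is a minimal base-R sketch of how a table of Linnean assignments implies a set of binary parent-child relationships. It is not the code used by makePBDBtaxonTree; the taxon names, the column layout and the object names taxonTable and parentChild are all hypothetical and chosen only for illustration.

```r
# Hypothetical Linnean assignment table: one row per genus, with columns
# ordered from the lowest rank to the highest (names are illustrative only)
taxonTable <- matrix(c(
  "GenusA", "FamilyX", "OrderP",
  "GenusB", "FamilyX", "OrderP",
  "GenusC", "FamilyY", "OrderP"
), ncol = 3, byrow = TRUE,
dimnames = list(NULL, c("genus", "family", "order")))

# Each pair of adjacent columns implies a set of parent-child links
pairs <- lapply(seq_len(ncol(taxonTable) - 1), function(i) {
  cbind(parent = taxonTable[, i + 1], child = taxonTable[, i])
})

# Stack the links and drop repeated rows (e.g. OrderP -> FamilyX occurs twice)
parentChild <- unique(do.call(rbind, pairs))
print(parentChild)
```

The reverse conversion is not always possible, since parent-child data can include unranked or intermediate taxa that have no column in a fixed-rank table.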

solveMissing

Under method = "parentChild", what should makePBDBtaxonTree do about multiple 'floating' parent taxa, listed without their own parent taxon information in the input dataset under data? Each of these is essentially a separate root taxon for a different set of parent-child relationships, and thus poses a problem as far as returning a single phylogeny is concerned. If solveMissing = NULL (the default), nothing is done and the operation halts with an error, reporting the identity of these taxa. Two alternative solutions are offered. First, solveMissing = "mergeRoots" will combine these disparate potential roots and link them to an artificially-constructed pseudo-root, which at least allows for visualization of the taxonomic structure in a limited dataset. Second, solveMissing = "queryPBDB" queries the Paleobiology Database repeatedly via the API for information on parent taxa of the 'floating' parents, and continues within a while() loop until only one such unassigned parent taxon remains. This latter option may take a long time or never finish, depending on the linearity and taxonomic structures encountered in the PBDB taxonomic data; i.e. if a taxon was ultimately its own indirect child in some grand loop by mistake, then under this option makePBDBtaxonTree might never finish. In cases where the taxonomy is bad due to weird and erroneous taxonomic assignments reported by the PBDB, this routine may search all the way back to a very ancient and deep taxon, such as the Eukaryota taxon. Users should thus use solveMissing = "queryPBDB" only with caution.
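The idea of 'floating' parents and the two remedies can be sketched in base R as follows. This is only an illustration under stated assumptions, not the implementation in makePBDBtaxonTree: the taxon names, the label "pseudoRoot", the iteration cap maxIter and the named vector lookupParent (a local stand-in for what would be API queries) are all hypothetical.

```r
# Hypothetical parent-child table (illustrative names, not PBDB data)
parentChild <- cbind(
  parent = c("OrderP", "OrderP", "FamilyX", "FamilyZ"),
  child  = c("FamilyX", "FamilyY", "GenusA", "GenusD")
)

# 'Floating' parents: taxa that appear as a parent but never as a child
floating <- setdiff(parentChild[, "parent"], parentChild[, "child"])
print(floating)

# A "mergeRoots"-style fix: link every floating parent to one artificial root
merged <- rbind(cbind(parent = "pseudoRoot", child = floating), parentChild)
mergedRoots <- setdiff(merged[, "parent"], merged[, "child"])

# A "queryPBDB"-style fix, with a local lookup standing in for API calls
# and an iteration cap so that a circular taxonomy cannot loop forever
lookupParent <- c(OrderP = "ClassQ", FamilyZ = "OrderP")
queried <- parentChild
maxIter <- 10
iter <- 0
repeat {
  open <- setdiff(queried[, "parent"], queried[, "child"])
  if (length(open) <= 1 || iter >= maxIter) break
  known <- open[open %in% names(lookupParent)]
  if (length(known) == 0) break
  queried <- rbind(queried,
    cbind(parent = unname(lookupParent[known]), child = known))
  iter <- iter + 1
}
queriedRoots <- setdiff(queried[, "parent"], queried[, "child"])
```

In this toy case both approaches leave a single root: the artificial one under the merge, and the deeper taxon found by lookup under the query loop.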

tipSet

This argument only has an effect when method = "parentChild" is used. The tipSet argument controls which taxa are selected as tip taxa for the output tree. The default tipSet = "nonParents" selects all child taxa which are not listed as parents in parentChild. Alternatively, tipSet = "all" will add a tip to every internal node with the parent-taxon name encapsulated in parentheses.
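As a rough sketch of what the two settings select, assuming a small hypothetical parent-child table (the taxon names and the exact label format are illustrative assumptions, not output from paleotree):

```r
# Hypothetical parent-child table (illustrative names only)
parentChild <- cbind(
  parent = c("OrderP", "OrderP", "FamilyX", "FamilyX"),
  child  = c("FamilyX", "FamilyY", "GenusA", "GenusB")
)

# "nonParents"-style selection: children that never appear as a parent
tipsNonParents <- setdiff(parentChild[, "child"], parentChild[, "parent"])

# "all"-style selection: additionally give every parent taxon its own tip,
# labelled with the parent name wrapped in parentheses
parents <- unique(parentChild[, "parent"])
tipsAll <- c(tipsNonParents, paste0("(", parents, ")"))

print(tipsNonParents)
print(tipsAll)
```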

cleanTree

By default, the tree is run through a series of post-processing steps, including having single-child nodes collapsed, nodes reordered, and being written out as a Newick string and read back in, to ensure functionality with ape functions and ape-derived functions. If FALSE, none of this post-processing is done and users should beware, as such trees can lead to hard crashes of R.

APIversion

Version of the Paleobiology Database API used by makePBDBtaxonTree when solveMissing = "queryPBDB". The current default is "1.1", which is the only option available as of 05/05/2015. In the future, the improved API version "1.2" will be released on the public PBDB server and will become the new default for this function, but the option to return to "1.1" behavior will be retained.

Details

This function should not be taken too seriously. Many groups in the Paleobiology Database have out-of-date or very incomplete taxonomic information. This function is meant to help visualize what information is present and, by use of time-scaling functions, allow us to visualize the intersection of temporal and phylogenetic information, mainly to look for incongruence due to incorrect taxonomic placements, erroneous occurrence data, or both.

Note however that, contrary to common opinion among some paleontologists, taxon-trees may be just as useful for macroevolutionary studies as reconstructed phylogenies (Soul and Friedman, in press.).

Value

A phylogeny of class 'phylo', where each tip is a taxon of the given 'rank'. Additional details regarding branch lengths can be found in the documentation for the sub-algorithms used by this function to create the taxon-tree: parentChild2taxonTree and taxonTable2taxonTree.

Depending on the method used, either the element $parentChild or $taxonTable is added to the list structure of the output phylogeny object, which was used as input for one of the two algorithms mentioned above.

Please note that when applied to output from the taxa option of API version 1.1, the taxon names returned are the original taxon names, as 'accepted_name' is not available in API v1.1, while under API v1.2 the returned taxon names should be the most up-to-date formal names for those taxa. Similar issues also affect the identification of parent taxa, as the accepted name of the parent ID number is only provided in version 1.2 of the API.

Author(s)

David W. Bapst

References

Soul, L. C., and M. Friedman. In Press. Taxonomy and Phylogeny Can Yield Comparable Results in Comparative Palaeontological Analyses. Systematic Biology.

See Also

Two other functions in paleotree are used as sub-algorithms by makePBDBtaxonTree to create the taxon-tree, and users should consult their manual pages for additional details:

parentChild2taxonTree and taxonTable2taxonTree

Other functions for manipulating PBDB data can be found at taxonSortPBDBocc, occData2timeList, and the example data at graptPBDB.

Examples

## Not run: 

easyGetPBDBtaxa<-function(taxon,show=c("phylo","img","app")){
	#let's get some taxonomic data
	taxaData<-read.csv(paste0("http://paleobiodb.org/",
		"data1.1/taxa/list.txt?base_name=",taxon,
		"&rel=all_children&show=",
		paste0(show,collapse=","),"&status=senior"),
		stringsAsFactors=FALSE)
	return(taxaData)
}

#graptolites
graptData<-easyGetPBDBtaxa("Graptolithina")
graptTree<-makePBDBtaxonTree(graptData,"genus",
	method="parentChild", solveMissing="queryPBDB")
#try Linnean
graptTree<-makePBDBtaxonTree(graptData,"genus",
	method="Linnean")
plot(graptTree,show.tip.label=FALSE,no.margin=TRUE,edge.width=0.35)
nodelabels(graptTree$node.label,adj=c(0,1/2))

#conodonts
conoData<-easyGetPBDBtaxa("Conodonta")
conoTree<-makePBDBtaxonTree(conoData,"genus",
	method="parentChild", solveMissing="queryPBDB")
plot(conoTree,show.tip.label=FALSE,no.margin=TRUE,edge.width=0.35)
nodelabels(conoTree$node.label,adj=c(0,1/2))

#asaphid trilobites
asaData<-easyGetPBDBtaxa("Asaphida")
asaTree<-makePBDBtaxonTree(asaData,"genus",
	method="parentChild", solveMissing="queryPBDB")
plot(asaTree,show.tip.label=FALSE,no.margin=TRUE,edge.width=0.35)
nodelabels(asaTree$node.label,adj=c(0,1/2))

#Ornithischia
ornithData<-easyGetPBDBtaxa("Ornithischia")
ornithTree<-makePBDBtaxonTree(ornithData,"genus",
	method="parentChild", solveMissing="queryPBDB")
#try Linnean
#need to drop repeated taxon first: Hylaeosaurus
ornithData<-ornithData[-(which(ornithData[,"taxon_name"]=="Hylaeosaurus")[1]),]
ornithTree<-makePBDBtaxonTree(ornithData,"genus",
	method="Linnean")
plot(ornithTree,show.tip.label=FALSE,no.margin=TRUE,edge.width=0.35)
nodelabels(ornithTree$node.label,adj=c(0,1/2))

#Rhynchonellida
rynchData<-easyGetPBDBtaxa("Rhynchonellida")
rynchTree<-makePBDBtaxonTree(rynchData,"genus",
	method="parentChild", solveMissing="queryPBDB")
plot(rynchTree,show.tip.label=FALSE,no.margin=TRUE,edge.width=0.35)
nodelabels(rynchTree$node.label,adj=c(0,1/2))

#some of these look pretty messy!


## End(Not run)

###################################


#let's try time-scaling the graptolite tree

#get some example occurrence and taxonomic data
data(graptPBDB)

#get the taxon tree: Linnean method
graptTree<-makePBDBtaxonTree(graptTaxaPBDB, "genus", method="Linnean")
plot(graptTree,cex=0.4)
nodelabels(graptTree$node.label,cex=0.5)

#get the taxon tree: parentChild method
graptTree<-makePBDBtaxonTree(graptTaxaPBDB, "genus", method="parentChild")
plot(graptTree,cex=0.4)
nodelabels(graptTree$node.label,cex=0.5)

#get time data from occurrences
graptOccGenus<-taxonSortPBDBocc(graptOccPBDB,rank="genus",onlyFormal=FALSE)
graptTimeGenus<-occData2timeList(occList=graptOccGenus)

#let's time-scale the parentChild tree with paleotree
#use minimum branch length for visualization
#and nonstoch.bin so we plot maximal ranges
timeTree<-bin_timePaleoPhy(graptTree,timeList=graptTimeGenus,
	nonstoch.bin=TRUE,type="mbl",vartime=3)

#drops a lot of taxa; some of this is due to misspellings, etc


## Not run: 

#make pretty plot with library strap
library(strap)
geoscalePhylo(timeTree, ages=timeTree$ranges.used)
nodelabels(timeTree$node.label,cex=0.5)


## End(Not run)
