makePBDBtaxonTree: Creating a Taxon-Tree from Taxonomic Data Downloaded from the...

makePBDBtaxonTree    R Documentation

Creating a Taxon-Tree from Taxonomic Data Downloaded from the Paleobiology Database

Description

The function makePBDBtaxonTree creates a phylogeny-like object of class phylo from the taxonomic information recorded in a taxonomy download from the PBDB for a given group. Two different algorithms are provided: the default is based on parent-child taxon relationships, and the other is based on the nested Linnean hierarchy. The function plotTaxaTreePBDB is also provided as a minor helper function for plotting the labeled topologies that are output by makePBDBtaxonTree.

Usage

makePBDBtaxonTree(
  taxaDataPBDB,
  rankTaxon,
  method = "parentChild",
  tipSet = NULL,
  cleanTree = TRUE,
  annotatedDuplicateNames = TRUE,
  APIversion = "1.2",
  failIfNoInternet = TRUE
)

plotTaxaTreePBDB(taxaTree, edgeLength = 1)

Arguments

taxaDataPBDB

A table of taxonomic data collected from the Paleobiology Database, using the taxa list option with show = class. Should work with versions 1.1-1.2 of the API, with either the pbdb or com vocab. However, as accepted_name is not available in API v1.1, the resulting tree will have a taxon's *original* name and not any formally updated name.

rankTaxon

The selected taxon rank; must be one of 'species', 'genus', 'family', 'order', 'class' or 'phylum'.

method

Controls which algorithm is used for calculating the taxon-tree. The default option is method = "parentChild", which converts the binary parent-child taxon relationships listed in the Paleobiology Database into a taxon-tree; any of these parent-child relationships that are missing from the input dataset are autofilled using API calls to the Paleobiology Database. Alternatively, users may use method = "Linnean", which converts the table of Linnean taxonomic assignments (family, order, etc., as provided by show = class in PBDB API calls) into a taxon-tree.

Two methods formerly both implemented under method = "parentChild" are also available as method = "parentChildOldMergeRoot" and method = "parentChildOldQueryPBDB". Both use algorithms similar to the current method = "parentChild" but differ in how they treat taxa whose parents are missing from the input taxonomic dataset.

method = "parentChildOldQueryPBDB" behaves most similarly to method = "parentChild" in that it queries the Paleobiology Database via the API, but it does so repeatedly for information on the parent taxa of the 'floating' parents, continuing within a while loop until only one such unassigned parent taxon remains. This option may take a long time or never finish, depending on the linearity and taxonomic structures encountered in the PBDB taxonomic data; for example, if a taxon were mistakenly recorded as its own indirect child in some grand loop, then makePBDBtaxonTree might never finish under this option. In cases where the taxonomy is poor due to erroneous taxonomic assignments reported by the PBDB, this routine may search all the way back to a very ancient and deep taxon, such as Eukaryota.

method = "parentChildOldMergeRoot" will combine these disparate potential roots and link them to an artificially constructed pseudo-root, which at least allows for visualization of the taxonomic structure in a limited dataset. This option is fully offline, as it does not make any additional API calls to the Paleobiology Database, unlike other options.
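
The following base R sketch illustrates two of the ideas described for the older parent-child methods: why a looped parent-child record can keep an upward search from terminating, and what linking several 'floating' roots to a pseudo-root might look like. It is not the code used inside makePBDBtaxonTree; the tables, the helper traceToRoot, and the label "pseudoRoot" are hypothetical and exist only for illustration.

```r
# Hypothetical parent-child table (not real PBDB data): each row is one
# parent -> child link, and "C" is mistakenly listed as the parent of "A",
# so the three taxa form a loop.
loopedTable <- data.frame(
    parent = c("A", "B", "C"),
    child  = c("B", "C", "A"),
    stringsAsFactors = FALSE
    )

# Illustrative helper (not a paleotree function): walk upward from a taxon
# until a taxon with no listed parent is reached, stopping early if any
# taxon is visited twice. Without the 'visited' check, the while loop
# below would never end on loopedTable.
traceToRoot <- function(taxon, parentChild){
    visited <- character(0)
    while(taxon %in% parentChild$child){
        if(taxon %in% visited){
            return(list(root = NA_character_, loop = TRUE))
            }
        visited <- c(visited, taxon)
        taxon <- parentChild$parent[match(taxon, parentChild$child)]
        }
    list(root = taxon, loop = FALSE)
    }

loopResult <- traceToRoot("A", loopedTable)

# A second hypothetical table with two 'floating' roots ("X" and "Y"):
# neither appears as a child, so neither has a known parent.
floatingTable <- data.frame(
    parent = c("X", "X", "Y"),
    child  = c("a", "b", "c"),
    stringsAsFactors = FALSE
    )
okResult <- traceToRoot("c", floatingTable)

# Find parents that are never listed as children, then link each of them
# to a single artificial root so that the table describes one tree.
floatingRoots <- setdiff(unique(floatingTable$parent), floatingTable$child)
mergedTable <- rbind(
    data.frame(parent = "pseudoRoot", child = floatingRoots,
        stringsAsFactors = FALSE),
    floatingTable
    )
```

Merging roots needs no further database queries, which is consistent with the offline behavior described for method = "parentChildOldMergeRoot".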

tipSet

This argument only affects analyses where method = "parentChild" is used. It controls which taxa are selected as tip taxa for the output tree. tipSet = "nonParents" selects all child taxa which are not listed as parents in parentChild. Alternatively, tipSet = "all" will add a tip to every internal node, with the parent-taxon name enclosed in parentheses. The default is NULL: if tipSet = NULL and method = "parentChild", then tipSet is set to "nonParents".
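
As a rough illustration of the two tipSet settings, the base R sketch below selects tips from a small parent-child table. The table and taxon names are invented, and the exact tip labels produced by paleotree may differ from the parenthesized labels built here.

```r
# Hypothetical parent-child table (invented names, not PBDB data)
parentChild <- data.frame(
    parent = c("Order", "Order", "FamilyA", "FamilyA", "FamilyB"),
    child  = c("FamilyA", "FamilyB", "Genus1", "Genus2", "Genus3"),
    stringsAsFactors = FALSE
    )

# "nonParents": child taxa that never appear in the parent column
nonParentTips <- setdiff(parentChild$child, parentChild$parent)

# "all": additionally give every internal (parent) taxon its own tip,
# labeled here with the parent name wrapped in parentheses
internalTaxa <- unique(parentChild$parent)
allTips <- c(nonParentTips, paste0("(", internalTaxa, ")"))
```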

cleanTree

When TRUE (the default), the tree is run through a series of post-processing steps, including collapsing singleton nodes, reordering nodes, and writing the tree out as a Newick string and reading it back in, to ensure compatibility with ape functions and ape-derived functions. If FALSE, none of this post-processing is done and users should beware, as such trees can lead to hard crashes of R.

annotatedDuplicateNames

A logical determining whether duplicate taxon names found in the Paleobiology Database (presumably reflecting taxa that are obsolete but have incomplete seniority data) should be annotated with sequential numbers to make them unique, via base R's make.unique function. This only applies to method = "parentChild", with the default being annotatedDuplicateNames = TRUE. If more than 26 duplicates are found, an error is issued. If this argument is FALSE, an error is issued if duplicate taxon names are found.
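
For readers unfamiliar with make.unique, the short sketch below shows what it does to a character vector containing a repeated name. The vector is invented for illustration, and the annotation that makePBDBtaxonTree finally applies to tip labels may not match this raw output exactly.

```r
# Illustrative vector of taxon names with one repeated entry
taxonNames <- c("Hylaeosaurus", "Iguanodon", "Hylaeosaurus")

# Check whether any name is repeated
hasDuplicates <- anyDuplicated(taxonNames) > 0

# make.unique appends ".1", ".2", ... to the second and later copies
uniqueNames <- make.unique(taxonNames)
# uniqueNames is c("Hylaeosaurus", "Iguanodon", "Hylaeosaurus.1")
```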

APIversion

Version of the Paleobiology Database API used by makePBDBtaxonTree when method = "parentChild" or method = "parentChildOldQueryPBDB" is used. The current default is APIversion = "1.2", the most recent API version as of 12/11/2018.

failIfNoInternet

Controls what happens if the Paleobiology Database or another needed internet resource cannot be accessed, for example because there is no internet connection. If TRUE (the default), the function fails with an error. If FALSE, the function returns NULL with an informative message instead, thus meeting the CRAN policy that such functionality must 'fail gracefully'. All examples that might be auto-run use FALSE so that they do not fail during R CMD check.
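
The general 'fail gracefully' pattern can be sketched in base R with tryCatch, as below. This is only an assumption about how such a switch might be written, not the implementation used by paleotree; fetchOrNull and the stand-in function offlineFetch are hypothetical names.

```r
# Hypothetical wrapper: run a function that needs an internet resource and
# either re-raise its error or return NULL with a message
fetchOrNull <- function(fetchFun, failIfNoInternet = TRUE){
    tryCatch(
        fetchFun(),
        error = function(e){
            if(failIfNoInternet){
                # behave like an ordinary failure
                stop(e)
                }
            message("Resource unavailable, returning NULL: ",
                conditionMessage(e))
            NULL
            }
        )
    }

# Stand-in for a download that fails because there is no connection
offlineFetch <- function(){
    stop("cannot open the connection")
    }

gracefulResult <- fetchOrNull(offlineFetch, failIfNoInternet = FALSE)
strictResult <- try(
    fetchOrNull(offlineFetch, failIfNoInternet = TRUE),
    silent = TRUE
    )
```

Callers that set failIfNoInternet = FALSE should then test the result with is.null before using it, as the examples on this page do.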

taxaTree

A phylogeny of class phylo, presumably a taxon tree as output from makePBDBtaxonTree with higher-taxon names as node labels.

edgeLength

The edge length that the plotted tree should be plotted with (plotTaxaTreePBDB plots phylogenies as non-ultrametric, not as a cladogram with aligned tips).

Details

This function should not be taken too seriously. Many groups in the Paleobiology Database have out-of-date or very incomplete taxonomic information. This function is meant to help visualize what information is present and, by use of time-scaling functions, allow us to visualize the intersection of temporal and phylogenetic information, mainly to look for incongruence due to incorrect taxonomic placements, erroneous occurrence data, or both.

Note however that, contrary to common opinion among some paleontologists, taxon-trees may be just as useful for macroevolutionary studies as reconstructed phylogenies (Soul and Friedman, 2015).

Value

A phylogeny of class phylo, where each tip is a taxon of the given rankTaxon. Additional details regarding branch lengths can be found in the documentation for the sub-algorithms used by this function to create the taxon-tree: parentChild2taxonTree and taxonTable2taxonTree.

Depending on the method used, either the element $parentChild or $taxonTable is added to the list structure of the output phylogeny object, which was used as input for one of the two algorithms mentioned above.

Please note that when applied to output from the taxa option of API version 1.1, the taxon names returned are the original taxon names, as 'accepted_name' is not available in API v1.1, while under API v1.2 the returned taxon names should be the most up-to-date formal names for those taxa. Similar issues also affect the identification of parent taxa, as the accepted name of the parent ID number is only provided in version 1.2 of the API.

Author(s)

David W. Bapst

References

Peters, S. E., and M. McClennen. 2015. The Paleobiology Database application programming interface. Paleobiology 42(1):1-7.

Soul, L. C., and M. Friedman. 2015. Taxonomy and Phylogeny Can Yield Comparable Results in Comparative Palaeontological Analyses. Systematic Biology (doi: 10.1093/sysbio/syv015)

See Also

Two other functions in paleotree are used as sub-algorithms by makePBDBtaxonTree to create the taxon-tree within this function, and users should consult their manual pages for additional details:

parentChild2taxonTree and taxonTable2taxonTree

Other functions for manipulating PBDB data can be found at taxonSortPBDBocc and occData2timeList, and example data can be found at graptPBDB.

Examples

# Note that most examples here use argument 
    # failIfNoInternet = FALSE so that functions do
    # not error out but simply return NULL if internet
    # connection is not available, and thus
    # fail gracefully rather than error out (required by CRAN).
# Remove this argument or set to TRUE so functions DO fail
    # when internet resources (paleobiodb) are not available.

set.seed(1)



#get some example occurrence and taxonomic data
data(graptPBDB)

#get the taxon tree: Linnean method
graptTreeLinnean <- makePBDBtaxonTree(
    taxaDataPBDB = graptTaxaPBDB,
    rankTaxon = "genus",
    method = "Linnean", 
    failIfNoInternet = FALSE)

#get the taxon tree: parentChild method
graptTreeParentChild <- makePBDBtaxonTree(
    taxaDataPBDB = graptTaxaPBDB,
    rankTaxon = "genus",
    method = "parentChild", 
    failIfNoInternet = FALSE)
    
if(!is.null(graptTreeParentChild) && 
        !is.null(graptTreeLinnean)){
    # if those functions worked...
    # let's plot these and compare them! 
    plotTaxaTreePBDB(graptTreeParentChild)
    plotTaxaTreePBDB(graptTreeLinnean)
    }


# pause 3 seconds so we don't spam the API
Sys.sleep(3)

####################################################
# let's try some other groups

###################################
#conodonts

conoData <- getCladeTaxaPBDB("Conodonta", 
    failIfNoInternet = FALSE)

if(!is.null(conoData)){ 
 
conoTree <- makePBDBtaxonTree(
    taxaDataPBDB = conoData,
    rankTaxon = "genus",
    method = "parentChild")

# if it worked, plot it!
plotTaxaTreePBDB(conoTree)

}

# pause 3 seconds so we don't spam the API
Sys.sleep(3)

#############################
#asaphid trilobites

asaData <- getCladeTaxaPBDB("Asaphida", 
    failIfNoInternet = FALSE)
    
if(!is.null(asaData)){

asaTree <- makePBDBtaxonTree(
    taxaDataPBDB = asaData,
    rankTaxon = "genus",
    method = "parentChild")

# if it worked, plot it!
plotTaxaTreePBDB(asaTree)

}

# pause 3 seconds so we don't spam the API
Sys.sleep(3)

###############################
#Ornithischia

ornithData <- getCladeTaxaPBDB("Ornithischia", 
    failIfNoInternet = FALSE)

if(!is.null(ornithData)){

ornithTree <- makePBDBtaxonTree(
    taxaDataPBDB = ornithData,
    rankTaxon = "genus",
    method = "parentChild")

# if it worked, plot it!
plotTaxaTreePBDB(ornithTree)

# pause 3 seconds so we don't spam the API
Sys.sleep(3)

#try Linnean!

#but first... need to drop repeated taxon first: Hylaeosaurus
    # actually this taxon seems to have been repaired 
    # as of September 2019 !
# findHylaeo <- ornithData$taxon_name == "Hylaeosaurus"
# there's actually only one accepted ID number
# HylaeoIDnum <- unique(ornithData[findHylaeo,"taxon_no"])
# HylaeoIDnum 
# so, take which one has occurrences listed
# dropThis <- which((ornithData$n_occs < 1) & findHylaeo)
# ornithDataCleaned <- ornithData[-dropThis,]

ornithTree <- makePBDBtaxonTree(
    ornithData,
    rankTaxon = "genus",
    method = "Linnean", 
    failIfNoInternet = FALSE)

# if it worked, plot it!
plotTaxaTreePBDB(ornithTree)

}

# pause 3 seconds so we don't spam the API
Sys.sleep(3)

#########################
# Rhynchonellida

rhynchData <- getCladeTaxaPBDB("Rhynchonellida", 
    failIfNoInternet = FALSE)
    
if(!is.null(rhynchData)){  

rhynchTree <- makePBDBtaxonTree(
    taxaDataPBDB = rhynchData,
    rankTaxon = "genus",
    method = "parentChild")

    # if it worked, plot it!
    plotTaxaTreePBDB(rhynchTree)
    }

#some of these look pretty messy!




paleotree documentation built on Aug. 22, 2022, 9:09 a.m.