In dwbapst/paleotree: Paleontological and Phylogenetic Analyses of Evolution

getDataPBDB

R Documentation

Obtaining Data for Taxa or Occurrences From Paleobiology Database API

Description

The Paleobiology Database API is straightforward to use, and most data one wishes to collect can be obtained in R in a variety of ways - the simplest being to wrap a data retrieval request to the API, specified for CSV output, with the R function read.csv. The functions listed here are simple helper functions for tasks common to users of this package: downloading occurrence data or taxonomic information for particular clades, or for a list of specific taxa.

Usage

getCladeTaxaPBDB(
  taxon,
  showTaxa = c("class", "parent", "app", "img", "entname"),
  status = "accepted",
  urlOnly = FALSE,
  stopIfMissing = FALSE,
  failIfNoInternet = TRUE
)

getSpecificTaxaPBDB(
  taxa,
  showTaxa = c("class", "parent", "app", "img", "entname"),
  status = "accepted",
  urlOnly = FALSE,
  stopIfMissing = FALSE,
  failIfNoInternet = TRUE
)

getPBDBocc(
  taxa,
  showOccs = c("class", "classext", "subgenus", "ident", "entname"),
  failIfNoInternet = TRUE
)

Arguments

`taxon`	The name of a single higher taxon whose taxonomic 'children' (included members, i.e. subtaxa) you wish to retrieve from the Paleobiology Database.
`showTaxa`	Which variables for taxonomic data should be requested from the Paleobiology Database? The default is to include classification (`"class"`), parent-child taxon information (`"parent"`), information on each taxon's first and last appearance (`"app"`), information on the PhyloPic silhouette images assigned to that taxon (`"img"`), and the names of those who entered and authorized the taxonomic data you have downloaded (`"entname"`). Multiple variable blocks can be given as a single character string, with desired variable selections separated by a comma with no whitespace (ex. `"class,img,app"`), or as a vector of character strings (ex. `c("class", "img", "app")`), which will then be formatted for use in the API call. Other variable blocks, such as information on ecospace or taphonomy, can also be requested; please refer to the full list in the documentation for the API.
`status`	What taxonomic status should the pulled taxa have? The default is `status = "accepted"`, which returns only those taxa that are both valid taxa and the accepted senior homonym. Other typical statuses to consider are `"valid"`, which returns all valid taxa (senior homonyms and valid subjective synonyms), and `"all"`, which returns all valid taxa and all otherwise repressed invalid taxa. For additional statuses that you can request, please see the documentation for the API.
`urlOnly`	If `FALSE` (the default), the API is called and a data table pulled from the Paleobiology Database is returned. If `urlOnly = TRUE`, the URL of the API call is returned instead as a character string.
`stopIfMissing`	If some taxa within the requested set appear to be missing from the Paleobiology Database's taxonomy table, should the function halt with an error?
`failIfNoInternet`	If the Paleobiology Database or another needed internet resource cannot be accessed, perhaps because there is no internet connection, should the function fail with an error (`TRUE`), or should it give an informative message and return `NULL` (`FALSE`), thus meeting the CRAN policy that such functionality must 'fail gracefully'? The default is `TRUE`, but all examples that might be auto-run use `FALSE` so they do not fail during R CMD check.
`taxa`	A character vector listing the taxa of interest for which the user wishes to download information from the Paleobiology Database. Multiple taxa can be listed as a single character string, with desired taxa separated by a comma with no whitespace (ex. `"Homo,Pongo,Gorilla"`), or as a vector of character strings (ex. `c("Homo", "Pongo", "Gorilla")`), which will then be formatted for use in the API call.
`showOccs`	Which variables for occurrence data should be requested from the Paleobiology Database? The default is to include classification (`"class"`), classification identifiers (`"classext"`), genus and subgenus identifiers (`"subgenus"`), species-level identifiers (`"ident"`), and the names of those who entered and authorized the data (`"entname"`). Multiple variable blocks can be given as a single character string, with desired variable selections separated by a comma with no whitespace (ex. `"class,subgenus,ident"`), or as a vector of character strings (ex. `c("class", "subgenus", "ident")`), which will then be formatted for use in the API call. For the full list of other options that you might want to include, please refer to the documentation for the API.
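
The arguments above accept either a comma-separated string or a character vector. The following is a minimal sketch of how such input could be normalized into one comma-separated string with no whitespace. The helper name `collapseForAPI` is hypothetical and is not part of paleotree; the package's internal formatting may differ.

```r
# Hypothetical helper (not part of paleotree): accept either a character
# vector or a comma-separated string and return a single comma-separated
# string with no whitespace.
collapseForAPI <- function(x) {
    # split any elements that already contain commas
    pieces <- unlist(strsplit(x, ",", fixed = TRUE))
    # remove stray whitespace around each piece
    pieces <- trimws(pieces)
    # drop empty entries, e.g. from a trailing comma
    pieces <- pieces[nzchar(pieces)]
    paste(pieces, collapse = ",")
}

collapseForAPI(c("class", "img", "app"))
collapseForAPI("Homo, Pongo,Gorilla")
```

Both calls yield a string that could be placed directly in a query parameter.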

Details

In many cases, it may be easier to write your own query; these functions exist only to simplify data retrieval for a few specific applications in paleotree.
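
A hand-written query might look like the sketch below, which builds a CSV request and reads it with read.csv. The base address `https://paleobiodb.org/data1.2/` and the parameter names `base_name`, `show` and `taxon_status` are assumptions that should be checked against the API documentation. The helpers `buildTaxaQuery` and `readPBDB` are hypothetical and are not part of paleotree.

```r
# Hypothetical helper (not part of paleotree): build a taxa query URL.
# The base address and parameter names are assumptions; verify them
# against the Paleobiology Database API documentation before use.
buildTaxaQuery <- function(taxon,
                           show = c("class", "parent", "app"),
                           status = "accepted") {
    paste0(
        "https://paleobiodb.org/data1.2/taxa/list.csv",
        "?base_name=", utils::URLencode(taxon, reserved = TRUE),
        "&show=", paste(show, collapse = ","),
        "&taxon_status=", status
    )
}

# Hypothetical helper: download a CSV query, returning NULL with a
# message on failure instead of stopping with an error.
readPBDB <- function(url) {
    tryCatch(
        utils::read.csv(url, stringsAsFactors = FALSE),
        error = function(e) {
            message("Could not retrieve data: ", conditionMessage(e))
            NULL
        }
    )
}

queryURL <- buildTaxaQuery("Graptolithina")
print(queryURL)
# graptData <- readPBDB(queryURL)   # requires an internet connection
```

Building the URL separately from the download makes it possible to inspect or reuse the request before any data is fetched.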

Value

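These functions return a data.frame containing the variables pulled for the requested taxon selection. For functions with argument urlOnly, setting urlOnly = TRUE returns the URL of the API call as a character string instead. If failIfNoInternet = FALSE and the needed internet resource cannot be accessed, NULL is returned.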
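
Depending on arguments urlOnly and failIfNoInternet, a call can therefore yield a data.frame, a character string, or NULL. The sketch below shows one way calling code might tell these cases apart. The helper `describeResult` is hypothetical and is not part of paleotree.

```r
# Hypothetical helper (not part of paleotree): summarize what a call
# to one of these functions returned.
describeResult <- function(res) {
    if (is.null(res)) {
        # failIfNoInternet = FALSE and the resource was unreachable
        "no data (resource unavailable)"
    } else if (is.data.frame(res)) {
        # the usual case: a table of requested variables
        paste(nrow(res), "rows,", ncol(res), "columns")
    } else if (is.character(res)) {
        # urlOnly = TRUE: the URL of the API call
        paste("API URL:", res)
    } else {
        stop("unexpected return type: ", class(res)[1])
    }
}

# Stand-in values for illustration; no internet access is needed here.
describeResult(NULL)
describeResult(data.frame(taxon_name = c("Homo", "Pongo"),
    taxon_rank = "genus"))
```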

Author(s)

David W. Bapst

References

Peters, S. E., and M. McClennen. 2015. The Paleobiology Database application programming interface. Paleobiology 42(1):1-7.

Examples



# Note that all examples here use argument
    # failIfNoInternet = FALSE so that functions do
    # not error out but simply return NULL if an internet
    # connection is not available, and thus
    # fail gracefully rather than error out (required by CRAN).
# Remove this argument or set it to TRUE so functions fail
    # when internet resources (paleobiodb) are not available.

# graptolites
graptData <- getCladeTaxaPBDB("Graptolithina", 
    failIfNoInternet = FALSE)
dim(graptData)
sum(graptData$taxon_rank == "genus")

# so we can see that our call for Graptolithina returned
    # a large number of taxa, a large portion of which are
    # individual genera
# (554 and 318 respectively, as of 03-18-19)

tetrapodList <- c("Archaeopteryx", "Columba", "Ectopistes",
    "Corvus", "Velociraptor", "Baryonyx", "Bufo",
    "Rhamphorhynchus", "Quetzalcoatlus", "Natator",
    "Tyrannosaurus", "Triceratops", "Gavialis",
    "Brachiosaurus", "Pteranodon", "Crocodylus",
    "Alligator", "Giraffa", "Felis", "Ambystoma",
    "Homo", "Dimetrodon", "Coleonyx", "Equus",
    "Sphenodon", "Amblyrhynchus")

tetrapodData <- getSpecificTaxaPBDB(tetrapodList, 
    failIfNoInternet = FALSE)
dim(tetrapodData)
sum(tetrapodData$taxon_rank == "genus")
# should be 26, with all 26 as genera

#############################################
# Now let's try getting occurrence data

# getting occurrence data for a genus, sorting it
# Dicellograptus
dicelloData <- getPBDBocc("Dicellograptus", 
    failIfNoInternet = FALSE)

if(!is.null(dicelloData)){

    dicelloOcc2 <- taxonSortPBDBocc(dicelloData, 
        rank = "species", onlyFormal = FALSE, 
        failIfNoInternet = FALSE)
    names(dicelloOcc2)

}

dwbapst/paleotree documentation built on July 9, 2024, 9:18 a.m.