rcol
is an R client for the new Catalogue of Life (COL).
The API is completely different from the old one, so don't expect
the same interface. The taxize
package used to have some COL
functionality, but it was
deprecated because the old COL API was too unreliable. COL
has many different routes, many of which aren't relevant
here (e.g., creating/editing data in COL, user profile actions,
etc.).
Package documentation: https://docs.ropensci.org/rcol/
Check out the COL API docs at https://api.catalogue.life/, where you can get more information on each API route and try the routes.
The following are a few examples.
remotes::install_github("ropensci/rcol") # OR install.packages("rcol", repos="https://dev.ropensci.org")
library("rcol")
Two functions, cp_nu_suggest()
and cp_nu_search(),
can be used to search
for taxa. The former returns minimal information and is meant to
respond very quickly, while the latter is a bit slower but more comprehensive.
cp_nu_suggest(q="Apis", dataset_key = 3)
#> # A tibble: 10 x 8
#>    match   parentOrAccepted… usageId    rank  status score suggestion    nomCode
#>    <chr>   <chr>             <chr>      <chr> <chr>  <dbl> <chr>         <chr>
#>  1 Apisti… Scorpaeniformes   382e8442-… fami… accep…     0 Apistidae (f… <NA>
#>  2 Apisto… Spionida          9d89b038-… fami… accep…     0 Apistobranch… <NA>
#>  3 Apis    Apini             5eecbda1-… genus accep…     0 Apis (genus … zoolog…
#>  4 Apisa   Arctiidae         da6bee8f-… genus accep…     0 Apisa (genus… <NA>
#>  5 Apispi… Mangeliidae       4a2734aa-… genus accep…     0 Apispiralia … <NA>
#>  6 Apista… Notodontidae      2deb9907-… genus accep…     0 Apistaeschra… <NA>
#>  7 Apisto… Apistobranchidae  9f5ef30a-… genus accep…     0 Apistobranch… <NA>
#>  8 Apisto… Buthidae          e9ffbc3d-… genus accep…     0 Apistobuthus… zoolog…
#>  9 Apisto… Cichlidae         a3d969ca-… genus accep…     0 Apistogramma… <NA>
#> 10 Apisto… Cichlidae         d6ded63a-… genus accep…     0 Apistogrammo… <NA>
cp_nu_search(q="Apis", rank = "genus")
#> $result
#> # A tibble: 10 x 5
#>    id    classification usage$created $createdBy $modified $modifiedBy
#>    <chr> <list>         <chr>              <int> <chr>           <int>
#>  1 1271… <df[,3] [8 × … 2020-07-02T1…         10 2020-07-…          10
#>  2 1647… <df[,3] [10 ×… 2020-07-02T0…         10 2020-07-…          10
#>  3 x6KS  <df[,3] [6 × … 2020-07-02T1…         10 2020-07-…          10
#>  4 x32YJ <df[,3] [6 × … 2020-07-03T0…         10 2020-07-…          10
#>  5 4e10… <df[,3] [8 × … 2020-03-18T2…        103 2020-03-…         103
#>  6 x3KBG <df[,3] [7 × … 2020-07-02T0…         10 2020-07-…          10
#>  7 1024… <df[,3] [9 × … 2020-07-01T2…         10 2020-07-…          10
#>  8 1005… <df[,3] [6 × … 2020-07-03T0…         10 2020-07-…          10
#>  9 553e… <df[,3] [10 ×… 2020-08-20T1…        103 2020-08-…         103
#> 10 4WXV8 <df[,3] [10 ×… 2020-10-28T1…        102 2020-10-…         102
#> # … with 38 more variables: $datasetKey <int>, $id <chr>, $verbatimKey <int>,
#> #   $name$created <chr>, $$createdBy <int>, $$modified <chr>,
#> #   $$modifiedBy <int>, $$datasetKey <int>, $$id <chr>, $$verbatimKey <int>,
#> #   $$homotypicNameId <chr>, $$scientificName <chr>, $$authorship <chr>,
#> #   $$rank <chr>, $$uninomial <chr>, $$combinationAuthorship$authors <list>,
#> #   $$$year <chr>, $$code <chr>, $$publishedInId <chr>, $$origin <chr>,
#> #   $$type <chr>, $$link <chr>, $$parsed <lgl>, $$sectorKey <int>,
#> #   $$nomStatus <chr>, $status <chr>, $origin <chr>, $parentId <chr>,
#> #   $link <chr>, $referenceIds <list>, $label <chr>, $labelHtml <chr>,
#> #   $sectorKey <int>, $scrutinizerDate <chr>, $extinct <lgl>,
#> #   $scrutinizer <chr>, issues <list>, sectorDatasetKey <int>
#>
#> $meta
#> # A tibble: 1 x 5
#>   offset limit total last  empty
#>    <int> <int> <int> <lgl> <lgl>
#> 1      0    10  4343 FALSE FALSE
cp_parser()
parses scientific names into their components. See also the
package rgnparser
(docs: https://ropensci.github.io/rgnparser/) for much
more powerful scientific name parsing.
cp_parser(names = c("Apis mellifera", "Homo sapiens var. sapiens"))
#> # A tibble: 2 x 7
#>   scientificName    rank   genus specificEpithet type   parsed infraspecificEpi…
#>   <chr>             <chr>  <chr> <chr>           <chr>  <lgl>  <chr>
#> 1 Apis mellifera    speci… Apis  mellifera       scien… TRUE   <NA>
#> 2 Homo sapiens var… varie… Homo  sapiens         scien… TRUE   sapiens
If you're curious about COL vocabularies, see the function cp_vocab():
cp_vocab("rank")
#>  [1] "domain"                "realm"                 "subrealm"
#>  [4] "superkingdom"          "kingdom"               "subkingdom"
#>  [7] "infrakingdom"          "superphylum"           "phylum"
#> [10] "subphylum"             "infraphylum"           "superclass"
#> [13] "class"                 "subclass"              "infraclass"
#> [16] "parvclass"             "superdivision"         "division"
#> [19] "subdivision"           "infradivision"         "superlegion"
#> [22] "legion"                "sublegion"             "infralegion"
#> [25] "supercohort"           "cohort"                "subcohort"
#> [28] "infracohort"           "gigaorder"             "magnorder"
#> [31] "grandorder"            "mirorder"              "superorder"
#> [34] "order"                 "nanorder"              "hypoorder"
#> [37] "minorder"              "suborder"              "infraorder"
#> [40] "parvorder"             "megafamily"            "grandfamily"
#> [43] "superfamily"           "epifamily"             "family"
#> [46] "subfamily"             "infrafamily"           "supertribe"
#> [49] "tribe"                 "subtribe"              "infratribe"
#> [52] "suprageneric name"     "genus"                 "subgenus"
#> [55] "infragenus"            "supersection"          "section"
#> [58] "subsection"            "superseries"           "series"
#> [61] "subseries"             "infrageneric name"     "species aggregate"
#> [64] "species"               "infraspecific name"    "grex"
#> [67] "subspecies"            "cultivar group"        "convariety"
#> [70] "infrasubspecific name" "proles"                "natio"
#> [73] "aberration"            "morph"                 "variety"
#> [76] "subvariety"            "form"                  "subform"
#> [79] "pathovar"              "biovar"                "chemovar"
#> [82] "morphovar"             "phagovar"              "serovar"
#> [85] "chemoform"             "forma specialis"       "cultivar"
#> [88] "strain"                "other"                 "unranked"
To search datasets, see the function cp_datasets():
cp_datasets(q = "life")
#> $result
#> # A tibble: 10 x 28
#>    title   key alias group license   size created createdBy modified modifiedBy
#>    <chr> <int> <chr> <chr> <chr>    <int> <chr>       <int> <chr>         <int>
#>  1 Cata…  2242 COL2… Biota other   3.98e6 2020-1…       101 2020-12…        101
#>  2 Cata…     3 col-… Biota other   3.99e6 2019-1…       102 2020-12…        102
#>  3 Cata…  2230 COL2… Biota cc by   3.98e6 2020-1…       101 2020-11…        101
#>  4 Cata…  1000 ColH  <NA>  cc by   2.84e3 2019-1…       101 2020-10…        101
#>  5 Cata…  2123 CoL2… Biota cc by   4.52e6 2020-0…       101 2020-10…        101
#>  6 Cata…  2079 CoL2… Biota cc by   4.01e6 2020-0…       101 2020-10…        103
#>  7 Cata…  2165 CoL2… Biota cc by   3.95e6 2020-0…       103 2020-10…        103
#>  8 Cata…  2206 CoL2… Biota cc by   4.00e6 2020-0…       103 2020-10…        103
#>  9 Cata…  2083 CoL2… Biota cc by   4.22e6 2020-0…       101 2020-10…        101
#> 10 Cata…  2081 CoL2… Biota cc by   3.91e6 2020-0…       101 2020-10…        101
#> # … with 22 more variables: sourceKey <int>, importAttempt <int>, type <chr>,
#> #   origin <chr>, description <chr>, organisations <list>,
#> #   contact$givenName <chr>, $familyName <chr>, $email <chr>, $orcid <chr>,
#> #   $name <chr>, editors <list>, version <chr>, released <chr>, citation <chr>,
#> #   geographicScope <chr>, website <chr>, confidence <int>, completeness <int>,
#> #   imported <chr>, private <lgl>, authors <list>
#>
#> $meta
#> # A tibble: 1 x 5
#>   offset limit total last  empty
#>    <int> <int> <int> <lgl> <lgl>
#> 1      0    10    93 FALSE FALSE
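The $meta element in the output above reports offset, limit, total, and whether the page is the last one. As a rough sketch in base R, the offsets of the pages still to be fetched could be worked out from those values. Note that next_offsets() is a hypothetical helper written for illustration and is not part of rcol; how an offset is then passed to a request function is not covered here, so check the package documentation for the relevant parameter.

```r
# Hypothetical helper (not part of rcol): offsets of the pages still to fetch,
# given the offset, limit, and total reported in a $meta element.
next_offsets <- function(offset, limit, total) {
  first <- offset + limit
  # Nothing left to fetch if the current page already reaches the end
  if (limit <= 0 || first >= total) {
    return(integer(0))
  }
  # Records are indexed from 0, so the last valid offset is total - 1
  seq(from = first, to = total - 1, by = limit)
}

# Values taken from the $meta element of cp_datasets(q = "life")
next_offsets(offset = 0, limit = 10, total = 93)
```

With 93 matching datasets and 10 per page, this yields nine further offsets (10, 20, up to 90).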
There are a lot of datasets API routes. Instead of making
an R function for each route, we provide R functions for some of the
"more important" routes; cp_ds()
then lets you make
requests to the remainder of the datasets API routes. The route
parameter accepts the route as in the examples below, without the
preceding /dataset
part of the route. Then pass in named parameters
to the function to fill in the templated route.
cp_ds(route = "{key}/tree", key = "1000")
#> $offset
#> [1] 0
#>
#> $limit
#> [1] 100
#>
#> $total
#> [1] 2
#>
#> $result
#>   datasetKey  id         rank   status childCount  labelHtml       name
#> 1       1000 343 superkingdom accepted          5  Eukaryota  Eukaryota
#> 2       1000   1 superkingdom accepted          2 Prokaryota Prokaryota
#>
#> $last
#> [1] TRUE
#>
#> $empty
#> [1] FALSE
cp_ds(route = "{key}/name/{id}", key = 1005, id = 100003)
#> $created
#> [1] "2020-07-02T23:51:27.62508"
#>
#> $createdBy
#> [1] 10
#>
#> $modified
#> [1] "2020-09-30T02:54:17.968034"
#>
#> $modifiedBy
#> [1] 10
#>
#> $datasetKey
#> [1] 1005
#>
#> $id
#> [1] "100003"
#>
#> $verbatimKey
#> [1] 1810
#>
#> $homotypicNameId
#> [1] "100003"
#>
#> $scientificName
#> [1] "Cylindrotoma aurantia"
#>
#> $authorship
#> [1] "Alexander, 1935"
#>
#> $rank
#> [1] "species"
#>
#> $genus
#> [1] "Cylindrotoma"
#>
#> $specificEpithet
#> [1] "aurantia"
#>
#> $combinationAuthorship
#> $combinationAuthorship$authors
#> [1] "Alexander"
#>
#> $combinationAuthorship$year
#> [1] "1935"
#>
#>
#> $code
#> [1] "zoological"
#>
#> $origin
#> [1] "source"
#>
#> $type
#> [1] "scientific"
#>
#> $parsed
#> [1] TRUE
Alternatively, pass a named list to the .list
parameter.
args <- list(key = 1005, id = 100003)
cp_ds("{key}/name/{id}", .list = args)
#> $created
#> [1] "2020-07-02T23:51:27.62508"
#>
#> $createdBy
#> [1] 10
#>
#> $modified
#> [1] "2020-09-30T02:54:17.968034"
#>
#> $modifiedBy
#> [1] 10
#>
#> $datasetKey
#> [1] 1005
#>
#> $id
#> [1] "100003"
#>
#> $verbatimKey
#> [1] 1810
#>
#> $homotypicNameId
#> [1] "100003"
#>
#> $scientificName
#> [1] "Cylindrotoma aurantia"
#>
#> $authorship
#> [1] "Alexander, 1935"
#>
#> $rank
#> [1] "species"
#>
#> $genus
#> [1] "Cylindrotoma"
#>
#> $specificEpithet
#> [1] "aurantia"
#>
#> $combinationAuthorship
#> $combinationAuthorship$authors
#> [1] "Alexander"
#>
#> $combinationAuthorship$year
#> [1] "1935"
#>
#>
#> $code
#> [1] "zoological"
#>
#> $origin
#> [1] "source"
#>
#> $type
#> [1] "scientific"
#>
#> $parsed
#> [1] TRUE
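To make the idea of a templated route concrete, here is a minimal base R sketch of how the {key} and {id} placeholders could be filled from named arguments or from a named list. The fill_route() function is a hypothetical helper written for illustration only; it is not part of rcol and is not a description of how cp_ds() is implemented internally.

```r
# Hypothetical helper (not part of rcol): fill "{name}" placeholders in a
# route template from named arguments and/or a named list, then prepend
# the /dataset part that the route argument leaves out.
fill_route <- function(route, ..., .list = NULL) {
  args <- c(list(...), .list)
  for (nm in names(args)) {
    # format() with scientific = FALSE avoids values like "1e+05" in the path
    value <- format(args[[nm]], scientific = FALSE, trim = TRUE)
    route <- gsub(paste0("{", nm, "}"), value, route, fixed = TRUE)
  }
  # Fail early if any placeholder was left unfilled
  if (grepl("\\{[^}]*\\}", route)) {
    stop("unfilled placeholder in route: ", route)
  }
  paste0("/dataset/", route)
}

fill_route("{key}/tree", key = "1000")
# "/dataset/1000/tree"
fill_route("{key}/name/{id}", .list = list(key = 1005, id = 100003))
# "/dataset/1005/name/100003"
```

Passing values through a named list is convenient when the parameters are built programmatically, for example when looping over several ids.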
cp_classification(dataset_key=1000, taxon_id=10)
#> # A tibble: 4 x 16
#>   scientificName rank  id    status created createdBy modified modifiedBy
#>   <chr>          <chr> <chr> <chr>  <chr>       <int> <chr>         <int>
#> 1 Thermoprotei   class 6     accep… 2020-0…        10 2020-07…         10
#> 2 Crenarchaeota  phyl… 3     accep… 2020-0…        10 2020-07…         10
#> 3 Archaea        king… 2     accep… 2020-0…        10 2020-07…         10
#> 4 Prokaryota     supe… 1     accep… 2020-0…        10 2020-07…         10
#> # … with 8 more variables: datasetKey <int>, verbatimKey <int>,
#> #   homotypicNameId <chr>, uninomial <chr>, origin <chr>, type <chr>,
#> #   parsed <lgl>, parentId <chr>
cp_children(dataset_key=1000, taxon_id='1')
#> $result
#> # A tibble: 2 x 16
#>   scientificName rank  id    status created createdBy modified modifiedBy
#>   <chr>          <chr> <chr> <chr>  <chr>       <int> <chr>         <int>
#> 1 Archaea        king… 2     accep… 2020-0…        10 2020-07…         10
#> 2 Bacteria       king… 41    accep… 2020-0…        10 2020-07…         10
#> # … with 8 more variables: datasetKey <int>, verbatimKey <int>,
#> #   homotypicNameId <chr>, uninomial <chr>, origin <chr>, type <chr>,
#> #   parsed <lgl>, parentId <chr>
#>
#> $meta
#> # A tibble: 1 x 5
#>   offset limit total last  empty
#>    <int> <int> <int> <lgl> <lgl>
#> 1      0    10     2 TRUE  FALSE