ropensci/rcol: Catalogue of Life Client

rcol is an R client for the new Catalogue of Life (COL).

The API is completely different from the old one, so don't expect the same interface. The taxize package used to have some COL functionality, but it was deprecated once the old COL API became too unreliable. COL has a lot of different routes, many of which aren't relevant to us here (e.g., creating/editing data in COL, user profile actions).

Package documentation: https://docs.ropensci.org/rcol/

Check out the COL API docs at https://api.catalogue.life/, where you can get more info on each API route and try the routes out.

The following are a few examples.

Installation

remotes::install_github("ropensci/rcol")
# OR
install.packages("rcol", repos="https://dev.ropensci.org")

library("rcol")

Search

Two functions, cp_nu_suggest() and cp_nu_search(), can be used to search for taxa. The former returns minimal information and is meant to respond very quickly, while the latter is a bit slower but more comprehensive.

cp_nu_suggest(q="Apis", dataset_key = 3)
#> # A tibble: 10 x 8
#>    match   parentOrAccepted… usageId    rank  status score suggestion    nomCode
#>    <chr>   <chr>             <chr>      <chr> <chr>  <dbl> <chr>         <chr>  
#>  1 Apisti… Scorpaeniformes   382e8442-… fami… accep…     0 Apistidae (f… <NA>   
#>  2 Apisto… Spionida          9d89b038-… fami… accep…     0 Apistobranch… <NA>   
#>  3 Apis    Apini             5eecbda1-… genus accep…     0 Apis (genus … zoolog…
#>  4 Apisa   Arctiidae         da6bee8f-… genus accep…     0 Apisa (genus… <NA>   
#>  5 Apispi… Mangeliidae       4a2734aa-… genus accep…     0 Apispiralia … <NA>   
#>  6 Apista… Notodontidae      2deb9907-… genus accep…     0 Apistaeschra… <NA>   
#>  7 Apisto… Apistobranchidae  9f5ef30a-… genus accep…     0 Apistobranch… <NA>   
#>  8 Apisto… Buthidae          e9ffbc3d-… genus accep…     0 Apistobuthus… zoolog…
#>  9 Apisto… Cichlidae         a3d969ca-… genus accep…     0 Apistogramma… <NA>   
#> 10 Apisto… Cichlidae         d6ded63a-… genus accep…     0 Apistogrammo… <NA>
cp_nu_search(q="Apis", rank = "genus")
#> $result
#> # A tibble: 10 x 5
#>    id    classification usage$created $createdBy $modified $modifiedBy
#>    <chr> <list>         <chr>              <int> <chr>           <int>
#>  1 1271… <df[,3] [8 × … 2020-07-02T1…         10 2020-07-…          10
#>  2 1647… <df[,3] [10 ×… 2020-07-02T0…         10 2020-07-…          10
#>  3 x6KS  <df[,3] [6 × … 2020-07-02T1…         10 2020-07-…          10
#>  4 x32YJ <df[,3] [6 × … 2020-07-03T0…         10 2020-07-…          10
#>  5 4e10… <df[,3] [8 × … 2020-03-18T2…        103 2020-03-…         103
#>  6 x3KBG <df[,3] [7 × … 2020-07-02T0…         10 2020-07-…          10
#>  7 1024… <df[,3] [9 × … 2020-07-01T2…         10 2020-07-…          10
#>  8 1005… <df[,3] [6 × … 2020-07-03T0…         10 2020-07-…          10
#>  9 553e… <df[,3] [10 ×… 2020-08-20T1…        103 2020-08-…         103
#> 10 4WXV8 <df[,3] [10 ×… 2020-10-28T1…        102 2020-10-…         102
#> # … with 38 more variables: $datasetKey <int>, $id <chr>, $verbatimKey <int>,
#> #   $name$created <chr>, $$createdBy <int>, $$modified <chr>,
#> #   $$modifiedBy <int>, $$datasetKey <int>, $$id <chr>, $$verbatimKey <int>,
#> #   $$homotypicNameId <chr>, $$scientificName <chr>, $$authorship <chr>,
#> #   $$rank <chr>, $$uninomial <chr>, $$combinationAuthorship$authors <list>,
#> #   $$$year <chr>, $$code <chr>, $$publishedInId <chr>, $$origin <chr>,
#> #   $$type <chr>, $$link <chr>, $$parsed <lgl>, $$sectorKey <int>,
#> #   $$nomStatus <chr>, $status <chr>, $origin <chr>, $parentId <chr>,
#> #   $link <chr>, $referenceIds <list>, $label <chr>, $labelHtml <chr>,
#> #   $sectorKey <int>, $scrutinizerDate <chr>, $extinct <lgl>,
#> #   $scrutinizer <chr>, issues <list>, sectorDatasetKey <int>
#> 
#> $meta
#> # A tibble: 1 x 5
#>   offset limit total last  empty
#>    <int> <int> <int> <lgl> <lgl>
#> 1      0    10  4343 FALSE FALSE
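
The $meta element reports how many records matched (total) and which slice was returned (offset, limit). As a rough sketch of how those numbers could be used to plan further requests, the helper below computes the offset of every page. page_offsets() is a hypothetical name and not part of rcol, and whether the search functions accept paging arguments is an assumption here; check ?cp_nu_search for the actual argument names.

```r
# Hypothetical helper (not part of rcol): compute the offset of each page
# needed to walk through `total` records, `limit` records at a time.
page_offsets <- function(total, limit) {
  stopifnot(total >= 0, limit > 0)
  if (total == 0) return(integer(0))
  seq.int(from = 0L, to = total - 1L, by = limit)
}

# Values taken from the $meta tibble above: total = 4343, limit = 10
offsets <- page_offsets(total = 4343L, limit = 10L)
length(offsets)   # number of requests needed to see every record
head(offsets, 3)  # offsets of the first three pages
tail(offsets, 1)  # offset of the last page
```

Each offset could then be passed to a search call, one request per page, stopping early once $meta reports last = TRUE.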

Parsing names

cp_parser() parses scientific names into their components. See also the package rgnparser (docs: https://ropensci.github.io/rgnparser/) for much more powerful scientific name parsing.

cp_parser(names = c("Apis mellifera", "Homo sapiens var. sapiens"))
#> # A tibble: 2 x 7
#>   scientificName    rank   genus specificEpithet type   parsed infraspecificEpi…
#>   <chr>             <chr>  <chr> <chr>           <chr>  <lgl>  <chr>            
#> 1 Apis mellifera    speci… Apis  mellifera       scien… TRUE   <NA>             
#> 2 Homo sapiens var… varie… Homo  sapiens         scien… TRUE   sapiens

Vocab

If you're curious about COL vocabularies, see the function cp_vocab().

cp_vocab("rank")
#>  [1] "domain"                "realm"                 "subrealm"             
#>  [4] "superkingdom"          "kingdom"               "subkingdom"           
#>  [7] "infrakingdom"          "superphylum"           "phylum"               
#> [10] "subphylum"             "infraphylum"           "superclass"           
#> [13] "class"                 "subclass"              "infraclass"           
#> [16] "parvclass"             "superdivision"         "division"             
#> [19] "subdivision"           "infradivision"         "superlegion"          
#> [22] "legion"                "sublegion"             "infralegion"          
#> [25] "supercohort"           "cohort"                "subcohort"            
#> [28] "infracohort"           "gigaorder"             "magnorder"            
#> [31] "grandorder"            "mirorder"              "superorder"           
#> [34] "order"                 "nanorder"              "hypoorder"            
#> [37] "minorder"              "suborder"              "infraorder"           
#> [40] "parvorder"             "megafamily"            "grandfamily"          
#> [43] "superfamily"           "epifamily"             "family"               
#> [46] "subfamily"             "infrafamily"           "supertribe"           
#> [49] "tribe"                 "subtribe"              "infratribe"           
#> [52] "suprageneric name"     "genus"                 "subgenus"             
#> [55] "infragenus"            "supersection"          "section"              
#> [58] "subsection"            "superseries"           "series"               
#> [61] "subseries"             "infrageneric name"     "species aggregate"    
#> [64] "species"               "infraspecific name"    "grex"                 
#> [67] "subspecies"            "cultivar group"        "convariety"           
#> [70] "infrasubspecific name" "proles"                "natio"                
#> [73] "aberration"            "morph"                 "variety"              
#> [76] "subvariety"            "form"                  "subform"              
#> [79] "pathovar"              "biovar"                "chemovar"             
#> [82] "morphovar"             "phagovar"              "serovar"              
#> [85] "chemoform"             "forma specialis"       "cultivar"             
#> [88] "strain"                "other"                 "unranked"
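
The vector above appears to run from the broadest rank to the narrowest, so a rank's position in it can serve as a rough ordering. The sketch below assumes that ordering holds and uses a hand-copied subset of the ranks rather than a live call; is_higher_rank() is a hypothetical helper, not an rcol function.

```r
# A subset of the ranks printed above, kept in the same order.
# In practice this vector would come from cp_vocab("rank").
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus", "species")

# Hypothetical helper (not part of rcol): TRUE when rank `a` sits above
# rank `b`, assuming the vocabulary is ordered from broadest to narrowest.
is_higher_rank <- function(a, b, vocab = ranks) {
  pos <- match(c(a, b), vocab)
  if (anyNA(pos)) {
    stop("unknown rank: ", paste(c(a, b)[is.na(pos)], collapse = ", "))
  }
  pos[1] < pos[2]
}

is_higher_rank("family", "genus")   # family is above genus
is_higher_rank("species", "order")  # species is below order
```

Matching against the vocabulary also catches misspelled ranks before they are sent to the API.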

Datasets

To search datasets, see the function cp_datasets().

cp_datasets(q = "life")
#> $result
#> # A tibble: 10 x 28
#>    title   key alias group license   size created createdBy modified modifiedBy
#>    <chr> <int> <chr> <chr> <chr>    <int> <chr>       <int> <chr>         <int>
#>  1 Cata…  2242 COL2… Biota other   3.98e6 2020-1…       101 2020-12…        101
#>  2 Cata…     3 col-… Biota other   3.99e6 2019-1…       102 2020-12…        102
#>  3 Cata…  2230 COL2… Biota cc by   3.98e6 2020-1…       101 2020-11…        101
#>  4 Cata…  1000 ColH  <NA>  cc by   2.84e3 2019-1…       101 2020-10…        101
#>  5 Cata…  2123 CoL2… Biota cc by   4.52e6 2020-0…       101 2020-10…        101
#>  6 Cata…  2079 CoL2… Biota cc by   4.01e6 2020-0…       101 2020-10…        103
#>  7 Cata…  2165 CoL2… Biota cc by   3.95e6 2020-0…       103 2020-10…        103
#>  8 Cata…  2206 CoL2… Biota cc by   4.00e6 2020-0…       103 2020-10…        103
#>  9 Cata…  2083 CoL2… Biota cc by   4.22e6 2020-0…       101 2020-10…        101
#> 10 Cata…  2081 CoL2… Biota cc by   3.91e6 2020-0…       101 2020-10…        101
#> # … with 22 more variables: sourceKey <int>, importAttempt <int>, type <chr>,
#> #   origin <chr>, description <chr>, organisations <list>,
#> #   contact$givenName <chr>, $familyName <chr>, $email <chr>, $orcid <chr>,
#> #   $name <chr>, editors <list>, version <chr>, released <chr>, citation <chr>,
#> #   geographicScope <chr>, website <chr>, confidence <int>, completeness <int>,
#> #   imported <chr>, private <lgl>, authors <list>
#> 
#> $meta
#> # A tibble: 1 x 5
#>   offset limit total last  empty
#>    <int> <int> <int> <lgl> <lgl>
#> 1      0    10    93 FALSE FALSE

There are A LOT of datasets API routes. Instead of making an R function for each route, we have R functions for some of the "more important" routes, and cp_ds() lets you make requests to the remaining datasets API routes. The route parameter accepts the route as in the examples below, without the preceding /dataset part of the route. Pass named parameters to the function to fill in the templated route.

cp_ds(route = "{key}/tree", key = "1000")
#> $offset
#> [1] 0
#> 
#> $limit
#> [1] 100
#> 
#> $total
#> [1] 2
#> 
#> $result
#>   datasetKey  id         rank   status childCount  labelHtml       name
#> 1       1000 343 superkingdom accepted          5  Eukaryota  Eukaryota
#> 2       1000   1 superkingdom accepted          2 Prokaryota Prokaryota
#> 
#> $last
#> [1] TRUE
#> 
#> $empty
#> [1] FALSE
cp_ds(route = "{key}/name/{id}", key = 1005, id = 100003)
#> $created
#> [1] "2020-07-02T23:51:27.62508"
#> 
#> $createdBy
#> [1] 10
#> 
#> $modified
#> [1] "2020-09-30T02:54:17.968034"
#> 
#> $modifiedBy
#> [1] 10
#> 
#> $datasetKey
#> [1] 1005
#> 
#> $id
#> [1] "100003"
#> 
#> $verbatimKey
#> [1] 1810
#> 
#> $homotypicNameId
#> [1] "100003"
#> 
#> $scientificName
#> [1] "Cylindrotoma aurantia"
#> 
#> $authorship
#> [1] "Alexander, 1935"
#> 
#> $rank
#> [1] "species"
#> 
#> $genus
#> [1] "Cylindrotoma"
#> 
#> $specificEpithet
#> [1] "aurantia"
#> 
#> $combinationAuthorship
#> $combinationAuthorship$authors
#> [1] "Alexander"
#> 
#> $combinationAuthorship$year
#> [1] "1935"
#> 
#> 
#> $code
#> [1] "zoological"
#> 
#> $origin
#> [1] "source"
#> 
#> $type
#> [1] "scientific"
#> 
#> $parsed
#> [1] TRUE

Alternatively, pass a named list to the .list parameter.

args <- list(key = 1005, id = 100003)
cp_ds("{key}/name/{id}", .list = args)
#> $created
#> [1] "2020-07-02T23:51:27.62508"
#> 
#> $createdBy
#> [1] 10
#> 
#> $modified
#> [1] "2020-09-30T02:54:17.968034"
#> 
#> $modifiedBy
#> [1] 10
#> 
#> $datasetKey
#> [1] 1005
#> 
#> $id
#> [1] "100003"
#> 
#> $verbatimKey
#> [1] 1810
#> 
#> $homotypicNameId
#> [1] "100003"
#> 
#> $scientificName
#> [1] "Cylindrotoma aurantia"
#> 
#> $authorship
#> [1] "Alexander, 1935"
#> 
#> $rank
#> [1] "species"
#> 
#> $genus
#> [1] "Cylindrotoma"
#> 
#> $specificEpithet
#> [1] "aurantia"
#> 
#> $combinationAuthorship
#> $combinationAuthorship$authors
#> [1] "Alexander"
#> 
#> $combinationAuthorship$year
#> [1] "1935"
#> 
#> 
#> $code
#> [1] "zoological"
#> 
#> $origin
#> [1] "source"
#> 
#> $type
#> [1] "scientific"
#> 
#> $parsed
#> [1] TRUE
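
To make the route templating concrete, here is a minimal sketch of how a templated route could be filled from named arguments or from a list. fill_route() is a hypothetical helper written for illustration only; it is not part of rcol and is not necessarily how cp_ds() builds its requests internally.

```r
# Hypothetical helper (not part of rcol): fill a templated route such as
# "{key}/name/{id}" from named arguments, then prepend the /dataset prefix.
fill_route <- function(route, ..., .list = list()) {
  args <- c(list(...), .list)
  for (nm in names(args)) {
    # Replace the literal placeholder "{nm}" with the supplied value
    route <- gsub(paste0("{", nm, "}"), as.character(args[[nm]]), route,
                  fixed = TRUE)
  }
  # Any placeholder still present means a parameter was not supplied
  if (grepl("\\{[^}]+\\}", route)) {
    stop("unfilled placeholder in route: ", route)
  }
  paste0("/dataset/", route)
}

fill_route("{key}/tree", key = "1000")
fill_route("{key}/name/{id}", .list = list(key = 1005, id = 100003))
```

Failing loudly on an unfilled placeholder is a deliberate choice in this sketch: it surfaces a missing parameter before any request is made.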

Classification

cp_classification(dataset_key=1000, taxon_id=10)
#> # A tibble: 4 x 16
#>   scientificName rank  id    status created createdBy modified modifiedBy
#>   <chr>          <chr> <chr> <chr>  <chr>       <int> <chr>         <int>
#> 1 Thermoprotei   class 6     accep… 2020-0…        10 2020-07…         10
#> 2 Crenarchaeota  phyl… 3     accep… 2020-0…        10 2020-07…         10
#> 3 Archaea        king… 2     accep… 2020-0…        10 2020-07…         10
#> 4 Prokaryota     supe… 1     accep… 2020-0…        10 2020-07…         10
#> # … with 8 more variables: datasetKey <int>, verbatimKey <int>,
#> #   homotypicNameId <chr>, uninomial <chr>, origin <chr>, type <chr>,
#> #   parsed <lgl>, parentId <chr>
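
The rows above run from the closest parent up to the root of the tree. A common follow-up is to collapse them into a single lineage string; the sketch below does that with the scientificName values copied from the output above, standing in for a live cp_classification() result.

```r
# Names copied from the output above; in practice `cls` would be the
# tibble returned by cp_classification().
cls <- data.frame(
  scientificName = c("Thermoprotei", "Crenarchaeota", "Archaea", "Prokaryota"),
  stringsAsFactors = FALSE
)

# Reverse the rows so the lineage reads from the root down to the
# closest parent, then join the names into one string.
lineage <- paste(rev(cls$scientificName), collapse = " > ")
lineage
#> [1] "Prokaryota > Archaea > Crenarchaeota > Thermoprotei"
```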

Children

cp_children(dataset_key=1000, taxon_id='1')
#> $result
#> # A tibble: 2 x 16
#>   scientificName rank  id    status created createdBy modified modifiedBy
#>   <chr>          <chr> <chr> <chr>  <chr>       <int> <chr>         <int>
#> 1 Archaea        king… 2     accep… 2020-0…        10 2020-07…         10
#> 2 Bacteria       king… 41    accep… 2020-0…        10 2020-07…         10
#> # … with 8 more variables: datasetKey <int>, verbatimKey <int>,
#> #   homotypicNameId <chr>, uninomial <chr>, origin <chr>, type <chr>,
#> #   parsed <lgl>, parentId <chr>
#> 
#> $meta
#> # A tibble: 1 x 5
#>   offset limit total last  empty
#>    <int> <int> <int> <lgl> <lgl>
#> 1      0    10     2 TRUE  FALSE