This is a tutorial for using the R package rcatfish. rcatfish provides access to the California Academy of Sciences Eschmeyer's Catalog of Fishes within R (Eschmeyer et al., 1998; Fricke et al., 2025; https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp). The Catalog of Fishes database is the gold standard for fish taxonomy, as it provides thorough citations of the taxonomic history of fishes and is updated continuously via standard monthly releases. While there are other R packages that can be used for checking the taxonomy of organisms, including fishes (e.g. rfishbase, taxize, ritis), the databases accessed by these packages lack the expansive information on the taxonomic history of fishes and are typically not up to date with the cutting edge of fish systematics. rcatfish introduces functions to access the various information in the California Academy of Sciences Eschmeyer's Catalog of Fishes. This tutorial provides a basic introduction to using the package and its functions.
Please note when using this package that it is intended to act solely as an interface to Eschmeyer's Catalog of Fishes and is not affiliated with the California Academy of Sciences. As Eschmeyer's Catalog of Fishes is only published in a non-machine-readable format, rcatfish uses web scraping to parse the catalog data into a format more suitable for analysis. As such, rcatfish will only be as accurate as the data in the published catalog, and spelling or formatting errors may arise due to inconsistencies in Eschmeyer's Catalog of Fishes data entry.
In order to install the stable CRAN version of the rcatfish package:
install.packages("rcatfish")
We recommend using the stable CRAN version of this package. If for any reason you wish to use the development version instead, you can use the package devtools to temporarily install it from GitHub:
#1. Install 'devtools' if you do not already have it installed:
install.packages("devtools")
#2. Load the 'devtools' package and temporarily install the development version of
# 'rcatfish' from GitHub:
library(devtools)
dev_mode(on = TRUE)
install_github("sborstein/rcatfish") # install the package from GitHub
library(rcatfish) # load the package
#3. Leave developer mode after using the development version of 'rcatfish' so it will not
# remain on your system permanently.
dev_mode(on = FALSE)
To make HTTPS connections, the dependencies of rcatfish use curl. On Windows 7 and later, the curl implementation in R can use either OpenSSL or Windows Secure Channel (Schannel). Only one of these options can be active at a time, and the default is Schannel, which conflicts with this package. To see which one you have active, you can do the following:
curl::curl_version()$ssl_version
In the output you may have more than one option. The ones in parentheses are not in use, while the ones lacking parentheses are in use. If you have Schannel in use, you will need to add the following line to your ~/.Renviron file to have curl use openSSL.
CURL_SSL_BACKEND=openssl
This can be added manually or can be done directly in R by running the following line of code.
write('CURL_SSL_BACKEND=openssl', file = "~/.Renviron", append = TRUE)
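Running the line above more than once will append the same setting repeatedly. As a minimal sketch, the helper below only appends a line when it is not already present. The name add_renviron_line is hypothetical and is not part of rcatfish; the demonstration writes to a temporary file so that your real ~/.Renviron is left alone.

```r
# Hypothetical helper (not part of rcatfish): append a setting to an
# .Renviron-style file only if that exact line is not already present.
add_renviron_line <- function(line, file = "~/.Renviron") {
  path <- path.expand(file)
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  if (line %in% trimws(existing)) {
    return(invisible(FALSE))  # nothing written; the line was already there
  }
  write(line, file = path, append = TRUE)
  invisible(TRUE)             # the line was appended
}

# Demonstration on a temporary file
demo_file <- tempfile(fileext = ".Renviron")
add_renviron_line("CURL_SSL_BACKEND=openssl", file = demo_file)
add_renviron_line("CURL_SSL_BACKEND=openssl", file = demo_file)  # no duplicate added
readLines(demo_file)
```

To apply it to your own configuration, call add_renviron_line("CURL_SSL_BACKEND=openssl") with the default file argument.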
After adding this line to your ~/.Renviron, restart R. Then check the curl version again. Schannel should now be in parentheses while your OpenSSL option should now lack parentheses, indicating it is in use.
curl::curl_version()$ssl_version
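If you prefer not to read the parentheses by eye, the rule described above (entries in parentheses are not in use) can be sketched as a small helper. The function name active_ssl_backends and the example strings are made up for illustration; the exact text curl reports on your system may differ.

```r
# Hypothetical helper: return the SSL backends that are NOT wrapped in
# parentheses, i.e. the ones reported as in use.
active_ssl_backends <- function(ssl_version) {
  parts <- strsplit(trimws(ssl_version), "\\s+")[[1]]
  parts[!grepl("^\\(.*\\)$", parts)]
}

# Example strings in the style described above (invented for illustration)
active_ssl_backends("(OpenSSL/3.0.0) Schannel")
active_ssl_backends("OpenSSL/3.0.0 (Schannel)")

# With the curl package installed, the same check on your own system:
if (requireNamespace("curl", quietly = TRUE)) {
  active_ssl_backends(curl::curl_version()$ssl_version)
}
```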
Once installed, you can load rcatfish and all of its functions/data:
library(rcatfish)
Upon loading, you will see a message showing you the version of the Catalog of Fishes as well as how to properly cite the Catalog of Fishes and this R package. In the event you need to get the version of the Catalog of Fishes at any point, you can use the function rcatfish_version() which has no arguments to return the date of the version of the Catalog of Fishes being accessed.
To use the majority of functions in rcatfish you will need the following data as inputs. First, for all functions which search the catalog, a query is required. This can be either a single search term or a vector of terms to be searched in series, the content of which will depend on the particular search function being used (e.g. when searching for references, query may be a catalog reference number).
Aside from a query, several functions require a type as well. This is used to differentiate the search method in functions that support multiple types (e.g. search by type = genus or type = keyword). Similar to query, the content of this parameter will vary based on the function it is being used in, but unlike query it is typically not able to be vectorized and has a specific set of options it must match in each function that calls it.
All other parameters used in this package are optional and unique to each individual function. Several of these will be discussed below, but each function's options can be seen by running ?function_name or help(function_name).
rcatfish_search
To search the Catalog of Fishes' taxonomic records, the function rcatfish_search should be used. This function is equivalent to using the "Search Eschmeyer's Catalog" tab on the catalog's website. While it has several parameters with default arguments, only query and type must be specified.
As an example, a search can be performed for all available species names in the family Rhincodontidae using the following function call:
# Search CoF for Available Species Names in Rhincodontidae
rhinco_species <- rcatfish_search(query = "Rhincodontidae", type = "Species")
View(rhinco_species)
When viewing the created object rhinco_species above, you should see a large dataframe containing species results of that family.
type Parameter in rcatfish_search
The type parameter allows you to specify what type of results you want returned and is the equivalent of changing the radio button on the "Search Eschmeyer's Catalog" tab between "Genera" and "Species" (the "References" option is available using the rcatfish_references function which will be discussed later). It has two acceptable inputs, either "Species" or "Genus". This parameter is not vectorizable, so any searches should either all be by genus or all be by species. There is no default option, so it must always be explicitly assigned a value during function calls. To see an example of the difference between the two options:
# Search Rhincodontidae by Species
by_species <- rcatfish_search(query = "Rhincodontidae", type = "Species")
View(by_species)
# Search Rhincodontidae by Genus
by_genus <- rcatfish_search(query = "Rhincodontidae", type = "Genus")
View(by_genus)
When viewing the outputs from above, you will notice that despite each search containing the same query, the results differ. Please always ensure you have the correct value assigned when searching the catalog.
query, phrase, unavailable, and resolve Parameters in rcatfish_search
The query parameter represents the keyword(s) to search for in the catalog. These can be any text found in the catalog's entries, including species names, genera, family and subfamily names, author names, type specimen information, and more. This argument is vectorizable, meaning multiple different queries can be passed in one function call. By default there will be a 10-second wait between each query, as the Catalog of Fishes requests at least this much time between requests. As an example:
# Search CoF Using Multiple Queries
searchTerms <- c("Rhincodontidae", "Aldrichetta")
result <- rcatfish_search(query = searchTerms, type = "Species")
View(result)
As you can see with the object result created by the function call above, one dataframe is returned containing all of the data from both queries, with the first column showing which query returned each result.
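Because the first column records which query produced each row, a combined result can be separated back into one dataframe per query with base R's split(). The sketch below uses a small mock dataframe so that it runs without an internet connection; the column names Query and Species and the row values are hypothetical placeholders, not the real rcatfish_search output.

```r
# Mock stand-in for an rcatfish_search() result. The column names and values
# are hypothetical; only the idea that the first column holds the query is
# taken from the text above.
mock_result <- data.frame(
  Query   = c("Rhincodontidae", "Rhincodontidae", "Aldrichetta"),
  Species = c("species_a", "species_b", "species_c"),
  stringsAsFactors = FALSE
)

# Split on the first column to get one dataframe per query
by_query <- split(mock_result, mock_result[[1]])

names(by_query)                     # the queries that returned rows
vapply(by_query, nrow, integer(1))  # number of rows per query
```

With a real search, replacing mock_result with the returned dataframe should work the same way, assuming the query is indeed stored in the first column.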
By default, queries of more than one word will search for a separate instance of each word in the Catalog of Fishes' entries. For example, query = "status uncertain" will search for any entries which contain the words "status" and "uncertain" anywhere in their text. If you wish to only search for entries which contain the exact phrase "status uncertain", you can specify this using the phrase parameter. By default this parameter is set to FALSE, but by explicitly setting it to TRUE you can have rcatfish_search treat each independent query as an exact phrase. As an example, see the difference in results for the following two searches:
# Search Catalog of Fishes with phrase = FALSE
no_phrase <- rcatfish_search(query = "Mugil abu", type = "Species")
View(no_phrase)
# Search Catalog of Fishes with phrase = TRUE
yes_phrase <- rcatfish_search(query = "Mugil abu", type = "Species", phrase = TRUE)
View(yes_phrase)
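The difference between the two modes can also be illustrated offline. The sketch below imitates the two behaviours with base R string matching on a few invented entries. It is only an illustration of the semantics described above, not how the catalog or rcatfish performs the search, and the helper names match_all_words and match_phrase are hypothetical.

```r
# Invented catalog-style entries, for illustration only
entries <- c(
  "Mugil abu Heckel 1843",
  "Mugil cephalus Linnaeus 1758; compared with Planiliza abu",
  "Rhincodon typus Smith 1828"
)

# phrase = FALSE analogue: every word must appear somewhere in the entry
match_all_words <- function(query, text) {
  words <- strsplit(query, "\\s+")[[1]]
  hits <- vapply(words, function(w) grepl(w, text, fixed = TRUE),
                 logical(length(text)))
  # one row per entry, one column per word
  apply(matrix(hits, nrow = length(text)), 1, all)
}

# phrase = TRUE analogue: the words must appear together, in order
match_phrase <- function(query, text) grepl(query, text, fixed = TRUE)

match_all_words("Mugil abu", entries)  # first two entries match
match_phrase("Mugil abu", entries)     # only the first entry matches
```

The second entry contains both words but not next to each other, which is why it only matches in the word-by-word mode.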
This package also has some functionality for correcting misspellings in searches. This is not performed by default, but it can be enabled by setting resolve = TRUE in the rcatfish_search function call. When enabled, rcatfish performs the search as usual; if no results are found, it attempts to resolve the search queries using fuzzy matching, finding the closest matches to them through the Global Names Verifier. See the difference below between these two options:
# Searching Catalog of Fishes with a Misspelled Species Name
# Without Resolving Names
no_resolve <- rcatfish_search(query = "rhincodon tipus", type = "Species")
View(no_resolve)
# With Resolving Names
yes_resolve <- rcatfish_search(query = "rhincodon tipus", type = "Species", resolve = TRUE)
View(yes_resolve)
The catalog also allows the searching of unavailable names. By default these names are excluded when running a search, but by changing the unavailable parameter to TRUE they can be included. See the difference below:
# Search Catalog of Fishes Without Unavailable Names
no_unavailable <- rcatfish_search(query = "Mitsukurinidae", type = "Species")
View(no_unavailable)
# Search Catalog of Fishes With Unavailable Names
yes_unavailable <- rcatfish_search(query = "Mitsukurinidae", type = "Species", unavailable = TRUE)
View(yes_unavailable)
common.name Parameter in rcatfish_search
Currently, Eschmeyer's Catalog of Fishes does not include the common names of species. rcatfish does, however, have the capability of searching for species by common name by utilizing the rfishbase package. To do this, you can utilize the common.name parameter in rcatfish_search. Please note that searching by common names can only be performed at the species level, not by genus.
By default this parameter is set to FALSE. When explicitly changed to TRUE, the function will first match the query input of common names to any associated scientific names currently found in FishBase. Note that while query is still vectorizable in this mode, you cannot combine common names with other search terms (e.g. you can search for query = c("Humphead Wrasse", "Channel Catfish") but searching for query = c("Humphead Wrasse", "Lophius piscatorius") may return unexpected results). With common.name = TRUE, the function returns a list containing the normal rcatfish_search result dataframe and a second dataframe showing the common names provided and the taxonomic names that they were matched to. As an example:
# Search Catalog of Fishes by Common Name
common_name_result <- rcatfish_search(query = "Humphead wrasse", type = "Species", common.name = TRUE)
View(common_name_result) # The full list returned
View(common_name_result[[1]]) # The first dataframe in the list, the normal rcatfish_search output
View(common_name_result[[2]]) # The second dataframe in the list, the common names searched and their matches
You can search using common names from other languages if you so desire by setting the language parameter. By default it is set to "English".
taxon.history Parameter in rcatfish_search
Each entry in Eschmeyer's Catalog of Fishes also contains a complete history of that entry's taxonomic status. By default, this is not captured with rcatfish_search, although it can be obtained by setting the taxon.history parameter to TRUE. When this is done, an additional dataframe is returned containing each result's original status, current status, and every change to its status made in between. This search can be performed both by species and by genus. Please note that, particularly for queries with a large number of changes in their history, searching with taxon.history = TRUE may take considerably longer than a typical search. As an example of what this may look like:
# Search Catalog of Fishes with Taxonomic History
taxon_history_result <- rcatfish_search(query = "Platyrhina", type = "Genus", taxon.history = TRUE)
View(taxon_history_result) # The full list returned
View(taxon_history_result[[1]]) # The first dataframe in the list, the normal rcatfish_search output
View(taxon_history_result[[2]]) # The second dataframe in the list, the taxonomic histories of the results
Several other parameters exist in the rcatfish_search function to modify minor aspects of the search function.
The verbose parameter toggles on and off the message displayed to the user when running a search (e.g. "Now on query 1 of 100"). By default it is set to TRUE. Messages can be disabled by changing it to FALSE.
The sleep.time parameter sets the length of time that the search function will wait between requests to the Catalog of Fishes' server when performing a search of multiple terms. This is set to 10 seconds as requested by the catalog. This parameter should not be modified. Changing this value may result in blacklisting by the catalog.
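Since the pause applies between requests, long query vectors take a while to finish. As a rough planning aid, the sketch below estimates the time spent in pauses alone. It assumes one pause between each pair of consecutive queries and ignores the time each request itself takes, so treat the result as a lower bound. The function name estimate_wait_seconds is hypothetical.

```r
# Hypothetical helper: time spent waiting between queries, in seconds.
# Assumes one pause between consecutive queries (n - 1 pauses in total).
estimate_wait_seconds <- function(n_queries, sleep.time = 10) {
  stopifnot(n_queries >= 0)
  max(n_queries - 1, 0) * sleep.time
}

estimate_wait_seconds(100)       # pauses alone for 100 queries, in seconds
estimate_wait_seconds(100) / 60  # the same, in minutes
```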
rcatfish_references
To search through references in Eschmeyer's Catalog of Fishes, the function rcatfish_references should be used. This function is equivalent to using the "Search Eschmeyer's Catalog" tab on the catalog's website and selecting the "References" radio button. It has the parameters query and type, both of which must be specified.
The query parameter can search either by reference number in the catalog or by keyword and can be passed as either a single search term or a vector of terms. The type of search performed is dictated by the type parameter, which will accept either "RefNo" to search by reference number or "keyword" to search by keyword. Note that when searching by reference number, the query can be passed either as an integer or as a character string (e.g. 41479 and "41479" will return the same results).
# Search references by keyword
keyword_reference <- rcatfish_references(query = "Tunisia", type = "keyword")
# Search references by reference number
RefNo_reference <- rcatfish_references(query = 41479, type = "RefNo")
rcatfish_references can be combined with a result from rcatfish_search to obtain all references associated with the resulting species. As an example:
# Search the catalog for a given species
search_result <- rcatfish_search(query = "Cichla cataractae", type = "Species")
# Retrieve references from resulting search
references <- rcatfish_references(query = search_result$DescriptionRef, type = "RefNo")
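When a search returns many rows, the reference column may contain repeated or missing values. A minimal sketch of tidying such a vector before passing it on is shown below, so that the same reference is not requested twice. The helper name clean_refnos and the mock values are hypothetical; whether real results contain duplicates or missing values depends on the search.

```r
# Hypothetical helper: tidy a vector of reference numbers before lookup
clean_refnos <- function(x) {
  x <- trimws(as.character(x))     # accept integers or strings
  x <- x[!is.na(x) & nzchar(x)]    # drop missing and empty values
  unique(x)                        # keep each reference once
}

# Mock values standing in for search_result$DescriptionRef
mock_refs <- c("41479", NA, "41479", " 12345", "")
clean_refnos(mock_refs)
```

The cleaned vector could then be used as the query, for example query = clean_refnos(search_result$DescriptionRef) with type = "RefNo".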
Eschmeyer's Catalog of Fishes receives monthly updates. These updates include changes to the taxonomic status of genera and species, changes related to authorship, and the addition of newly described taxa. Users can see these updates by using the rcatfish_updates function. Called with no arguments, this function returns all changes provided by the most recent update. However, users can specify whether to return the taxonomic changes, authorship changes, added genera, and added species by setting the corresponding arguments to TRUE or FALSE. For example, if we wanted to obtain all the changes in a version of the catalog, we can do either of the following:
updates <- rcatfish_updates()
Or, we can name each argument to control which update components are returned. These are all set to TRUE by default, so set any component you do not want to FALSE.
updates <- rcatfish_updates(changes = TRUE, author.changes = TRUE, added.genera = TRUE, added.species = TRUE)
updates
Running the above code returns a list of the changes made in the newest edition of the catalog (which is updated once a month). The length of this list varies depending on which elements the user asked to return. Elements of the returned list include Changes, AuthorshipChanges, AddedGenera, and AddedSpecies, which contain the taxonomic changes, authorship changes, genera newly added to the catalog, and species newly added to the catalog, respectively.
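Because the list's length depends on what was requested, it is safer to access its components by name than by position. The sketch below uses a mock list carrying the element names given above. The contents and the Note column are invented placeholders, so the structure of the real components may differ.

```r
# Mock list using the element names given above; the contents and the
# column name are invented placeholders, not real catalog data.
mock_updates <- list(
  Changes           = data.frame(Note = c("change 1", "change 2")),
  AuthorshipChanges = data.frame(Note = character(0)),
  AddedGenera       = data.frame(Note = "genus 1"),
  AddedSpecies      = data.frame(Note = c("sp 1", "sp 2", "sp 3"))
)

# Number of records in each component
update_counts <- vapply(mock_updates, NROW, integer(1))
update_counts

# Pull out a single component by name, only if it was returned
if ("AddedSpecies" %in% names(mock_updates)) {
  added_species <- mock_updates[["AddedSpecies"]]
}
```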
Eschmeyer's Catalog of Fishes provides information on the number of species and genera described per family and subfamily via a table on the following linked page (https://researcharchive.calacademy.org/research/ichthyology/catalog/SpeciesByFamily.asp). rcatfish provides access to this page as well as the ability to return species totals for higher taxonomic entities than just family and subfamily, such as orders and classes using the rcatfish_species_by function. This function simply takes a query that is a subfamily, family, class, or order that the user wishes to obtain data for. For example, if we want to return information for the family Cichlidae we can easily do so using the following:
rcatfish_species_by("Cichlidae")
We can see that this has returned a data frame containing the number of available and valid genera and species, as well as the number of genera and species described in the last decade, for the family and all subfamilies in Cichlidae. While the Catalog of Fishes does not report these figures at higher taxonomic levels, rcatfish can. We can obtain the number of described genera and species for the order Cichliformes with the following:
rcatfish_species_by("Cichliformes")
We can see that this has provided not just the number of genera and species in each family within the Cichliformes, but has also returned the total for the entire order, which is not reported on the Catalog of Fishes.
Eschmeyer's Catalog of Fishes provides a hierarchical classification of fishes organized by Class, Order, Suborder, Family, and Subfamily (https://www.calacademy.org/scientists/catalog-of-fishes-classification/). The rcatfish function rcatfish_classification provides access to this table. This function takes no arguments and can simply be called as follows.
# See Current Breakdown of Fish Classification, from Class Through Subfamily
fish_classification <- rcatfish_classification()
fish_classification
The function returns a data frame that progresses from left to right from most to least inclusive. In addition to providing the hierarchy for Class, Order, Suborder, Family, and Subfamily, the authorship of these taxonomic entities as well as their common name is returned.
# See a Glossary of Terms Used in the Catalog
glossary <- rcatfish_glossary()
We can see that the glossary object created by the line of code above is a data frame containing a list of technical terms used in the catalog along with definitions and applicable sub-terms.
In addition to citations for the references it uses, the Catalog of Fishes provides various information on the journals in which those references appear (https://researcharchive.calacademy.org/research/ichthyology/catalog/journals.asp). For example, the Catalog of Fishes provides ISSNs, publishers, and comments, such as name changes for journals. Information on the journals can be accessed in rcatfish using the rcatfish_journals function. This function takes the argument query, which is a string to search for, and the argument phrase, which controls whether the query is passed in quotes so that it is searched as an exact phrase. For example, to search for journals that are related to Texas, we can do the following:
rcatfish_journals("Texas")
We can see that most of these contain Texas in the title, while one, "Contributions in Marine Science", is returned because its entry notes that it is a continuation of Publications of the Institute of Marine Science, University of Texas.
Note that passing the query as a phrase may impact the success of the search. For example, if we wanted to search for Journal of Zoology, the search will fail if we do not pass the query as a phrase as it will look for each word separately.
rcatfish_journals("Journal of Zoology")
We can successfully search for this query by invoking phrase = TRUE in the arguments:
rcatfish_journals("Journal of Zoology", phrase = TRUE)
Eschmeyer's Catalog of Fishes provides information, such as collection abbreviations, locality, previous names, and online access for museum collections with fish holdings (https://researcharchive.calacademy.org/research/ichthyology/catalog/collections.asp). This information can be accessed through rcatfish via the rcatfish_collections function. rcatfish_collections allows users to search for collections by abbreviation, country, or query term. For example, if we knew the museum abbreviation we wanted to search for, such as the UMMZ for the University of Michigan Museum of Zoology, we could do the following by simply providing UMMZ to the abbreviation argument:
rcatfish_collections(abbreviation = "UMMZ", country = NULL, query = NULL, verbose = TRUE)
We can also pass information to more than one field. This can be useful for narrowing down collection results, such as for countries that have a lot of natural history collections. For this example, let's search for collections in the United States of America and query for collections in California and Alaska. Note that to pass more than one query term, we must ensure that all arguments are the same length. So, in this case, we need to pass the country twice in our search as follows:
rcatfish_collections(abbreviation = NULL, country = rep("U.S.A.",2), query = c("California","Alaska"), sleep.time = 10)
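For longer searches, repeating values by hand becomes error-prone. As a minimal sketch of the same-length requirement described above, the helper below recycles any length-one argument to match the longest one and stops if the lengths cannot be reconciled. The name align_search_args is hypothetical and is not part of rcatfish.

```r
# Hypothetical helper: recycle length-one arguments so that every argument
# has the same length, and stop if the lengths cannot be reconciled.
align_search_args <- function(...) {
  args <- list(...)
  args <- args[!vapply(args, is.null, logical(1))]  # ignore NULL arguments
  n <- max(lengths(args))
  lapply(args, function(a) {
    if (length(a) == 1L) {
      rep(a, n)
    } else if (length(a) == n) {
      a
    } else {
      stop("arguments must have length 1 or ", n)
    }
  })
}

aligned <- align_search_args(country = "U.S.A.",
                             query = c("California", "Alaska"))
aligned$country
aligned$query

# With rcatfish loaded and an internet connection, the aligned arguments
# could then be passed on, for example:
# do.call(rcatfish_collections, aligned)
```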
We may also want to query a phrase, such as "Museum of Zoology", to get a list of collections that contain that name across all collections (similar to what was covered for rcatfish_journals). In order to do more complex queries that are phrases, we need to use the phrase = TRUE argument. We can perform this search as follows:
rcatfish_collections(query = "Museum of Zoology", phrase = TRUE)
Most of the functions in rcatfish require a stable internet connection to run, as they connect to the online Catalog of Fishes database. If you run into problems, we recommend checking your internet connection as well as visiting the California Academy of Sciences Eschmeyer's Catalog of Fishes site (https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp) to ensure that it is not down for routine maintenance.
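In longer scripts, it can help to keep a failed request from stopping everything else. The sketch below wraps a call in tryCatch() and returns NULL with a message on failure. It assumes the failure surfaces as an R error, which may not be how every rcatfish function reports connection problems. The names try_catalog and failing_request are hypothetical, and the failing function is a stand-in so the example runs offline.

```r
# Hypothetical wrapper: run a function and return NULL with a message
# instead of stopping if it raises an error.
try_catalog <- function(fun, ...) {
  tryCatch(
    fun(...),
    error = function(e) {
      message("Catalog request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a request that fails, so the example runs offline
failing_request <- function(query) stop("could not connect")

res <- try_catalog(failing_request, query = "Cichlidae")
is.null(res)  # TRUE: the error was caught and NULL was returned
```

With the package loaded, the same wrapper could be used as try_catalog(rcatfish_species_by, "Cichlidae").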
Should you find any entries which appear to return different data than anticipated, check the Catalog of Fishes directly to confirm the error and then report it using the issues section on GitHub (repo sborstein/rcatfish). Remember that rcatfish is designed to return exactly what is published and will capture any mistakes that are directly in the catalog.
Please note that the authors of this package are not affiliated with Eschmeyer's Catalog of Fishes nor the California Academy of Sciences. As such, we are not able to correct any errors that exist on the Catalog of Fishes or fix/troubleshoot any issues with the Catalog of Fishes itself.
Further information on the functions and their usage can be found in the help files via help(package = "rcatfish").
For any further issues and questions send an email with subject 'rcatfish support' to borstein@txstate.edu or post to the issues section on GitHub.
Eschmeyer WN (1998). Catalog of Fishes. California Academy of Sciences, San Francisco, California, 2905 pp.
Fricke R (2025). Eschmeyer's Catalog of Fishes: References. https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp.
Fricke R, Eschmeyer WN (2025). Eschmeyer's Catalog of Fishes: Guide to Fish Collections. https://researcharchive.calacademy.org/research/ichthyology/catalog/collections.asp.
Fricke R, Eschmeyer WN (2025). Eschmeyer’s Catalog of Fishes: Journals. https://researcharchive.calacademy.org/research/ichthyology/catalog/journals.asp.
Fricke R, Eschmeyer WN, Fong JD (2025). Eschmeyer’s Catalog of Fishes: Species by family/subfamily in the Catalog of Fishes. https://researcharchive.calacademy.org/research/ichthyology/catalog/SpeciesByFamily.asp.
Fricke R, Eschmeyer WN, van der Laan R (2025). Eschmeyer's Catalog of Fishes: Genera, Species, References. https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp.
Fricke R, van der Laan R, Fong JD (2025). Eschmeyer’s Catalog of Fishes: Changes and Additions. https://researcharchive.calacademy.org/research/ichthyology/catalog/ChangeSummary.asp.
van der Laan R, Fricke R, Eschmeyer WN (2025). Eschmeyer's Catalog of Fishes: Classification. https://www.calacademy.org/scientists/catalog-of-fishes-classification/.
van der Laan R, Fricke R, Fong J (2025). Eschmeyer's Catalog of Fishes: Glossary. https://www.calacademy.org/scientists/catalog-of-fishes-glossary/.