knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6.5, fig.align = "center" )
citesdb is an R package to conveniently analyze the full CITES shipment-level wildlife trade database, available at https://trade.cites.org/. This data consists of over 40 years and 20 million records of reported shipments of wildlife and wildlife products subject to oversight under the Convention on International Trade in Endangered Species of Wild Fauna and Flora. The source data are maintained by the UN Environment World Conservation Monitoring Centre. @Harfoot_2018 provide a recent overview of broad temporal and spatial trends in the data.
Install the citesdb package with this command:
devtools::install_github("ropensci/citesdb")
Note that since citesdb installs a source dependency from GitHub, you will need package build tools.
options(width = 120) citesdb::cites_disconnect()
When you first load the package you will see a message like this:
library(citesdb) #> Local CITES database empty or corrupt. Download with cites_db_download()
Not to worry, just do as it says and run cites_db_download()
. This will fetch the most recent[^1] database from online, an approximately 158 MB download. It will expand to over 1 GB in the local database. During the download and database building, up to 3.5 GB of disk space may be used temporarily.
[^1]: When data is updated on the CITES website, we check for changes in structure, metadata, or documentation that we should pass on to the user (e.g., in the ?guidance
help file) and re-package the data for distribution in our own repository. Data updates are expected twice per year, and we expect our repository to update within 2 weeks of the official release.
Once you fetch the data you can connect to the database with the cites_db()
command. You can use the cites_shipments()
command to load a remote tibble
that is backed by the database but not loaded into R. You can use this to analyze CITES data without ever loading it into memory, then gather your results with the dplyr
function collect()
. For example, as demonstrated in the package README file, one could compute the number of shipment records per year like so:
if (!citesdb::cites_status()) citesdb::cites_db_download()
library(citesdb) library(dplyr) cites_shipments() %>% group_by(Year) %>% summarize(n_records = n()) %>% arrange(desc(Year)) %>% collect()
(Note that running collect()
on all of cites_shipments()
will load a >3 GB data frame into memory!)
Alternatively, one could visualize the same information by piping results from a cites_shipments()
call directly into ggplot()
:
library(ggplot2) breaks <- seq(from = 1975, to = 2020, by = 5) cites_shipments() %>% group_by(Year) %>% summarize(n_records = n()) %>% mutate(log10_n_records = log10(n_records)) %>% ggplot(aes(x = Year, y = n_records)) + geom_point() + geom_line(linetype = "dashed", size = 0.2) + ylab("Number of Records") + theme_minimal() + scale_x_continuous(breaks = breaks) cites_shipments() %>% group_by(Year) %>% summarize(n_records = n()) %>% mutate(log10_n_records = log10(n_records)) %>% ggplot(aes(x = Year, y = log10_n_records)) + geom_point() + geom_line(linetype = "dashed", size = 0.2) + ylab("log10(Number of Records)") + ylim(1.5, 6.5) + theme_minimal() + scale_x_continuous(breaks = breaks)
The package database also contains tables of field metadata, codes used, and CITES countries. This information comes from "A guide to using the CITES Trade Database", on the CITES website. Convenience functions cites_metadata()
, cites_codes()
, and cites_parties()
access this information:
head(cites_metadata()) head(cites_codes()) head(cites_parties())
More information on the release of shipment-level data can be found in the ?guidance
help file.
citesdb stores data in an on-disk DuckDB SQL database. If you want to use SQL commands for complex queries or otherwise want to directly access the database connection, use the cites_db()
command:
con <- cites_db() DBI::dbListTables(con)
Note that DuckDB, as implemented by the DuckDB package, currently has a limitation of only one connection to the database at a time. Therefore, if you are running R in multiple sessions (say, one in a console and separately knitting an R Markdown document), you will receive this error:
Error: Local citesdb database is locked by another R session. Try closing or running cites_disconnect() in that session.
cites_disconnect()
shuts down the current connection, freeing up your other session to connect.
If you are using a recent version of RStudio interactively, loading the citesdb
package also brings up a browsable pane in the "Connections" tab that lets you explore and preview the database tables.
Click on the arrows to see the data types in each table, or the table icon to the right for a preview of the table. The SQL button (which appears in RStudio >=1.2) opens an SQL document to write and preview SQL queries directly.
rcites
Suppose we were interested in CITES data on sharks from the order Lamniformes. We might begin an analysis by visualizing shipments of these organisms and their derived products over time:
cites_shipments() %>% filter(Order == "Lamniformes") %>% group_by(Year, Taxon) %>% summarize(n_records = n()) %>% ggplot(aes(x = Year, y = n_records, color = Taxon)) + geom_point() + geom_line(linetype = "dashed", size = 0.2) + ylab("Number of Records") + theme_minimal() + scale_x_continuous(breaks = breaks, limits = c(1990, max(breaks)))
What accounts for the temporal differences in the number of CITES records we observe? Why are there no data prior to 2001? It would be helpful to know more about the history of these particular species within CITES.
Fortunately, the rcites package provides access to the Speciesplus/CITES Checklist API, which includes metadata about species and their protected status through time. To use the functions within this package, users will first need to sign up for a Speciesplus API account in order to generate a personal access token. This token can then be set for the current R session using rcites::set_token()
or can be stored permanently if written to the user's .Renviron
file (the approach taken in this tutorial). See here for more information on use of the API access tokens.
As an initial step in rcites
workflows, it will typically be most useful to call spp_taxonconcept()
on some taxa of interest. Using the specific example of one of our Lamniformes sharks, the great white (Carcharodon carcharias), as a query taxon, we can see that this function returns a variety of information:
library(rcites) spp_taxonconcept(query_taxon = "Carcharodon carcharias")$general$id
library(rcites) readRDS(system.file("extdata", "rcites1.rds", package = "citesdb"))
Importantly, we can collect the ID for our taxon of interest from this output and store it as an object for ease of reference:
great_white_id <- spp_taxonconcept("Carcharodon carcharias")$general$id
Using our stored ID variable, we can query other rcites
functions like spp_cites_legislation()
, which returns the CITES listing and reservation status for the query taxon. Note that reservations are essentially exemptions declared by particular CITES Parties in reference to specific taxa.
spp_cites_legislation(great_white_id)$cites_listings
readRDS(system.file("extdata", "rcites2.rds", package = "citesdb"))
So this function query reveals that the great white is listed under CITES Appendix II (as of 12 January 2005) and that three CITES Parties have active reservations for this species.
To reveal the full listing and reservation history, rather than just current information, we use the scope = "all"
argument:
spp_cites_legislation(great_white_id, scope = "all")$cites_listings
readRDS(system.file("extdata", "rcites3.rds", package = "citesdb"))
Here, we see that the species was in fact listed under CITES Appendix III as early as 29 October 2001. This may account for the CITES trade records for this species that date prior to its uplisting to Appendix II, a status which affords more stringent protection and monitoring.
In general, the historical listing information provides critical context for CITES data interpretation. In this case, for instance, it is clear that the lack of CITES data for great white sharks prior to the year 2002 should not be taken as an indication of a lack of trade in this species during that time frame but rather a lack of intergovernmental oversight and data collection.
rcites
also offers functions that return other useful metadata such as species distribution information:
spp_distributions(great_white_id)$distributions
readRDS(system.file("extdata", "rcites4.rds", package = "citesdb"))
More in-depth description of rcites
and usage tutorials can be found via the package publication and website.
The CITES shipment-level data is a valuable resource for understanding part of the global wildlife trade. However, care must be taken in working with this data. Incomplete or ambiguous data can lead to misinterpretation and different approaches to handling the data can lead to different conclusions. @Harrington_2015, @Berec_2018, @Robinson_2018, and @Eskew_2019 all provide greater detail on challenges related to analyzing the CITES Trade Database. Also note the shipment-level guidance provided by CITES, which can be found in the ?guidance
help file.
As an example, single shipments may be recorded multiple times, as they may be reported by both exporting and importing countries. Here we examine a set of shipments of orchids from a US exporter to the Netherlands:
cites_shipments() %>% filter(Year == 2016, Export.permit.RandomID == "ce63001ff5", Genus == "Paphiopedilum") %>% collect() %>% head() %>% knitr::kable()
See that each pair of shipments is identical except for Reporter.type
, indicating that the shipment was reported both from the US and from the Netherlands. Note that while the importer reported both import and export permit numbers, the exporter reported only the export permit numbers.
How to deal with repeat records? If you are aggregating records to calculate total shipments it is best to remove them, but not every record is recorded twice. This code filters to use only the importer's records, should all fields other than Reporter.type
and Import.permit.RandomID
be the same:
cites_shipments() %>% filter(Year == 2016, Export.permit.RandomID == "ce63001ff5", Genus == "Paphiopedilum") %>% collect() %>% group_by_at(vars(-Reporter.type, -Import.permit.RandomID)) %>% mutate(n = n()) %>% filter((n > 1 & Reporter.type == "I") | n == 1) %>% ungroup() %>% head() %>% knitr::kable()
citesdb::cites_disconnect() duckdb::duckdb_shutdown(duckdb::duckdb())
If you use citesdb in a publication, please cite both the package and source data:
print(citation("citesdb"), style = "textVersion")
Have feedback or want to contribute? Great! Please take a look at the contributing guidelines before filing an issue or pull request.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.