---
title: "The LEMIS Database"
author: "Noam Ross, Allison White, Carlos Zambrana-Torrelio, Evan Eskew"
date: "2018-07-20"
output:
  html_document:
    toc: true
    toc_float: true
    theme: readable
    keep_md: true
  github_document:
    toc: true
    html_preview: false
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

Introduction

The lemis package provides access to U.S. Fish and Wildlife Service (USFWS) data on wildlife and wildlife product imports to and exports from the United States. EcoHealth Alliance obtained this data through more than 14 years of FOIA requests.

Installation

Install the lemis package with this command:

source("https://install-github.me/ecohealthalliance/lemis")

As LEMIS currently lives in a private repository, you must have a GitHub personal access token set up to install and use the package. Instructions for this can be found here.

Usage

The main function in lemis is lemis_data(). This returns the main cleaned LEMIS database as a dplyr tibble.

lemis makes use of datastorr to manage data download. The first time you run lemis_data() the package will download the most recent version of the database (~160MB). Subsequent calls will load the database from storage on your computer.

The LEMIS database is stored as an efficiently compressed .fst file, and lemis_data() loads it as a remote dplyr source. This means that it is not read fully into memory until after filter() commands are run, which reduces memory usage. If you wish to manipulate it as a data frame, simply call dplyr::collect() to load it fully into memory.

Note that the full database will be approximately 1 GB in memory.

lemis_codes() returns a data frame with descriptions of the codes used by USFWS in the various columns of lemis_data(). This is useful for lookup or for joining with the main data to produce more descriptive outputs. The ?lemis_codes help file also has a searchable table of these codes.
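As a rough illustration of the join described above, the sketch below attaches readable labels to coded values. It uses small made-up tables in place of lemis_data() and lemis_codes(). The column names field, code, and value match those used in the use cases later in this document, but the rows and label text are placeholders, not real LEMIS entries.

```r
library(dplyr)

# Hypothetical stand-in for lemis_codes(): real column names, made-up rows
toy_codes <- tibble(
  field = c("description", "description", "port"),
  code  = c("LIV", "SPE", "XX"),
  value = c("Live (placeholder label)", "Specimen (placeholder label)", "Example port")
)

# Hypothetical stand-in for a few rows of lemis_data()
toy_shipments <- tibble(
  control_number = 1:3,
  description    = c("LIV", "SPE", "LIV")
)

# Keep only the codes for the column being decoded, then join on the code
description_lookup <- toy_codes %>%
  filter(field == "description") %>%
  select(code, description_label = value)

described <- toy_shipments %>%
  left_join(description_lookup, by = c("description" = "code"))

described
```

Filtering the codes table to a single field before joining matters because, under this layout, the same code string could appear under more than one field.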

Data structure

The LEMIS database is a single table listing import and export shipments of wildlife products. Here are the field descriptions, which can be found in lemis_metadata().

|field_name       |description                                                                 |
|:----------------|:---------------------------------------------------------------------------|
|control_number   |unique ID                                                                   |
|species_code     |A USFWS code for the species                                                |
|taxa             |an EHA-derived broad taxonomic categorization                               |
|genus            |genus of the wildlife product                                               |
|species          |species of the wildlife product                                             |
|subspecies       |subspecies of the wildlife product                                          |
|specific_name    |the species-specific common name                                            |
|generic_name     |a general common name                                                       |
|description      |description of the type/form of wildlife import (see codes)                 |
|quantity         |numeric quantity of the shipment                                            |
|unit             |units for the numeric quantity (see codes)                                  |
|value            |reported value of the shipment in dollars                                   |
|country_origin   |ISO2C code for the country of origin of the product (see codes)             |
|country_imp_exp  |ISO2C code for the country to/from which the product is shipped (see codes) |
|purpose          |the reason the item is being imported or exported                           |
|source           |the type of source within the origin country (e.g., wild, bred, see codes)  |
|action           |action taken by USFWS on import ((C)leared/(R)efused)                       |
|disposition      |what happens to the import (see codes)                                      |
|disposition_date |when disposition occurred                                                   |
|shipment_date    |when the shipment arrived                                                   |
|import_export    |whether the shipment is an (I)mport or (E)xport                             |
|port             |port or region of shipment (see codes)                                      |
|us_co            |U.S. party of the shipment                                                  |
|foreign_co       |Foreign party of the shipment                                               |

Tutorial

This script walks through the basic lemis functions:

library(lemis)
library(dplyr)

# Load LEMIS data first time (downloads ~160MB file), returns remote
# dplyr source to database
l <- lemis_data()

rm(l)
# Load again (will NOT download data, it's already on your machine)
l <- lemis_data()

# Load data into memory (will take up ~1 GB of RAM)
ll <- collect(l)
ll
rm(ll)

# Filter on-disk and load into memory (smaller!)

l_filtered <- l %>%
  filter(import_export == "I", port == "SS") %>%
  select(-taxa)
l_filtered
# collect() returns a new in-memory tibble, so assign the result
ll_filtered <- l_filtered %>% collect()
ll_filtered
rm(ll_filtered)

# Get data dictionary and codes
lemis_metadata()
lemis_codes()

# View interactive searchable data dictionary
?lemis_metadata
?lemis_codes

# List versions of LEMIS data on your machine
lemis_versions()
lemis_version_current()

# Remove a downloaded version
lemis_del(lemis_version_current())

# Now list local versions again
lemis_versions()
# List versions of LEMIS data available remotely
lemis_versions(local=FALSE)

Use Cases

Here we show some examples of questions that can be answered with this data.

library(lemis)
library(dplyr)
library(ggplot2)
library(forcats)
library(stringi)
theme_set(theme_minimal())

What kind of animals are being imported through Sault Sainte Marie?

sss_port <- # Lookup the port code
  lemis_codes() %>% 
  filter(field == "port", value=="Sault Sainte Marie") %>% 
  pull(code)
lemis_data() %>% 
  filter(port == sss_port, import_export == "I") %>% # Filter before loading the data into memory
  collect() %>% 
  group_by(generic_name) %>% 
  summarise(shipments = n()) %>% 
  arrange(desc(shipments)) %>% 
  head(10) %>% 
  ggplot(aes(x = fct_reorder(generic_name, shipments), y = shipments)) +
  geom_col() +
  coord_flip() +
  xlab("Generic Animal Type") + ylab("Shipments via Sault Sainte Marie 2000-2013")

Lots o' bears, it seems.
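The group_by(), summarise(), arrange() chain used for the ranking above can also be written with dplyr's count(). The sketch below checks on a made-up table (the generic_name values are invented for illustration) that the two forms give the same result.

```r
library(dplyr)

# Hypothetical rows standing in for collected LEMIS shipments
toy <- tibble(
  generic_name = c("BEAR", "BEAR", "DEER", "BEAR", "DEER", "MOOSE")
)

# Long form, as written in the use case above
long_form <- toy %>%
  group_by(generic_name) %>%
  summarise(shipments = n()) %>%
  arrange(desc(shipments))

# Short form: count() groups, tallies, and sorts in one call
short_form <- toy %>%
  count(generic_name, name = "shipments", sort = TRUE)

long_form
short_form
```

Either form works on collected data; the long form is easier to extend when more than one summary column is needed.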

In what form are Pteropus imported?

lemis_data() %>% 
  filter(genus == "pteropus", import_export == "I") %>% 
  collect() %>% 
  group_by(description) %>% 
  summarize(shipments = n()) %>% 
  left_join(filter(lemis_codes(), field=="description"), by = c("description"="code")) %>% 
  arrange(desc(shipments)) %>% 
  ggplot(aes(x = fct_reorder(value, shipments), y = shipments)) +
  geom_col() +
  coord_flip() +
  xlab("Description of Pteropus Imports") + ylab("No. Shipments 2000-2013")

Mostly as scientific specimens. There are a few clear errors in this.

Where are live bats imported from?

As soon as we start dealing with species it's useful to bring in taxonomic resources. The taxize package is a great toolbelt for looking up taxonomic data.

library(taxize)
bat_genera <- downstream(get_tsn("Chiroptera", rows=1), downto = "Genus", db = "itis") %>% 
  `[[`(1)%>% 
  filter(rankname == "genus") %>% 
  pull(taxonname) %>% 
  stri_trans_tolower()
#> 
#> Retrieving data for taxon 'Chiroptera'
lemis_data() %>% 
  filter(import_export == "I", description == "LIV", genus %in% bat_genera) %>% 
  group_by(country_origin) %>% 
  summarize(number = sum(quantity)) %>% 
  left_join(filter(lemis_codes(), field=="country"),
            by=c("country_origin"="code")) %>% 
  ggplot(aes(x=fct_reorder(value, number), y=number)) +
  geom_col() +
  coord_flip() +
  xlab("Origin Country") + ylab("No. Live Bats Imported to U.S., 2000-2013")
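The lower-casing step matters because the genus column appears to be stored in lower case (the Pteropus query earlier filters on "pteropus"), while taxonomic lookups typically return capitalized names. Here is a minimal sketch of that matching, with made-up rows and a hard-coded vector standing in for the taxize result, and base R's tolower() used in place of stri_trans_tolower():

```r
library(dplyr)

# Hypothetical stand-in for genus names returned by a taxonomic lookup
lookup_names <- c("Pteropus", "Myotis")
bat_genera <- tolower(lookup_names)

# Hypothetical shipment rows; genus is lower case, as in the Pteropus query
toy <- tibble(
  genus    = c("pteropus", "ursus", "myotis"),
  quantity = c(2, 1, 5)
)

# Without lower-casing, nothing matches
unmatched <- toy %>% filter(genus %in% lookup_names)

# With lower-casing, the two bat genera are kept
bats <- toy %>% filter(genus %in% bat_genera)

nrow(unmatched)
bats
```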

Who is importing these live bats?

lemis_data() %>% 
  filter(import_export == "I", description == "LIV", genus %in% bat_genera) %>% 
  group_by(us_co) %>% 
  summarize(number = sum(quantity)) %>% 
  ggplot(aes(x=fct_reorder(us_co, number), y=number)) +
  geom_col() +
  coord_flip() +
  xlab("U.S. Importer") + ylab("No. Live Bats Imported to U.S., 2000-2013")

Hmm, probably some data errors given that there are several coral and aquarium companies on this list.

What countries have the highest fraction of seized shipments?

lemis_data() %>% 
  filter(import_export == "I") %>% 
  group_by(country_origin) %>% 
  summarize(frac_seized = sum(disposition == "S")/n()) %>% 
  left_join(filter(lemis_codes(), field=="country"), by=c("country_origin"="code")) %>%
  arrange(desc(frac_seized)) %>% 
  filter(!is.na(value)) %>% 
  head(30) %>% 
  ggplot(aes(x=fct_reorder(value, frac_seized), y=frac_seized)) +
  geom_col() +
  coord_flip() +
  labs(title="Top 30 Origin Countries by Fraction of Wildlife Shipments Seized",
       x="Origin Country", y="Fraction of Shipments Seized, 2000-2013")
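To see what the frac_seized expression computes, the sketch below applies it to a small made-up table. The country codes and the "X" disposition value are placeholders; only "S" for seized is taken from the query above.

```r
library(dplyr)

# Hypothetical rows: three shipments from "AA", two from "BB"
toy <- tibble(
  country_origin = c("AA", "AA", "AA", "BB", "BB"),
  disposition    = c("S", "X", "X", "S", "S")
)

seized <- toy %>%
  group_by(country_origin) %>%
  summarize(
    # Count of seized shipments divided by all shipments in the group
    frac_seized = sum(disposition == "S") / n(),
    # The mean of a logical vector gives the same proportion
    frac_mean   = mean(disposition == "S")
  ) %>%
  arrange(desc(frac_seized))

seized
```

In this toy table "BB" has 2 of 2 shipments seized and "AA" has 1 of 3. Note that if disposition contains missing values, sum() and mean() return NA for that group unless na.rm = TRUE is supplied.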



kephelps/LEMIS documentation built on May 7, 2019, 4:40 p.m.