knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Brief Introduction of GPDD

The Global Population Dynamics Database (GPDD) is a large collection of animal and plant population data, which can be accessed through the KNB website. Below is the official abstract of the dataset.

As a source of animal and plant population data, The Global Population Dynamics Database is unrivalled. Nearly five thousand separate time series are available here. In addition to all the population counts, there are taxonomic details of over 1400 species. The type of data contained in the GPDD varies enormously, from annual counts of mammals or birds at individual sampling sites, to weekly counts of zooplankton and other marine fauna. The project commenced in October 1994, following discussions on ways in which the collaborating partners could make a practical and enduring contribution to research into population dynamics. A small team was assembled and, with assistance and advice from numerous interested parties we decided to construct the database using the popular Microsoft Access platform. After an initial design phase, the major task has been that of locating, extracting, entering and validating the data in all the various tables. Now, nearly 5000 individual datasets have been entered onto the GPDD.

The data is broken into seven files (in text/CSV format): data, main, timeperiod, taxon, datasource, biotope, and location, as well as a metadata file (in EML format) and a user guide (in PDF format). All files are available through the KNB website above. Our package provides a convenient way to download and load the datasets through R.

Using the Package

This package contains only two main functions: download_gpdd and load_gpdd.

Download_gpdd takes in three arguments: dataset_name, dir, and overwrite. The argument dataset_name should be one of "gpdd_data", "gpdd_main", "gpdd_timeperiod", "gpdd_taxon", "gpdd_datasource", "gpdd_biotope" and "gpdd_location", or a character vector containing several of those. Other strings or non-string inputs will raise an error. The downloaded data is automatically saved to the directory provided in dir (in .csv format). By default, if no argument is provided, the function will download all seven datasets, and will save them in the directory rappdirs::user_data_dir("gpdd"). If a requested dataset is already in the directory, then the function by default will ask the user whether to overwrite or not (overwrite = "Ask", by defult). The user can manually decide whether to overwrite by passing in "Yes" or "No" as arguments.

Similar to download_gpdd, load_gpdd takes in two arguments: dataset_name and dir. The argument dataset_name should be one of "data", "main", "timeperiod", "taxon", "datasource", "biotope" and "location", or a character vector containing several of those. Notice that users need not include "gpdd" in dataset_name as in download_gpdd. If a requested dataset is not in the directory, then the function will show a message "xxx does not exist. Proceeding to the next dataset", and continue to the next dataset. All loaded datasets will be saved to a list of dataframes. Therefore, it is required that users assign load_gpdd to a variable (see examples below). If the dataset name is "location", for instance, then the name of the loaded dataset within the list will be location as well.

Please also make sure to install readr and rappdirs before using this package.

An example would be:

library(gpdd)

# Speficy the dataset to be downloaded. Load the dataset.
download_gpdd("gpdd_location", overwrite = "No")
gpdd <- load_gpdd("location")
summary(gpdd$location)

# Download and Load several datasets at a time.
download_gpdd(c("gpdd_biotope", "gpdd_timeperiod"), overwrite = "No")
gpdd <- load_gpdd(c("biotope", "timeperiod"))

# Download to / Load from another directory
test_path <- tempfile("gpdd", tempdir())
dir.create(test_path, showWarnings = FALSE, recursive = TRUE)
download_gpdd(c("gpdd_biotope", "gpdd_timeperiod"), dir = test_path, overwrite = "No")
gpdd <- load_gpdd(c("biotope", "timeperiod"), test_path)

# By default, all seven datasets will be downloaded and saved.
# We can load all datasets.
# No errors will be thrown if the dataset has already been downloaded or loaded.
download_gpdd(overwrite = "No")
gpdd <- load_gpdd()
summary(gpdd$data)
summary(gpdd$main)

# Setting overwrite = "Yes" will overwrite existing datasets, if any.
download_gpdd("gpdd_data", overwrite = "Yes")
gpdd <- load_gpdd("data")
summary(gpdd$data)


boettiger-lab/gpdd documentation built on June 5, 2019, 4:34 p.m.