knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(occupdr)
The occurrence records available through the Global Biodiversity Information Facility (GBIF) and iDigBio contain a lot of data fields, some of which may be missing for your records of interest, or not relevent to your project.
Functions within occupdr facilitate parsing csv files from these sources and simplifying the output to family and species, latitude and longitude, day of year and year. The output allows combination of data from both sources into a single data file.
The function gbifread() reads tab-delimited csv files such as those provided by GBIF, whereas idigfread() reads comma-delimited files such as those provided by iDigBio. Both functions incorporate the fread function from the package data.table, a package which has features parallelism [^1].
The included datasets are the output of the fread functions above.
gbidata
Note that the " in column 1 is an artifact from creating the example file in MS Excel. It does not occur in data downloaded directly from GBIF
The function gbiclean removes observations for which the species epithet or the date is missing. It creates a new variable for day of year: doy, filters the basisOfRecord column for "HUMAN_OBSERVATION" and "PRESERVED_SPECIMEN" and renames it basis. Finally, seven variables are selected: family species decimalLatitude decimalLongitude basis doy year
g_data <- gbiclean(gbidata) g_data
The process for data from iDigBio is very similar, with one extra necessary step.
idbdata
Note that many column headers contain a colon (:). The function rmcolon replaces it with a period (.)
idb2 <- rmcolon(idbdata) # this function must be assigned to an object head(idb2, 3)
Now the data is ready for the idigclean function, which removes observations for which the species epithet is missing and creates a new variable for day of year: doy. Column headers are renamed to match the output of gbiclean.
idb_data <- idigclean(idb2) idb_data # Warning: This function may trigger an error message if used on data modified in MS Excel.
Note that is function did not remove observations missing dates, so some further cleaning may be desirable.
The two datasets are now formatted to be combined with rbind() if desired, and used to map occurrences.
[^1]:Note: use of this package on a Mac may result in a warning message, but functionality should not be affected. See the data.table installation wiki for more information
Data Sources
http://GBIF.org (27 October 2020) GBIF Occurrence Download https://doi.org/10.15468/dl.24n5p9
http://www.idigbio.org/portal (2020), Query: {"filtered": {"filter": {"and": [{"query": {"match": {"_all": {"operator": "and", "query": "vanessa cardui"}}}}]}}}, 3427 records, accessed on 2020-10-27T19:38:47.660127
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.