readData: Read Data From DwC-A Files

readData R Documentation

Description

This function reads species records from Darwin Core Archive (DwC-A) files, typically obtained from GBIF as a '.zip' file. It extracts the selected information from inside the DwC-A file so that it can enter the plantR workflow. Optionally, it can also save the extracted information into a local directory.

Usage

readData(
  file = NULL,
  path = "",
  dir.name = "",
  dir.tmp = "plantR_input",
  method = "auto",
  bind.data = TRUE,
  output = c("occurrence", "verbatim", "citations"),
  save = FALSE,
  file.format = "csv",
  compress = TRUE,
  sep = "auto",
  quote = "\"",
  na.strings = c("NA")
)

Arguments

file

character. Name of the DwC-A file (often a '.zip') containing the species records.

path

character. The path to the directory where the file was saved, or the web Uniform Resource Locator (URL) from which the file can be downloaded. Defaults to the user working directory or to the GBIF download path.

dir.name

character. Name of the folder where the processed data should be saved. Defaults to the directory defined by path.

dir.tmp

character. Name of the sub-folder where the temporary files should be saved within dir.name. Defaults to "plantR_input".

method

character. The method to be passed to the function download.file() for downloading files. Defaults to 'auto'.

bind.data

logical. Should the occurrence and verbatim information be combined into a single table? Defaults to TRUE.

output

character. Which information from the DwC-A file should be returned/saved? Defaults to 'occurrence', 'verbatim' and 'citations'.

save

logical. Should the information be saved to file? Defaults to FALSE.

file.format

character. The file extension to be used for saving. Defaults to 'csv'.

compress

logical. Should the files be compressed? Defaults to TRUE.

sep

character. The separator between columns to be passed to data.table::fread() (see the help of this function for details). Defaults to "auto".

quote

character. The symbol for text quotes to be passed to data.table::fread(). Defaults to "\"".

na.strings

character. Vector of strings which are to be interpreted as NA values to be passed to data.table::fread(). Defaults to "NA".

Details

This function provides different options to read DwC-A files, typically the ones obtained from GBIF. Currently, the zip file can be read from a local directory or directly from the GBIF API address. If path is a URL (e.g. https://api.gbif.org/v1/occurrence/download/request), then the function downloads the zip file directly from the GBIF API.
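The two ways of supplying path can be sketched as follows. This is only an illustration: the file name and URL are the ones used in the Examples section, while local_dir, is_url() and run_example are hypothetical names introduced here and are not part of plantR.

```r
# Download key and URL copied from the Examples section of this page
gbif_file <- "0227351-200613084148143.zip"
gbif_url  <- "https://api.gbif.org/v1/occurrence/download/request/"
local_dir <- "data/gbif"  # hypothetical folder holding a previously downloaded archive

# Hypothetical helper (not part of plantR): does `path` look like a web address?
is_url <- function(path) grepl("^(https?|ftp)://", path)

# Use the local copy when it exists, otherwise fall back to the GBIF API
src <- if (file.exists(file.path(local_dir, gbif_file))) local_dir else gbif_url
message("Reading from ", if (is_url(src)) "the GBIF API" else "a local directory")

run_example <- FALSE  # set to TRUE to actually read (and possibly download) the archive
if (run_example && requireNamespace("plantR", quietly = TRUE)) {
  occs <- plantR::readData(file = gbif_file, path = src)
}
```

Keeping a local copy of the archive avoids repeated downloads; whether the download key above is still available from GBIF is not guaranteed.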

The argument output defines which of the information within GBIF DwC-A files should be returned. Currently, the outputs available are: 'occurrence', 'verbatim', 'multimedia', 'citations' and 'rights'. If more than one output is selected, the function returns a list in which each element represents one of the selected outputs. Currently, the metadata of the data and of the database (i.e. the '.xml' files) are not returned. See package finch for the complete parsing of DwC-A files and metadata.
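A minimal sketch of selecting more than one output is given below. The vector of available outputs is copied from the paragraph above; the names wanted and run_example are hypothetical, and the structure of the returned list is inspected rather than assumed.

```r
# Outputs listed in the Details of this page
available <- c("occurrence", "verbatim", "multimedia", "citations", "rights")

# Hypothetical selection: records plus the citation information
wanted <- c("occurrence", "citations")
stopifnot(all(wanted %in% available))

run_example <- FALSE  # set to TRUE to actually download and read the archive
if (run_example && requireNamespace("plantR", quietly = TRUE)) {
  res <- plantR::readData(
    file = "0227351-200613084148143.zip",
    path = "https://api.gbif.org/v1/occurrence/download/request/",
    output = wanted
  )
  # More than one output was requested, so a list is expected;
  # look at its top-level structure instead of guessing element names
  str(res, max.level = 1)
}
```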

All temporary files and folders are deleted after the extraction of the information, except if save is TRUE. In this case, only the unzipped files within the DwC-A file are removed.

Downloading large files (more than 2GB) may be an issue for some R versions. The method 'wget' may be more appropriate for users with proxy firewalls (see help of function download.file()).
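The sketch below shows one way to select the download method, assuming that a wget executable may or may not be installed. The names dl_method and run_example are hypothetical. Raising the timeout option is an extra assumption not mentioned on this page: download.file() documents a default of 60 seconds, which a large archive may exceed, and how the option applies can depend on the method.

```r
# Use 'wget' only if an executable is found on the search path;
# otherwise keep the default method of readData()
dl_method <- if (nzchar(Sys.which("wget"))) "wget" else "auto"

run_example <- FALSE  # set to TRUE to actually download the archive
if (run_example && requireNamespace("plantR", quietly = TRUE)) {
  # Assumption: allow up to one hour for the transfer, then restore the option
  old <- options(timeout = max(3600, getOption("timeout")))
  occs <- plantR::readData(
    file = "0227351-200613084148143.zip",
    path = "https://api.gbif.org/v1/occurrence/download/request/",
    method = dl_method
  )
  options(old)
}
```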

Examples

## Not run: 
  occs <- readData(file = "0227351-200613084148143.zip",
                   path = "https://api.gbif.org/v1/occurrence/download/request/")

## End(Not run)


LimaRAF/plantR documentation built on Jan. 1, 2023, 10:18 a.m.