readDbBash: Read database table using bash commands.


View source: R/readDbBash.R

Description

Reads a database that follows the Darwin Core standard [1].

Usage

readDbBash(data = NULL, path.data = NULL, cut.col = c(1, 78, 79, 200, 218,
  219), delt.undeterm = TRUE, save.name = NULL, wrt.frmt = "saveRDS",
  save.in = NULL)

Arguments

data

Vector of characters. Name of the input file.

path.data

Vector of characters. Path to the input file.

cut.col

Numeric vector. Column numbers to read from the database. By default, columns c(1,78,79,200,218,219) are read. These correspond to the following headers of the Darwin Core standard [1]: gbifID, decimalLongitude, decimalLatitude, elevation, speciesKey and species. See Details.

delt.undeterm

Logical vector. If TRUE, returns a data table containing only the occurrences that have a taxonomic determination down to species level. Otherwise, it returns all occurrences read from the database.

save.name

Vector of characters. Name of the output file.

wrt.frmt

Vector of characters. Format in which to save the output file. The default is "saveRDS", which writes the output as an R object.

save.in

Vector of characters. Path to the output file.
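
As a rough illustration of what delt.undeterm = TRUE implies, the sketch below drops records without a species-level determination from a small invented table. The data frame occ and the rule that an undetermined record has a missing or empty species field are assumptions made for this example; they are not taken from the package's internal code.

```r
# Toy occurrence table (invented values, for illustration only).
occ <- data.frame(
  gbifID           = c(1, 2, 3, 4),
  decimalLongitude = c(-73.1, -74.0, -72.5, -75.2),
  decimalLatitude  = c(7.1, 4.6, 6.9, 5.0),
  species          = c("Puma concolor", "", NA, "Tapirus terrestris"),
  stringsAsFactors = FALSE
)

# Assumption: a record is "undetermined" when species is NA or empty.
determined <- !is.na(occ$species) & nzchar(occ$species)

# Keep only the records determined to species level.
occ_species <- occ[determined, ]
```

Here occ_species keeps the two records whose species field is filled in.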

Details

We recommend using this function when the database has more than one hundred thousand occurrences and/or the computer has little memory. readDbBash uses the cut command called from Bash and works on Linux or Mac operating systems. Otherwise, we recommend readDbR, which runs within R and can be used on any operating system (Linux, Mac, or Windows). However, readDbBash will always be faster than readDbR (up to four times faster).

Databases downloaded from the Global Biodiversity Information Facility (GBIF) [2] are exported with Darwin Core headers and TAB as the column separator; hence, all databases read with this function must use TAB as the separator. See the readAndWrite function.
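
The general idea of selecting columns from a TAB-separated file with cut can be sketched from R as follows. This is a minimal illustration, not the code readDbBash runs: the file contents, the object names tsv, cols and small, and the use of system2 are assumptions for the example. It needs a Unix-like system where cut is on the PATH.

```r
# Write a tiny TAB-separated file (invented values).
tsv <- tempfile(fileext = ".txt")
writeLines(c(
  "gbifID\tbasisOfRecord\tdecimalLongitude\tdecimalLatitude",
  "1\tHUMAN_OBSERVATION\t-73.1\t7.1",
  "2\tPRESERVED_SPECIMEN\t-74.0\t4.6"
), tsv)

# Column numbers to keep, in ascending order.
cols <- c(1, 3, 4)

# `cut -f` selects fields; TAB is its default delimiter.
out <- system2("cut",
               c("-f", paste(cols, collapse = ","), shQuote(tsv)),
               stdout = TRUE)

# Parse the selected columns into a data.frame.
small <- read.delim(text = out, stringsAsFactors = FALSE)

unlink(tsv)
```

Only the selected fields are handed to R, which is why this approach can help when the full table would not fit comfortably in memory.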

For the cut.col parameter, the column numbers to extract must be sorted in ascending order. For databases downloaded from GBIF [2], the number for each header can be seen using the data('ID_DarwinCore') command in the console.
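
One plausible reason for the ordering requirement is that cut always emits fields in the order they appear in the file, whatever order they are listed in. A small check like the one below can be run before calling the function; the object names is_sorted, safe.col and field.list are hypothetical and are not part of the package.

```r
# Default column numbers from the Usage section.
cut.col <- c(1, 78, 79, 200, 218, 219)

# TRUE when the column numbers are already in ascending order.
is_sorted <- !is.unsorted(cut.col)

# Sorting (and dropping duplicates) is one way to make a vector safe to pass.
safe.col <- sort(unique(c(219, 1, 79, 78, 218, 200)))

# Comma-separated field list of the kind `cut -f` accepts.
field.list <- paste(safe.col, collapse = ",")
```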

For more details about the formats to read and/or write, see the readAndWrite function.

Value

Writes the data table (class data.frame) to the output file and returns a vector with a table of descriptive quantities.

Note

See: R-Alarcon V. and Miranda-Esquivel DR. (submitted) geocleaMT: An R package to cleaning geographical data from electronic biodatabases.

Author(s)

R-Alarcon Viviana and Miranda-Esquivel Daniel R.

References

[1] Wieczorek, J. et al. 2012. Darwin core: An evolving community-developed biodiversity data standard. PloS One 7: e29715.

[2] Global Biodiversity Information Facility. Available online at http://www.gbif.org/.

See Also

readDbR

readAndWrite


alarconvv/geocleaMT documentation built on July 10, 2019, 12:50 a.m.