formatOcc: Format Names, Numbers, Dates and Codes

View source: R/formatOcc.R

formatOcc R Documentation

Description

This function standardizes collector names, determiner names, collector numbers, dates and collection codes in herbarium occurrence records obtained from online databases such as GBIF or speciesLink.

Usage

formatOcc(x, noNumb = "s.n.", noYear = "n.d.", noName = "s.n.")

Arguments

x

a data frame containing the typical fields of occurrence records of herbarium specimens (see Details).

noNumb

character. The standard notation for missing data in the field 'Number'. Defaults to "s.n."

noYear

character. The standard notation for missing data in the field 'Year'. Defaults to "n.d."

noName

character. The standard notation for missing data in the field 'Name'. Defaults to "s.n."
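
The idea behind noNumb, noYear and noName can be illustrated with a minimal base-R sketch, assuming that "missing" means NA or an empty string. The helper fill_missing() is hypothetical and not part of plantR; formatOcc() may apply additional rules before deciding that a value is missing.

```r
# Hypothetical helper (not part of plantR): replace NA or blank entries
# with a standard missing-data notation such as "s.n." or "n.d."
fill_missing <- function(v, notation) {
  v <- as.character(v)
  # NA | anything is TRUE, so NA entries are flagged as missing too
  empty <- is.na(v) | trimws(v) == ""
  v[empty] <- notation
  v
}

numbers <- c("1234", NA, "", "56")
fill_missing(numbers, notation = "s.n.")
# [1] "1234" "s.n." "s.n." "56"
```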

Details

The function works as a wrapper: it performs, in a single call, the individual steps of the plantR workflow proposed for editing collection information (see the plantR tutorial and the help page of each function for details).

Ideally, the input data frame should contain at least the following Darwin Core fields (the default column names expected by the functions):

  • 'institutionCode' and 'collectionCode' (codes of the institution and collection);

  • 'year' and 'eventDate' (year of the collection);

  • 'recordedBy' (name(s) of the collector(s));

  • 'recordNumber' (collector number);

  • 'identifiedBy' (name of the identifier);

  • 'yearIdentified' and 'dateIdentified' (year of identification).
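
Before calling formatOcc(), it can help to check which of the fields listed above are absent from the input. The sketch below uses only base R; the helper missing_fields() and the toy data frame are hypothetical and only illustrate the check.

```r
# Darwin Core fields listed above (default column names)
dwc_fields <- c("institutionCode", "collectionCode", "year", "eventDate",
                "recordedBy", "recordNumber", "identifiedBy",
                "yearIdentified", "dateIdentified")

# Hypothetical helper (not part of plantR): return the expected fields
# that are not columns of the data frame x, in the order listed above
missing_fields <- function(x, fields = dwc_fields) {
  setdiff(fields, names(x))
}

# Toy occurrence table used only for illustration
occs <- data.frame(institutionCode = "XYZ", collectionCode = "XYZ",
                   recordedBy = "Silva, J.", recordNumber = "1234",
                   year = NA, eventDate = "1985-03-12")
missing_fields(occs)
# [1] "identifiedBy"   "yearIdentified" "dateIdentified"
```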

Missing collection years in the field 'year' are internally replaced by the year obtained from the date stored in the field 'eventDate', provided that this field is not empty as well.
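
The fallback described above can be sketched in base R, assuming that the collection year is the first run of four digits found in 'eventDate'. The helpers extract_year() and fill_year() are hypothetical and are not plantR's getYear(), which may recognise more date formats.

```r
# Hypothetical helper: first run of four digits in each date, or NA
extract_year <- function(dates) {
  dates <- as.character(dates)
  out <- rep(NA_character_, length(dates))
  hit <- !is.na(dates) & grepl("[0-9]{4}", dates)
  out[hit] <- sub("^.*?([0-9]{4}).*$", "\\1", dates[hit], perl = TRUE)
  out
}

# Hypothetical helper: fill missing 'year' values from 'eventDate';
# years stay NA when 'eventDate' is empty as well
fill_year <- function(year, eventDate) {
  year <- as.character(year)
  missing <- is.na(year) | trimws(year) == ""
  year[missing] <- extract_year(eventDate)[missing]
  year
}

fill_year(year = c("1990", NA, ""),
          eventDate = c("1990-05-01", "1985-03-12", NA))
# [1] "1990" "1985" NA
```

Years that are still missing after this step would then receive the noYear notation.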

Value

The input data frame x with additional columns containing the formatted information. The new columns take the names of the corresponding Darwin Core fields followed by the suffix '.new' (e.g. 'recordedBy.new').
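
Because every formatted column ends in '.new', the formatted columns can be selected by name. In the base-R sketch below, the toy data frame only stands in for the output of formatOcc(); its columns and values are illustrative assumptions.

```r
# Toy data frame standing in for the output of formatOcc(); the columns
# and values are illustrative only
res <- data.frame(recordNumber = "", recordNumber.new = "s.n.",
                  year = NA, year.new = "n.d.")

# Select the formatted columns by their '.new' suffix
new_cols <- grep("\\.new$", names(res), value = TRUE)
res[, new_cols]
```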

Author(s)

Renato A. F. de Lima

See Also

getCode, fixName, colNumber, getYear, prepTDWG, prepName, missName and lastName.


LimaRAF/plantR documentation built on Jan. 1, 2023, 10:18 a.m.