fixLoc: Format Locality Information

fixLoc    R Documentation

Description

Standardize the notation of the locality fields country, stateProvince, municipality and locality, and search for missing information within the available locality information.

Usage

fixLoc(
  x,
  loc.levels = c("country", "stateProvince", "municipality", "locality"),
  scrap = TRUE,
  to.lower = TRUE
)

Arguments

x

a data frame containing typical locality fields from species records.

loc.levels

a vector containing the names of the locality fields to be formatted.

scrap

logical. Should the search for missing locality information be performed? Defaults to TRUE.

to.lower

logical. Should the output locality names be returned in lower case? Defaults to TRUE.

Details

The function performs several edits and replacements. Country names are formatted into the international format, letters are lower-cased, and special characters and common abbreviations are removed.
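The kind of edits described above can be approximated with base R. The sketch below is only an illustration: normalize_loc is a hypothetical helper, not a plantR function, and it covers just punctuation removal, whitespace cleanup and lower-casing, not country-name formatting or abbreviation handling.

```r
# Hypothetical helper (not part of plantR): a rough approximation of the
# basic text edits, using base R only.
normalize_loc <- function(x, to.lower = TRUE) {
  x <- gsub("[[:punct:]]+", " ", x)  # replace punctuation with spaces
  x <- gsub("\\s+", " ", x)          # collapse repeated whitespace
  x <- trimws(x)                     # drop leading/trailing spaces
  x[!is.na(x) & x == ""] <- NA       # treat empty strings as missing
  if (to.lower) {
    keep <- !is.na(x)
    x[keep] <- tolower(x[keep])      # lower-case non-missing entries only
  }
  x
}

normalize_loc(c("  Minas  Gerais ", "Lavras (MG).", NA))
```

Missing values are left as NA so that later steps can still tell which fields were empty.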

By default, this function formats all four locality fields simultaneously (i.e. country, stateProvince, municipality, locality), but the user can choose among these fields through the argument loc.levels. However, the search for missing information is more complete if all four locality fields mentioned above are available.

If present, other Darwin Core fields are used internally to obtain missing information on the locality fields declared above, namely: 'countryCode', 'county', and 'verbatimLocality'.
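As a rough illustration of this fallback idea, the sketch below fills gaps in one field from another field when that field is present. The helper fill_from is hypothetical, and the pairing of 'countryCode' with 'country' is an assumption made for the example; the documentation does not state which field feeds which.

```r
# Hypothetical helper (not part of plantR): fill missing values of a
# target column from a fallback column, if both columns exist.
fill_from <- function(df, target, fallback) {
  if (!all(c(target, fallback) %in% names(df))) return(df)
  gap <- is.na(df[[target]]) | df[[target]] == ""
  df[[target]][gap] <- df[[fallback]][gap]
  df
}

recs <- data.frame(country = c("Brazil", NA),
                   countryCode = c("BR", "AR"),
                   stringsAsFactors = FALSE)

# Assumed pairing: countryCode is used when country is missing
fill_from(recs, target = "country", fallback = "countryCode")
```

If the fallback column is absent, the data frame is returned unchanged, which mirrors the "if present" wording above.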

The argument scrap controls the search for missing municipality information in the field 'locality'. It also performs some extra editing and cropping of the field 'locality' in order to obtain more standardized locality descriptions. The search uses different ways of splitting and cropping the locality description in order to find missing information. Although it does not always result in an accurate extraction of the information, it provides an extra tool to organize locality information that is not provided in the appropriate columns.
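To give a sense of what such a search can look like, here is a minimal sketch that recovers a missing municipality from a locality string. The helper scrap_municipality and its single "municipio (de) X" pattern are assumptions for illustration only; the splitting and cropping rules actually used by the function are not described here and are likely more extensive.

```r
# Hypothetical helper (not part of plantR): look for "municipio X" or
# "municipio de X" in the locality text and use X when municipality is NA.
scrap_municipality <- function(municipality, locality) {
  pattern <- "municipio( de)? +([^,]+)"
  has <- !is.na(locality) &
    grepl(pattern, locality, ignore.case = TRUE, perl = TRUE)
  found <- rep(NA_character_, length(locality))
  # Keep only the captured name (second group), up to the next comma
  found[has] <- trimws(sub(paste0(".*", pattern, ".*"), "\\2",
                           locality[has], ignore.case = TRUE, perl = TRUE))
  gap <- is.na(municipality)
  municipality[gap] <- found[gap]
  municipality
}

municipality <- c("Lavras", "lavras", NA, NA, "Lavras")
locality <- c(NA, "UFLA", "municipio de Lavras, campus UFLA",
              "Minas Gerais, municipio Lavras", NA)
scrap_municipality(municipality, locality)
```

Only missing entries are filled, so municipality values supplied by the user are never overwritten by the text search.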

The function automatically returns the original resolution of the locality information provided. For instance, if only country information is provided (i.e. field is not empty), then the resolution is flagged as 'country'; if country and stateProvince are given, then the resolution is flagged as 'stateProvince', and so on.
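The flagging rule described above can be sketched as follows. This is one possible reading, not the package's implementation: flag_resolution is a hypothetical helper, it assumes the resolution is the finest level for which that level and all coarser levels are filled, and the label "no_info" for records without country information is also an assumption.

```r
# Hypothetical helper (not part of plantR): flag the finest locality level
# for which that level and all coarser levels are non-empty.
flag_resolution <- function(df,
                            levels = c("country", "stateProvince",
                                       "municipality", "locality")) {
  vals <- as.matrix(df[, levels, drop = FALSE])
  filled <- !is.na(vals) & vals != ""
  # Number of consecutive filled fields, counting from the coarsest level
  depth <- apply(filled, 1, function(row) sum(cumprod(row)))
  c("no_info", levels)[depth + 1]  # "no_info" is an assumed label
}

recs <- data.frame(
  country = c("BR", "Brasil", "BRA", "Brazil", NA),
  stateProvince = c("MG", "estado de Minas Gerais", "Minas Geraes",
                    "Minas Gerais", "Minas Gerais"),
  municipality = c("Lavras", "lavras", NA, NA, "Lavras"),
  locality = c(NA, "UFLA", "municipio de Lavras, campus UFLA",
               "Minas Gerais, municipio Lavras", NA),
  stringsAsFactors = FALSE
)
flag_resolution(recs)
```

Under this reading, a record with a locality but no municipality is flagged at the stateProvince level, because the chain of filled fields is broken at municipality.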

Value

The input data frame x, plus the '.new' columns with the formatted fields and the resolution of the locality information available.

Author(s)

Renato A. F. de Lima

Examples

library(plantR)

# Creating a data frame with locality information
(df <- data.frame(
  country = c("BR", "Brasil", "BRA", "Brazil", NA),
  stateProvince = c("MG", "estado de Minas Gerais", "Minas Geraes",
                    "Minas Gerais", "Minas Gerais"),
  municipality = c("Lavras", "lavras", NA, NA, "Lavras"),
  locality = c(NA, "UFLA", "municipio de Lavras, campus UFLA",
               "Minas Gerais, municipio Lavras", NA)))

# Formatting the locality information
fixLoc(df, scrap = FALSE)
fixLoc(df, scrap = FALSE, to.lower = FALSE)

# Formatting and scraping the locality information
fixLoc(df, scrap = TRUE)

# Formatting the locality information only at country and state levels
# (the first four columns, i.e. the original fields, are dropped)
fixLoc(df, loc.levels = c("country", "stateProvince"))[, -c(1:4)]


LimaRAF/plantR documentation built on Jan. 1, 2023, 10:18 a.m.