fixLoc | R Documentation |
Standardizes the notation of the locality fields country, stateProvince, municipality and locality, and searches for missing information within the available locality information.
fixLoc( x, loc.levels = c("country", "stateProvince", "municipality", "locality"), scrap = TRUE, to.lower = TRUE )
x |
a data frame containing typical locality fields from species records. |
loc.levels |
a vector containing the names of the locality fields to be formatted. |
scrap |
logical. Should the search for missing locality information be performed? Defaults to TRUE. |
to.lower |
logical. Should the output locality names be returned in lower case? Defaults to TRUE. |
The function performs several edits and replacements. Country names are formatted into the international format, letters are lower-cased, and special characters and common abbreviations are removed.
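As a rough illustration of the kind of edits described above, the base R sketch below lower-cases names, strips a few prefixes and punctuation, and maps country variants to a single name. The helper `normalize_loc()` and the lookup table `country_map` are hypothetical names introduced here; they are not part of fixLoc, whose internal rules are more extensive.

```r
# Hypothetical lookup of country variants (illustration only, not the package's table)
country_map <- c("br" = "brazil", "bra" = "brazil", "brasil" = "brazil")

# Hypothetical helper: lower-case, drop some common prefixes and punctuation
normalize_loc <- function(x) {
  x <- tolower(x)
  x <- gsub("^(estado de|municipio de|municipio)\\s+", "", x)
  x <- gsub("[[:punct:]]+", " ", x)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

country <- normalize_loc(c("BR", "Brasil", "BRA", "Brazil", NA))

# Replace known variants and keep everything else unchanged
known <- country %in% names(country_map)
country[known] <- country_map[country[known]]
country <- unname(country)

state <- normalize_loc("estado de Minas Gerais")
```

Missing values pass through unchanged, so the sketch never invents a country name for an empty field.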
By default, this function formats all four locality fields simultaneously (i.e. country, stateProvince, municipality, locality), but the user can choose among these fields through the argument loc.levels. However, the process of searching for missing information is more complete if all the four locality fields mentioned above are available.
If present, other Darwin Core fields are used internally to obtain missing information on the locality fields declared above, namely: 'countryCode', 'county', and 'verbatimLocality'.
The argument scrap controls the search for missing municipality information from the field 'locality'. It also performs some extra editing and cropping of the field 'locality' in order to obtain more standardized locality descriptions. This argument uses different ways of splitting and cropping the locality description in order to find missing information. Although it does not always result in an accurate extraction of the information, it provides an extra tool to organize locality information that is not provided in the appropriate columns.
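The idea of splitting and cropping a locality description can be sketched in base R as follows. This is a minimal illustration under simple assumptions (comma-separated pieces, a "municipio" prefix), not the procedure used internally by fixLoc; the helper `find_municipality()` is a hypothetical name.

```r
df <- data.frame(
  municipality = c("Lavras", NA, NA),
  locality = c("UFLA",
               "municipio de Lavras, campus UFLA",
               "Minas Gerais, municipio Lavras"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first comma-separated piece that
# starts with "municipio", with that prefix removed
find_municipality <- function(loc) {
  pieces <- trimws(strsplit(loc, ",", fixed = TRUE)[[1]])
  hit <- grep("^municipio( de)?\\s+", pieces, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub("^municipio( de)?\\s+", "", hit[1], ignore.case = TRUE)
}

# Only fill records where municipality is missing and locality is available
missing <- is.na(df$municipality) & !is.na(df$locality)
df$municipality[missing] <- vapply(df$locality[missing], find_municipality,
                                   character(1), USE.NAMES = FALSE)
```

A pattern-based search like this can miss or misread free-text descriptions, which is why the extracted values should be treated as suggestions rather than confirmed information.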
The function automatically returns the original resolution of the locality information provided. For instance, if only country information is provided (i.e. field is not empty), then the resolution is flagged as 'country'; if country and stateProvince are given, then the resolution is flagged as 'stateProvince', and so on.
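One possible reading of this rule is to count how many consecutive fields are filled, starting at country. The base R sketch below follows that reading; it is an assumption made for illustration and may differ from how fixLoc handles edge cases such as records with no locality information, which are returned here as NA.

```r
df <- data.frame(
  country = c("brazil", "brazil", "brazil", NA),
  stateProvince = c(NA, "minas gerais", "minas gerais", NA),
  municipality = c(NA, NA, "lavras", NA),
  locality = rep(NA_character_, 4),
  stringsAsFactors = FALSE
)
levels_order <- c("country", "stateProvince", "municipality", "locality")

# TRUE where a field holds a non-empty value
m <- as.matrix(df[levels_order])
filled <- !is.na(m) & m != ""

# Resolution = last field in the unbroken run of filled fields
resolution <- unname(apply(filled, 1, function(row) {
  n <- sum(cumprod(row))  # number of consecutive filled levels from country
  if (n == 0) NA_character_ else levels_order[n]
}))
```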
The input data frame x, plus the '.new' columns with the formatted fields and the resolution of the locality information available.
Renato A. F. de Lima
# Creating a data frame with locality information
(df <- data.frame(
  country = c("BR", "Brasil", "BRA", "Brazil", NA),
  stateProvince = c("MG", "estado de Minas Gerais", "Minas Geraes",
                    "Minas Gerais", "Minas Gerais"),
  municipality = c("Lavras", "lavras", NA, NA, "Lavras"),
  locality = c(NA, "UFLA", "municipio de Lavras, campus UFLA",
               "Minas Gerais, municipio Lavras", NA)))

# Formatting the locality information
fixLoc(df, scrap = FALSE)
fixLoc(df, scrap = FALSE, to.lower = FALSE)

# Formatting and scrapping the locality information
fixLoc(df, scrap = TRUE)

# Formatting the locality information only at country and state levels
# (the first four columns, i.e. the original fields, are dropped from the output)
fixLoc(df, loc.levels = c("country", "stateProvince"))[, -c(1:4)]