getLoc: Get Locality and Coordinates


getLoc R Documentation

Get Locality and Coordinates

Description

This function uses the plantR locality strings to search a gazetteer for existing localities and their respective coordinates. The result can be used to replace missing coordinates and to validate the locality information and geographical coordinates provided.

Usage

getLoc(
  x,
  str.names = c("resol.orig", "loc.string", "loc.string1", "loc.string2"),
  gazet = "plantR",
  gazet.names = c("loc", "loc.correct", "latitude.gazetteer", "longitude.gazetteer",
    "resolution.gazetteer"),
  orig.names = FALSE
)

Arguments

x

a data.frame containing the strings for locality search. See details for the specifications of this data frame.

str.names

a vector of at least two column names containing the locality resolution and search string(s), in that order. Defaults to 'resol.orig', 'loc.string', 'loc.string1' and 'loc.string2'.

gazet

a data.frame containing the gazetteer. The default is "plantR", the internal plantR gazetteer (biased towards Latin America).

gazet.names

a vector of at least four column names containing the locality search string, the correct locality string, latitude and longitude, in that order. If available, the resolution of the gazetteer can be provided as a fifth name. Defaults to the column names of the plantR gazetteer: 'loc', 'loc.correct', 'latitude.gazetteer', 'longitude.gazetteer' and 'resolution.gazetteer'.

orig.names

logical. Should the original column names of the gazetteer be preserved? Defaults to FALSE.

Details

The function was initially designed as part of a larger routine to edit and validate locality information from plant occurrence data. It can be used separately, but it may be easier to use it within the workflow presented in the plantR manual. If used separately, users must provide a data frame with at least two columns ('resol.orig' and 'loc.string'). Other locality strings ('loc.string1' and 'loc.string2') may also be provided. In this case, these additional strings are used to search for information below the municipality/county level, that is, to retrieve from the gazetteer information at the locality level or below. If these columns have different names in x, these names can be supplied using the argument str.names. See Examples below.

The default plantR gazetteer includes information for all countries at the country level (i.e. administrative level 0) and at the lowest administrative level available at GADM (https://gadm.org) for 51 Latin American countries. For Brazil, the gazetteer also contains information at the locality level (e.g. farms, forest fragments, parks), obtained from the IBGE, CNCFlora and TreeCo databases. It also includes common spelling variants and historical changes to locality names (currently biased towards Brazil) and the more common notation variants of locality names found in the locality descriptions of records from the GBIF, speciesLink and JABOT databases (few type localities are included). In total, the gazetteer has nearly 25,000 locality names associated with valid geographical coordinates.

A different gazetteer than the plantR default can be used. This gazetteer must be provided using the argument gazet and it must contain the columns 'loc' (search string), 'loc.correct' (correct string), 'latitude.gazetteer', 'longitude.gazetteer' (in decimal degrees) and 'resolution.gazetteer' (e.g. country, state, etc). If the names for these columns are different, they can be supplied using argument gazet.names.
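
As a minimal sketch of the custom-gazetteer option described above, the code below builds a small data frame with user-chosen column names and lists those names in the order documented for gazet.names. The object names (my.gazet, my.names, df), the column names, the two entries, the resolution label and the approximate coordinates are illustrative assumptions and are not part of plantR. The call to getLoc is left commented out (not run), and its result is not shown.

```r
# Hypothetical user gazetteer: column names and entries are illustrative only
my.gazet <- data.frame(
  search  = c("brazil_rio janeiro_paraty", "brazil_rio janeiro_parati"),
  correct = c("brazil_rio janeiro_paraty", "brazil_rio janeiro_paraty"),
  lat     = c(-23.22, -23.22),   # approximate, in decimal degrees
  lon     = c(-44.72, -44.72),   # approximate, in decimal degrees
  resol   = c("county", "county"),  # assumed resolution label
  stringsAsFactors = FALSE
)

# Column names in the order documented for 'gazet.names':
# search string, correct string, latitude, longitude, resolution
my.names <- c("search", "correct", "lat", "lon", "resol")

# Basic sanity checks before passing the gazetteer on
stopifnot(all(my.names %in% names(my.gazet)),
          is.numeric(my.gazet$lat),
          is.numeric(my.gazet$lon))

# Input with the two required columns (resolution and search string)
df <- data.frame(resol = "municipality",
                 loc = "brazil_rio janeiro_parati",
                 stringsAsFactors = FALSE)

## Not run:
# getLoc(df, str.names = c("resol", "loc"),
#        gazet = my.gazet, gazet.names = my.names)
```

Checking the column names and types up front is a design choice of this sketch: it makes a mismatch between gazet and gazet.names visible before the query is made.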

It is important to stress that the retrieval of locality information depends on the completeness of the gazetteer itself. So, if a query does not find a "valid" locality, it does not necessarily mean that the locality does not exist or that its notation is wrong. It can simply mean that the gazetteer is incomplete for the region you are working with. The gazetteer is continuously being improved. If you find an error or if you want to contribute region-specific gazetteers, please send an email to raflima@usp.br.

Value

The data frame x, with the new columns retrieved from the gazetteer. More specifically, it returns the string used for the search in the gazetteer (column 'loc'), the string retrieved (if any, column 'loc.correct'), and the geographical coordinates (in decimal degrees) and resolution associated with the string retrieved (columns 'latitude.gazetteer', 'longitude.gazetteer' and 'resolution.gazetteer', respectively).
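
The returned columns can be used to replace missing coordinates, as mentioned in the Description. The sketch below shows one possible way to do this in base R. It works on a mock data frame rather than on real getLoc output: the columns 'decimalLatitude' and 'decimalLongitude', the new columns 'lat.new', 'lon.new' and 'coord.source', and all values are illustrative assumptions, not part of the function's return value.

```r
# Mock of occurrence data already combined with gazetteer columns
# (illustrative values only; not actual getLoc() output)
occs <- data.frame(
  decimalLatitude     = c(-23.5, NA, NA),
  decimalLongitude    = c(-46.6, NA, NA),
  loc.correct         = c("brazil_sao paulo_sao paulo",
                          "brazil_rio janeiro_paraty", NA),
  latitude.gazetteer  = c(-23.55, -23.22, NA),
  longitude.gazetteer = c(-46.63, -44.72, NA),
  stringsAsFactors = FALSE
)

# Rows lacking original coordinates but with coordinates from the gazetteer
fill <- (is.na(occs$decimalLatitude) | is.na(occs$decimalLongitude)) &
  !is.na(occs$latitude.gazetteer) & !is.na(occs$longitude.gazetteer)

# Keep the original coordinates where present, otherwise use the gazetteer
occs$lat.new <- ifelse(fill, occs$latitude.gazetteer, occs$decimalLatitude)
occs$lon.new <- ifelse(fill, occs$longitude.gazetteer, occs$decimalLongitude)

# Record where each coordinate pair came from (NA if none is available)
occs$coord.source <- ifelse(fill, "gazetteer",
                            ifelse(is.na(occs$lat.new), NA_character_,
                                   "original"))
```

Writing the result to new columns, instead of overwriting the original ones, keeps the provided coordinates available for later validation.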

Author(s)

Renato A. F. de Lima

See Also

fixLoc, strLoc and prepLoc.

Examples


## Using the function separately (need to provide column names and
## strings in a specific format)
(df <- data.frame(resol = c("municipality","locality"),
                  loc = c("brazil_rio janeiro_parati","brazil_rio janeiro_paraty"),
                  loc1 = c(NA, "brazil_rio janeiro_paraty_paraty mirim"),
                  stringsAsFactors = FALSE))
getLoc(df, str.names = c("resol", "loc", "loc1"))

## Using the function under the plantR workflow
(df <- data.frame(country = c("BR", "Brazil", "Brasil", "USA"),
                     stateProvince = c("RJ", "Rio de Janeiro",
                                       "Rio de Janeiro","Florida"),
                     municipality = c("Paraty", "Paraty", "Parati", NA),
                     locality = c(NA,"Paraty-Mirim", NA, NA),
                     stringsAsFactors = FALSE))

# Formatting the locality information
occs.fix <- fixLoc(df)

# Creating locality strings used to query the gazetteer
occs.locs <- strLoc(occs.fix)

# Final editing of the locality strings (reduces variation in locality notation)
occs.locs$loc.string <- prepLoc(occs.locs$loc.string)
occs.locs$loc.string1 <- prepLoc(occs.locs$loc.string1)
occs.locs$loc.string2 <- prepLoc(occs.locs$loc.string2)

# Querying the gazetteer with the edited strings
getLoc(occs.locs)



LimaRAF/plantR documentation built on Jan. 1, 2023, 10:18 a.m.