getLoc | R Documentation |
This function uses the plantR locality strings to search for existing localities and their respective coordinates in a gazetteer, which can be used to replace missing coordinates and in the validation process of the locality information and geographical coordinates provided.
getLoc(
  x,
  str.names = c("resol.orig", "loc.string", "loc.string1", "loc.string2"),
  gazet = "plantR",
  gazet.names = c("loc", "loc.correct", "latitude.gazetteer",
                  "longitude.gazetteer", "resolution.gazetteer"),
  orig.names = FALSE
)
x |
a data.frame containing the strings for locality search. See details for the specifications of this data frame. |
str.names |
a vector of at least two column names containing the locality resolution and search string(s), in that order. Defaults to 'resol.orig', 'loc.string', 'loc.string1' and 'loc.string2'. |
gazet |
a data.frame containing the gazetteer. The default is "plantR", the internal plantR gazetteer (biased towards Latin America). |
gazet.names |
a vector of at least four column names containing the locality search string, the corrected locality string, latitude and longitude, in that order. If available, the resolution of the gazetteer can be provided as a fifth name. Defaults to the column names of the plantR gazetteer: 'loc', 'loc.correct', 'latitude.gazetteer', 'longitude.gazetteer' and 'resolution.gazetteer'. |
orig.names |
logical. Should the original column names of the gazetteer be preserved? Defaults to FALSE. |
The function was initially designed as part of a larger routine to
edit and validate locality information from plant occurrence data. It is
possible to use it separately, but it may be easier to use it under the
workflow presented in the plantR manual. If used separately, users must
provide a data frame with at least two columns ('resol.orig' and
'loc.string'). Other locality strings ('loc.string1' and 'loc.string2') may
also be provided and in this case, these additional strings are used to
search for information below the municipality/county level, that is, to
retrieve from the gazetteer information at the locality level or below. If
these columns have different names in x, these names can be supplied
using the argument str.names. See Examples below.
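The role of str.names can be pictured as a positional mapping from the user's column names onto the default ones. The sketch below uses base R only and is not the plantR implementation; the helper name rename_to_defaults is hypothetical and exists only for this illustration.

```r
# Hypothetical helper (not part of plantR): map user-supplied column names,
# by position, onto the default names for resolution and search strings.
rename_to_defaults <- function(x, str.names) {
  defaults <- c("resol.orig", "loc.string", "loc.string1", "loc.string2")
  stopifnot(length(str.names) >= 2,
            length(str.names) <= length(defaults),
            all(str.names %in% names(x)))
  idx <- match(str.names, names(x))
  names(x)[idx] <- defaults[seq_along(str.names)]
  x
}

df <- data.frame(
  resol = c("municipality", "locality"),
  loc = c("brazil_rio janeiro_parati", "brazil_rio janeiro_paraty"),
  loc1 = c(NA, "brazil_rio janeiro_paraty_paraty mirim"),
  stringsAsFactors = FALSE
)

# The order of str.names matters: resolution first, then the search strings.
renamed <- rename_to_defaults(df, c("resol", "loc", "loc1"))
names(renamed)
```

The order of the names is what carries the meaning here, which is why the argument description asks for the resolution column first.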
The default plantR gazetteer includes information for all countries at the country level (i.e. administrative level 0) and at the lowest administrative level available at GADM (https://gadm.org) for 51 Latin American countries. For Brazil, the gazetteer also contains information at the locality level (e.g. farms, forest fragments, parks), obtained from the IBGE, CNCFlora and TreeCo databases. It also includes common spelling variants and historical changes to locality names (currently biased towards Brazil) and the more common notation variants of locality names found in the locality descriptions of records from the GBIF, speciesLink and JABOT databases (including a few type localities). In total, the gazetteer has nearly 25,000 locality names associated with valid geographical coordinates.
A gazetteer other than the plantR default can be used. This
gazetteer must be provided using the argument gazet and it must contain
the columns 'loc' (search string), 'loc.correct' (correct string),
'latitude.gazetteer', 'longitude.gazetteer' (in decimal degrees) and
'resolution.gazetteer' (e.g. country, state, etc). If the names of these
columns are different, they can be supplied using the argument gazet.names.
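As an illustration of the expected structure, the sketch below builds a toy gazetteer in base R and checks that it has the five required columns and plausible decimal-degree coordinates. The object name my.gazet, the approximate coordinates and the resolution label are assumptions made for this example only.

```r
# Toy gazetteer with the five expected columns. Coordinates are approximate
# and the resolution label is illustrative.
my.gazet <- data.frame(
  loc = c("brazil_rio janeiro_paraty", "brazil_rio janeiro_parati"),
  loc.correct = c("brazil_rio janeiro_paraty", "brazil_rio janeiro_paraty"),
  latitude.gazetteer = c(-23.22, -23.22),
  longitude.gazetteer = c(-44.71, -44.71),
  resolution.gazetteer = c("municipality", "municipality"),
  stringsAsFactors = FALSE
)

# Check that no required column is missing
required <- c("loc", "loc.correct", "latitude.gazetteer",
              "longitude.gazetteer", "resolution.gazetteer")
missing.cols <- setdiff(required, names(my.gazet))

# Check that coordinates fall within valid decimal-degree ranges
valid.coords <- with(my.gazet,
                     abs(latitude.gazetteer) <= 90 &
                       abs(longitude.gazetteer) <= 180)

# Following the Usage above, the data frame would then be supplied as
# getLoc(x, gazet = my.gazet)
```

Note that the spelling variant 'parati' points to the same 'loc.correct' entry, which is how a gazetteer can absorb notation variants.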
It is important to stress that the retrieval of locality information depends on the completeness of the gazetteer itself. So, if a query does not find a "valid" locality, it does not necessarily mean that the locality does not exist or that its notation is wrong. It can simply mean that the gazetteer is incomplete for the region you are working with. The gazetteer is continually being improved. If you find an error or if you want to contribute region-specific gazetteers, please send an email to raflima@usp.br.
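The point about gazetteer completeness can be illustrated with a simplified lookup. The sketch below assumes plain exact matching with base R's match(), which is only a stand-in and may differ from how getLoc handles unmatched strings; all object names are hypothetical.

```r
# One-entry toy gazetteer (illustrative only)
gazet <- data.frame(
  loc = "brazil_rio janeiro_paraty",
  loc.correct = "brazil_rio janeiro_paraty",
  stringsAsFactors = FALSE
)

# Two queries: the second is a real place that is simply absent from the
# toy gazetteer
queries <- c("brazil_rio janeiro_paraty", "brazil_rio janeiro_angra reis")

# Exact matching (an assumption of this sketch, not plantR's method)
hit <- match(queries, gazet$loc)
res <- data.frame(
  loc = queries,
  loc.correct = gazet$loc.correct[hit],
  stringsAsFactors = FALSE
)

# Queries without a match: candidates for checking the notation or for
# extending the gazetteer, not necessarily invalid localities
not.found <- res$loc[is.na(res$loc.correct)]
not.found
```

A missing match here reflects the coverage of the toy gazetteer, not an error in the query string.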
The data frame x, with the new columns retrieved from the
gazetteer. More specifically, it returns the string used for the search in
the gazetteer (column 'loc'), the string retrieved (if any, column
'loc.correct'), and the geographical coordinates (in decimal degrees) and
resolution associated with the string retrieved (columns
'latitude.gazetteer', 'longitude.gazetteer' and 'resolution.gazetteer',
respectively).
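One use of the returned columns is to fill in missing coordinates. The sketch below works on a mock data frame rather than real getLoc output; the original coordinate columns decimalLatitude and decimalLongitude, the output column names and all values are assumptions made for illustration.

```r
# Mock data: original coordinates (hypothetical column names) alongside
# coordinates of the kind returned from a gazetteer
occs <- data.frame(
  decimalLatitude = c(NA, -22.9),
  decimalLongitude = c(NA, -43.2),
  latitude.gazetteer = c(-23.22, -22.91),
  longitude.gazetteer = c(-44.71, -43.21),
  stringsAsFactors = FALSE
)

# A record needs replacement if either original coordinate is missing
miss <- is.na(occs$decimalLatitude) | is.na(occs$decimalLongitude)

# Keep original coordinates where present, use the gazetteer otherwise
occs$lat.final <- ifelse(miss, occs$latitude.gazetteer, occs$decimalLatitude)
occs$lon.final <- ifelse(miss, occs$longitude.gazetteer, occs$decimalLongitude)

# Record where each coordinate pair came from
occs$coord.source <- ifelse(miss, "gazetteer", "original")
occs
```

Keeping a column that records the coordinate source makes it possible to treat gazetteer-derived coordinates differently later, for example according to their resolution.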
Renato A. F. de Lima
fixLoc, strLoc and prepLoc.
## Using the function separately (need to provide column names and
## strings in a specific format)
(df <- data.frame(resol = c("municipality", "locality"),
                  loc = c("brazil_rio janeiro_parati", "brazil_rio janeiro_paraty"),
                  loc1 = c(NA, "brazil_rio janeiro_paraty_paraty mirim"),
                  stringsAsFactors = FALSE))
getLoc(df, str.names = c("resol", "loc", "loc1"))

## Using the function under the __plantR__ workflow
(df <- data.frame(country = c("BR", "Brazil", "Brasil", "USA"),
                  stateProvince = c("RJ", "Rio de Janeiro", "Rio de Janeiro", "Florida"),
                  municipality = c("Paraty", "Paraty", "Parati", NA),
                  locality = c(NA, "Paraty-Mirim", NA, NA),
                  stringsAsFactors = FALSE))

# Formatting the locality information
occs.fix <- fixLoc(df)

# Creating the locality strings used to query the gazetteer
occs.locs <- strLoc(occs.fix)

# Final editing of the locality strings (reduces variation in locality notation)
occs.locs$loc.string <- prepLoc(occs.locs$loc.string)
occs.locs$loc.string1 <- prepLoc(occs.locs$loc.string1)
occs.locs$loc.string2 <- prepLoc(occs.locs$loc.string2)

# Querying the gazetteer with the edited strings
getLoc(occs.locs)