View source: R/StandardizeNomenclature.R
StandardizeNomenclature | R Documentation |
Functions to map user-provided nomenclature onto a standard one, as defined in a thesaurus.
StandardizeNomenclature(x, thesaurus, mark.unknown = FALSE)
StandardizeDataSet(data, thesaurusSet = zoologThesaurus)
x | Character vector. |
thesaurus | A thesaurus object. |
mark.unknown | Logical. If FALSE (default), names not found in the thesaurus are kept unchanged; if TRUE, they are set to NA. |
data | A data frame. |
thesaurusSet | A thesaurus set. |
StandardizeNomenclature standardizes a character vector according to a given thesaurus.
StandardizeDataSet standardizes column names and values of a data frame according to a thesaurus set.
StandardizeNomenclature returns a vector of the same length as the input vector x. Names present in the thesaurus are set to their corresponding category. Names not in the thesaurus are kept unchanged if mark.unknown=FALSE (default) and set to NA if mark.unknown=TRUE.
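The return-value rule above can be illustrated with a minimal base-R sketch. It is not zoolog's implementation: the toy thesaurus (a named list mapping each category to its accepted variants), the helper name standardize_sketch, and the case-insensitive matching are all assumptions made for illustration only.

```r
# Toy thesaurus (hypothetical structure, not the zoolog thesaurus class):
# each element name is a category; its values are the accepted variants.
toy_thesaurus <- list(
  "Bos taurus"     = c("bos taurus", "cattle", "cow"),
  "Sus domesticus" = c("sus domesticus", "pig")
)

# Hypothetical helper mimicking the documented behaviour of mark.unknown.
standardize_sketch <- function(x, thesaurus, mark.unknown = FALSE) {
  # Build a lookup vector: lowercased variant -> category.
  lookup <- unlist(lapply(names(thesaurus), function(category) {
    variants <- tolower(thesaurus[[category]])
    stats::setNames(rep(category, length(variants)), variants)
  }))
  # Case-insensitive match; names without a match come back as NA.
  matched <- unname(lookup[tolower(x)])
  unknown <- is.na(matched)
  # Unknown names are restored unless the caller asked to mark them.
  if (!mark.unknown) matched[unknown] <- x[unknown]
  matched
}

taxa <- c("Cattle", "giraffe", "PIG")
kept   <- standardize_sketch(taxa, toy_thesaurus)
marked <- standardize_sketch(taxa, toy_thesaurus, mark.unknown = TRUE)
print(kept)    # "giraffe" is kept as typed
print(marked)  # "giraffe" becomes NA
```

In both calls the result has the same length as the input, so it can be assigned back to the column it came from.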
StandardizeDataSet returns a data frame with the same structure as the input data, but with its nomenclature standardized according to a thesaurus set that includes appropriate thesauri for its column names and for the values of a set of columns.
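As a rough illustration of what standardizing both column names and column values means, the following base-R sketch applies one toy thesaurus to the column names and another to the values of a single column. The thesauri, the helper map_names, and the example data are hypothetical; the real thesaurus set in zoolog has its own structure, described under zoologThesaurus.

```r
# Hypothetical toy thesauri: standard name -> accepted variants (lowercase).
column_thesaurus <- list(
  Taxon   = c("taxon", "species", "especie"),
  Element = c("element", "el", "bone")
)
taxon_thesaurus <- list(
  "Bos taurus"     = c("bos taurus", "cattle", "cow"),
  "Sus domesticus" = c("sus domesticus", "pig")
)

# Hypothetical helper: replace every recognized variant by its category,
# ignoring case, and leave unrecognized names unchanged.
map_names <- function(x, thesaurus) {
  for (category in names(thesaurus)) {
    x[tolower(x) %in% thesaurus[[category]]] <- category
  }
  x
}

raw <- data.frame(
  Especie = c("cattle", "pig", "giraffe"),
  EL      = c("hu", "fe", "ti"),
  stringsAsFactors = FALSE
)

standardized <- raw
# Step 1: standardize the column names.
names(standardized) <- map_names(names(raw), column_thesaurus)
# Step 2: standardize the values of the column that has a thesaurus.
standardized$Taxon <- map_names(standardized$Taxon, taxon_thesaurus)
print(standardized)
```

The shape of the data frame is untouched; only names and values covered by a thesaurus change.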
zoologThesaurus for a description of the thesaurus and thesaurus set structure, ThesaurusReaderWriter, ThesaurusManagement
library(zoolog)

## Select the thesaurus for taxa present in the thesaurus set
## zoolog::zoologThesaurus:
thesaurus <- zoologThesaurus$taxon
thesaurus

## Standardize a heterodox vector of taxa:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"), thesaurus)
## Observe that "giraffe" is kept unchanged since it is not included in
## any thesaurus category.

## But if mark.unknown is set to TRUE, it is marked as NA:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"), thesaurus,
                        mark.unknown = TRUE)

## This thesaurus is not case sensitive:
attr(thesaurus, "caseSensitive") # == FALSE
## Thus, names are recognized independently of their case:
StandardizeNomenclature(c("bota", "BOTA", "Bota", "boTa"), thesaurus)

## Load an example data frame:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile, na.strings = "", encoding = "UTF-8")
## Observe mainly the first columns:
head(dataExample[, 1:5])

## Standardize the dataset:
dataStandardized <- StandardizeDataSet(dataExample, zoologThesaurus)
head(dataStandardized[, 1:5])