StandardizeNomenclature: Standardize Nomenclature

View source: R/StandardizeNomenclature.R

StandardizeNomenclatureR Documentation

Standardize Nomenclature

Description

Functions to map the user provided nomenclature into a standard one as defined in a thesaurus.

Usage

StandardizeNomenclature(x, thesaurus, mark.unknown = FALSE)

StandardizeDataSet(data, thesaurusSet = zoologThesaurus)

Arguments

x

Character vector.

thesaurus

A thesaurus object.

mark.unknown

Logical. If FALSE (default) the strings not found in the thesaurus are kept without change. If TRUE the strings not in the thesaurus are set to NA.

data

A data frame.

thesaurusSet

A thesaurus set.

Details

StandardizeNomenclature standardizes a character vector according to a given thesaurus.

StandardizeDataSet standardizes column names and values of a data frame according to a thesaurus set.

Value

StandardizeNomenclature returns a vector of the same length as the input vector x. The names present in the thesaurus are set to their corresponding category. The names not in the thesaurus are kept unchanged if mark.unknown=FALSE (default) and set to NA if mark.unknown=TRUE.

StandardizeDataSet returns a data frame with the same structure as the input data, but standardizing its nomenclature according to a thesaurus set including appropriate thesauri for its column names and for the values of a set of columns.

See Also

zoologThesaurus for a description of the thesaurus and thesaurus set structure,

ThesaurusReaderWriter, ThesaurusManagement

Examples

## Select the thesaurus for taxa present in the thesaurus set
## zoolog::zoologThesaurus:
thesaurus <- zoologThesaurus$taxon
thesaurus
## Standardize an heterodox vector of taxa:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus)
## Observe that "giraffe" is kept unchanged since it is not included in
## any thesaurus category.
## But if mark.unknown is set to TRUE, it is marked as NA:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus, mark.unknown = TRUE)

## This thesaurus is not case sensitive:
attr(thesaurus, "caseSensitive") #  == FALSE
## Thus, names are recognized independently of their case:
StandardizeNomenclature(c("bota", "BOTA", "Bota", "boTa"),
                        thesaurus)

## Load an example data frame:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile,
                                na.strings = "",
                                encoding = "UTF-8")
## Observe mainly the first columns:
head(dataExample[,1:5])
## Stadardize the dataset:
dataStandardized <- StandardizeDataSet(dataExample, zoologThesaurus)
head(dataStandardized[,1:5])


zoolog documentation built on Aug. 26, 2022, 5:08 p.m.