dataQC.TermsCheck: make variable names in a dataset comply with a standard


View source: R/DataQC_Utils.R

Description

Checks a set of terms (e.g. column names) against a standard, flags inconsistencies, and proposes solutions where possible.

Usage

dataQC.TermsCheck(observed=NA, exp.standard="MIxS", 
  exp.section=NA, fuzzy.match=TRUE, out.type="full")

Arguments

observed

character vector. The terms to be checked

exp.standard

character. The expected standard with which the terms should comply. One of MIxS (Minimum Information about any (x) Sequence), DwC (Darwin Core), or INSDC (International Nucleotide Sequence Database Collaboration).

exp.section

character. Optionally, a specific section of the standard from which the terms should come. When exp.standard is MIxS, the allowed sections are: core, air, built_environment, host_associated, human_associated, human_gut, human_oral, human_skin, human_vaginal, microbial_mat_biofilm, miscellaneous_natural_or_artificial_environment, plant_associated, sediment, soil, wastewater_sludge, water. When exp.standard is DwC, the allowed sections are: event, occurence, emof.

fuzzy.match

logical. If TRUE, fuzzy matching is attempted when no corresponding term is found in the standard.

out.type

character. The type of the output. One of "full" (a list of three lists: terms_OK = the correct terms, terms_wrongWithSolution = wrong terms with a proposed solution, and terms_notFound = terms that had no match), "logical" (a logical vector indicating whether each term is an exact match) or "best_match" (a vector with the best matching terms). Default: "full".

Details

For interoperability of data and data archiving, variable names in datasets need to comply with vocabulary standards. This function compares a list of existing terms to the MIxS or DwC standard, and tries to find the best matches.
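The general idea of exact matching followed by fuzzy matching can be sketched in base R. This is only an illustration, not the package's implementation: the function check_terms, the five-term vocabulary, and the edit-distance threshold max_dist are all hypothetical, and the fuzzy step here uses utils::adist, which may differ from what dataQC.TermsCheck does internally.

```r
# Hypothetical mini-vocabulary; a real standard contains many more terms.
standard_terms <- c("ph", "conduc", "lat_lon", "collection_date", "depth")

# Classify each observed term as an exact match, a close match, or not found.
# max_dist is an assumed threshold on the edit distance, chosen for illustration.
check_terms <- function(observed, vocabulary, max_dist = 2) {
  result <- list(terms_OK = character(0),
                 terms_wrongWithSolution = character(0),
                 terms_notFound = character(0))
  for (term in observed) {
    if (term %in% vocabulary) {
      result$terms_OK <- c(result$terms_OK, term)
      next
    }
    # Case-insensitive edit distance from this term to every vocabulary term
    d <- utils::adist(tolower(term), tolower(vocabulary))[1, ]
    if (min(d) <= max_dist) {
      # Store the proposed term, named by the observed term it replaces
      best <- vocabulary[which.min(d)]
      result$terms_wrongWithSolution <-
        c(result$terms_wrongWithSolution, setNames(best, term))
    } else {
      result$terms_notFound <- c(result$terms_notFound, term)
    }
  }
  result
}

res <- check_terms(c("ph", "Depth", "colection_date", "foo_bar_baz"),
                   standard_terms)
str(res)
```

In this sketch a wrongly capitalised or slightly misspelled term receives a proposed replacement, while a term far from every vocabulary entry is reported as not found.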

Value

Depending on out.type: a logical vector ("logical"), a vector of best-matching terms ("best_match"), or a list of length 3 ("full") with "$terms_OK" (terms that comply with the standard), "$terms_wrongWithSolution" (terms that do not comply with the standard but have a close match), and "$terms_notFound" (terms that do not comply with the standard and do not match any term in it).
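Once proposed solutions are available, they can be used to rename the columns of a dataset. The sketch below assumes the solutions have been arranged as a named character vector (names are the observed terms, values are the proposed terms); that shape is an assumption made for illustration, so inspect the actual return value with str() before adapting it. The data frame dat and the vector solutions are hypothetical.

```r
# Hypothetical dataset with one compliant and one non-compliant column name
dat <- data.frame(ph = c(7.1, 6.8),
                  colection_date = c("2020-01-01", "2020-01-02"))

# Assumed shape: names are observed terms, values are proposed replacements
solutions <- c(colection_date = "collection_date")

# Rename only the columns for which a solution was proposed
to_fix <- names(dat) %in% names(solutions)
names(dat)[to_fix] <- unname(solutions[names(dat)[to_fix]])

names(dat)
```

Columns without a proposed solution are left untouched, so terms that were not found can still be reviewed by hand.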

Author(s)

Maxime Sweetlove CC-0 2020

See Also

Other quality control functions: dataQC.LatitudeLongitudeCheck(), dataQC.TaxonListFromData(), dataQC.completeTaxaNamesFromRegistery(), dataQC.dateCheck(), dataQC.eventStructure(), dataQC.findNames(), dataQC.generate.footprintWKT(), dataQC.guess.env_package.from.data(), dataQC.taxaNames()

Examples

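# Check a single term against the whole MIxS standard
dataQC.TermsCheck(observed="ph", exp.standard="MIxS", exp.section=NA, 
                  fuzzy.match=TRUE, out.type="full")
# Check an abbreviated term, with fuzzy matching enabled
dataQC.TermsCheck(observed="cond", exp.standard="MIxS", exp.section=NA, 
                  fuzzy.match=TRUE, out.type="full")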

biodiversity-aq/OmicsMetaData documentation built on Dec. 19, 2021, 9:44 a.m.