unwanted_tax_patterns: Default patterns for unwanted taxonomic values

unwanted_tax_patterns    R Documentation

Default patterns for unwanted taxonomic values

Description

A named character vector of regular expressions used to identify common problematic values in taxonomy tables. Each element is a regex pattern; names provide human-readable descriptions.

This vector is the default value of the replace_to_NA argument of verify_tax_table() and can be reused by other pqverse packages (e.g. dbpq::count_unwanted_tax()).
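As a rough illustration of what replacing matched values with NA can look like, the base R sketch below applies a few of the patterns listed under Format to a character vector. The helper replace_unwanted() and the genus vector are hypothetical and are not part of MiscMetabar; verify_tax_table() may proceed differently internally.

```r
# Subset of the patterns listed under Format, copied here so the sketch
# runs without MiscMetabar installed.
patterns <- c(
  "NA-like (NA, NaN, nan)" = "^[Nn][Aa][Nn]?$",
  "empty string"           = "^$",
  "unclassified"           = "[Uu]nclassified",
  "empty QIIME-style rank" = "^[kpcofgs]__$"
)

# Hypothetical helper (an assumption, not a MiscMetabar function):
# set every value that matches at least one pattern to NA.
replace_unwanted <- function(x, pats) {
  # One logical vector per pattern, combined with element-wise OR
  hit <- Reduce(`|`, lapply(pats, grepl, x = x), rep(FALSE, length(x)))
  x[hit] <- NA_character_
  x
}

# Made-up genus column for illustration
genus <- c("Amanita", "unclassified_Fungi", "g__", "", "nan", "Russula")
cleaned <- replace_unwanted(genus, patterns)
cleaned
# Values 2 to 5 become NA; "Amanita" and "Russula" are kept.
```

With the package loaded, passing unwanted_tax_patterns itself as pats should behave the same way, since it is also a named character vector of regular expressions.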

Usage

unwanted_tax_patterns

Format

A named character vector with 17 elements:

NA-like (NA, NaN, nan)

"^[Nn][Aa][Nn]?$"

NA-like (N/A, n/a)

"^[Nn]/[Aa]$"

None / none

"^[Nn]one$"

empty string

"^$"

whitespace only

"^\\s+$"

unclassified

"[Uu]nclassified"

unknown

"[Uu]nknown"

unidentified

"[Uu]nidentified"

uncultured

"[Uu]ncultured"

incertae sedis

"[Ii]ncertae[_\\s]?[Ss]edis"

metagenome

"^[Mm]etagenome$"

environmental

"^[Ee]nvironmental"

empty QIIME-style rank

"^[kpcofgs]__$"

unknown species (_sp prefix)

"^_sp"

unknown species (_species prefix)

"^_species"

unknown cluster (MMseqs2)

"_uc$"

unknown ranks (PR2 database)

"__X+$"

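Some of the patterns above are anchored with ^ and/or $, so they match only at the start or end of a value (or the whole value), while others, such as "[Uu]nknown", match anywhere inside it. The sketch below shows the difference and uses the element names to report which pattern matched. The helper which_patterns() and the test values are hypothetical and chosen only for illustration.

```r
# Subset of the patterns listed above, copied here so the sketch
# runs without MiscMetabar installed.
patterns <- c(
  "unknown"                      = "[Uu]nknown",
  "metagenome"                   = "^[Mm]etagenome$",
  "unknown cluster (MMseqs2)"    = "_uc$",
  "unknown ranks (PR2 database)" = "__X+$"
)

# Hypothetical helper (an assumption, not a MiscMetabar function):
# return the names of all patterns that match a single value.
which_patterns <- function(value, pats) {
  names(pats)[vapply(pats, grepl, logical(1), x = value)]
}

# Unanchored pattern: matches inside a longer value
which_patterns("Fungi_unknown_sp", patterns)  # "unknown"

# Fully anchored pattern: "soil metagenome" is not exactly "metagenome"
which_patterns("soil metagenome", patterns)   # character(0)
which_patterns("metagenome", patterns)        # "metagenome"

# End-anchored patterns: match only as a suffix
which_patterns("Glomus_uc", patterns)         # "unknown cluster (MMseqs2)"
which_patterns("Dinophyceae__XX", patterns)   # "unknown ranks (PR2 database)"
```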
See Also

verify_tax_table()

Examples

unwanted_tax_patterns
# Check whether a value matches at least one pattern
# (the \(pat) lambda shorthand requires R >= 4.1)
any(vapply(
  unwanted_tax_patterns,
  \(pat) grepl(pat, "unclassified"),
  logical(1)
))

MiscMetabar documentation built on June 8, 2026, 5:07 p.m.