Extracting taxonomic information from ConTax data sets.
A vector of texts, typically the
The ConTax data sets are Fasta objects, where the Header line follows a strict format.
The Header always starts with a short text, a Tag, which is a unique identifier for every sequence. getTag extracts this Tag from the Header.
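As an illustration of the Tag rule only, the following base R sketch pulls the leading text out of a few made-up Header lines. It is not the package's own getTag code. It assumes that the Tag ends at the first whitespace character, and the Header strings are hypothetical.

```r
# Hypothetical Header lines; the Tags and the trailing tokens are made up.
headers <- c("Tag_001 token_a token_b",
             "Tag_002 token_c")

# Assumption: the Tag is everything before the first whitespace character,
# so we delete the first whitespace and all text after it.
tags <- sub("\\s.*$", "", headers)

print(tags)

# Tags are meant to be unique identifiers, so a duplicate signals a problem.
stopifnot(anyDuplicated(tags) == 0)
```

Using a single regular expression keeps the extraction vectorised, so one call handles a whole column of Header texts.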
After the Tag follows one or more tokens. One of these tokens must be a string with the following format:
where <...> is some proper text. Here is an example of a proper string:
getGenus extracts the corresponding information from the Header.
combines all taxonomy extractors, combines these in a data.frame
and imputes missing taxa with parent taxa.
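The imputation step can be pictured with a small base R sketch. This is one possible reading of "imputes missing taxa with parent taxa", not the package's implementation: it assumes that a missing taxon is replaced by the nearest available higher-rank name, copied unchanged. The column names, the taxa and the helper impute_from_parent are hypothetical.

```r
# Hypothetical taxonomy table; rank names and taxa are illustrative only.
# Columns are ordered from the highest rank (left) to the lowest (right).
taxa <- data.frame(
  Phylum = c("Firmicutes", "Proteobacteria"),
  Family = c("Staphylococcaceae", NA_character_),
  Genus  = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk from left to right and copy the parent value
# into every missing (NA or empty) cell of the next lower rank.
impute_from_parent <- function(df) {
  for (j in seq_len(ncol(df))[-1]) {
    missing <- is.na(df[[j]]) | df[[j]] == ""
    df[[j]][missing] <- df[[j - 1]][missing]
  }
  df
}

filled <- impute_from_parent(taxa)
print(filled)
```

Because the loop moves from the highest rank downward, a value filled in at one rank is itself available as the parent for the next rank, so a run of several missing ranks is filled from the last known taxon.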
A vector containing the sub-texts extracted from each