This function takes a reference sequence database and allocates each sequence to either a query set (also known as a test set) or a training set, so that a supervised taxon classifier can be cross-validated. The method is based on that of Edgar (2018), but it uses recursive divisive clustering and retains all sequences rather than discarding those that violate the top-hit identity constraint.
allocateCVI(x, threshold = 0.9, allocate = "max", ...)
x | A set of reference sequences. Can be a "DNAbin" object or a named vector of upper-case DNA character strings.
threshold | Numeric between 0 and 1 giving the identity threshold for sequence allocation.
allocate | Character string giving the method used to allocate eligible sequences to the query set. Options are "max" (default), which chooses the larger node from each pair in order to maximize the size of the query set, or "sample", which randomly chooses one node from each eligible pair.
... | Further arguments to pass to "kmeans".
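The two options for the allocate argument can be illustrated with a minimal base R sketch. The helper choose_node, the tie-breaking rule, and the node contents are hypothetical and are not taken from the package; the sketch only shows the selection rule described above for a single pair of nodes.

```r
# Hypothetical helper (not part of the package): pick one node from an
# eligible pair, following the rule described for the `allocate` argument.
choose_node <- function(node_a, node_b, allocate = c("max", "sample")) {
  allocate <- match.arg(allocate)
  if (allocate == "max") {
    # Keep the larger node so the query set is as large as possible.
    # Assumption: ties go to the first node.
    if (length(node_a) >= length(node_b)) node_a else node_b
  } else {
    # Choose one of the two nodes at random.
    if (sample.int(2, 1) == 1L) node_a else node_b
  }
}

# Made-up sequence names for two sibling nodes.
node_a <- c("seq1", "seq2", "seq3")
node_b <- c("seq4", "seq5")

picked_max <- choose_node(node_a, node_b, allocate = "max")

set.seed(1)  # make the random choice reproducible
picked_sample <- choose_node(node_a, node_b, allocate = "sample")
```

With "max" the three-member node is always returned, whereas "sample" may return either node, which trades a smaller query set for a less size-biased selection.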
Value: A logical vector the same length as the input object, indicating which sequences should be allocated to the query set.
Author(s): Shaun Wilkinson
References: Edgar RC (2018) Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences. PeerJ 6:e4652. DOI 10.7717/peerj.4652
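As a minimal sketch of how the returned logical vector might be used, the base R snippet below splits a small reference set into query and training subsets. The sequences and the is_query vector are made up for illustration; with the package loaded, is_query would instead come from a call to allocateCVI.

```r
# Made-up reference sequences: a named vector of upper-case DNA strings.
x <- c(seqA = "ACGTACGT", seqB = "ACGTACGA",
       seqC = "TTGCAAGC", seqD = "TTGCAAGG")

# Placeholder for the function's return value (an assumption for this
# sketch). With the package loaded this would instead be:
#   is_query <- allocateCVI(x, threshold = 0.9)
is_query <- c(TRUE, FALSE, TRUE, FALSE)

# Sequences flagged TRUE form the query (test) set; the rest are used
# to train the classifier.
query_set <- x[is_query]
training_set <- x[!is_query]
```

Every sequence ends up in exactly one of the two subsets, so none are discarded.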