allocateCVI: Allocate sequences for cross validation by identity.
In shaunpwilkinson/insect: Informatic Sequence Classification Trees


This function takes a reference sequence database and allocates each sequence to either a query set (a.k.a. test set) or a training set, in order to cross validate a supervised taxon classifier. The method is based on that of Edgar (2018), but uses recursive divisive clustering and retains all sequences rather than discarding those that violate the top-hit identity constraint.
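To make the term "top-hit identity" concrete, the following base R sketch computes it for one query against a small training set. It is an illustration only and not how the package measures similarity: the helpers `pairwise_identity` and `top_hit_identity` are hypothetical, and the sketch assumes pre-aligned sequences of equal length.

```r
# Hypothetical helper (not part of insect): fraction of matching positions
# between two pre-aligned, equal-length sequences.
pairwise_identity <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  stopifnot(length(a) == length(b))
  mean(a == b)
}

# Top-hit identity: the highest identity between the query and any
# sequence in the training set.
top_hit_identity <- function(query, training) {
  max(vapply(training, pairwise_identity, numeric(1), a = query))
}

# Toy data, invented for illustration.
training <- c(s1 = "ACGTACGTAC", s2 = "ACGTTCGTAC", s3 = "TTTTACGTAC")
query <- "ACGTACGTAA"

top_hit <- top_hit_identity(query, training)
```

With these toy sequences the closest training sequence is `s1`, which differs from the query at a single position out of ten.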

allocateCVI(x, threshold = 0.9, allocate = "max", ...)

`x`	a set of reference sequences. Can be a "DNAbin" object or a named vector of upper-case DNA character strings.
`threshold`	numeric between 0 and 1 giving the identity threshold for sequence allocation.
`allocate`	character string giving the method used to allocate eligible sequences to the query set. Options are "max" (default), which chooses the larger node from each pair in order to maximize the size of the query set, or "sample", which randomly chooses one node from each eligible pair.
`...`	further arguments to pass to "kmeans".
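The difference between the two `allocate` options can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: `choose_nodes` is a hypothetical helper, and each eligible pair of nodes is reduced to a row of two node sizes.

```r
# Hypothetical helper (not part of insect): pick one node from each pair.
# `sizes` is a two-column matrix; each row holds the sizes of two sibling nodes.
choose_nodes <- function(sizes, allocate = c("max", "sample")) {
  allocate <- match.arg(allocate)
  if (allocate == "max") {
    # Index (1 or 2) of the larger node in each row; ties go to column 1.
    apply(sizes, 1, which.max)
  } else {
    # Pick column 1 or 2 at random for each row.
    sample(1:2, nrow(sizes), replace = TRUE)
  }
}

# Toy node sizes, invented for illustration.
pairs <- rbind(c(12, 5), c(3, 9), c(4, 4))

chosen_max <- choose_nodes(pairs, "max")

set.seed(1)  # make the random choice reproducible
chosen_rand <- choose_nodes(pairs, "sample")
```

Under "max" the selection is deterministic and favors a larger query set; under "sample" the result varies between runs unless a seed is set.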