threshID: Measures of identification accuracy



Description

Tests of barcoding efficacy using distance-based methods.

Usage

threshID(distobj, sppVector, threshold = 0.01, names = FALSE)

Arguments

distobj

A distance object (usually from dist.dna).

sppVector

Vector of species names. See sppVector.

threshold

Distance cutoff for identifications. Default of 0.01 (1%).

names

Logical. Should the names of the nearest match be shown? Default of FALSE.

Details

These functions test barcoding efficacy. All sequences must be identified prior to testing. Each sequence in turn is treated as an unknown, while the remaining sequences in the dataset constitute the DNA barcoding database used to identify it. If the identification from the test matches the prior identification, a correct result is returned.

bestCloseMatch conducts the "best close match" analysis of Meier et al. (2006). It considers only the closest individual; if that individual is farther away than the given threshold, no identification is made. If more than one species is tied for the closest match, the result is "ambiguous". When the threshold is large, this analysis will return essentially the same result as nearNeighbour. If names = TRUE, a list is returned containing the names of all species represented by specimens within the threshold.
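The logic described above can be sketched in base R on a toy distance matrix. This is a minimal illustration, not the package's implementation: the helper best_close_match_sketch, the distances, and the species labels are all made up for the example, and treating a distance exactly equal to the threshold as a match is an assumption.

```r
# Toy symmetric distance matrix for five specimens (values are invented).
d <- matrix(c(
  0.000, 0.002, 0.030, 0.031, 0.008,
  0.002, 0.000, 0.029, 0.030, 0.006,
  0.030, 0.029, 0.000, 0.004, 0.006,
  0.031, 0.030, 0.004, 0.000, 0.009,
  0.008, 0.006, 0.006, 0.009, 0.000
), nrow = 5, byrow = TRUE)
spp <- c("sp_A", "sp_A", "sp_B", "sp_B", "sp_C")

# Hypothetical helper: best-close-match status for each specimen.
best_close_match_sketch <- function(d, spp, threshold = 0.01) {
  d <- as.matrix(d)
  diag(d) <- NA  # a query is never compared with itself
  vapply(seq_len(nrow(d)), function(i) {
    nearest <- min(d[i, ], na.rm = TRUE)
    # Closest individual lies beyond the threshold: no identification.
    if (nearest > threshold) return("no id")
    # Species of every individual tied at the minimum distance.
    hits <- unique(spp[which(d[i, ] == nearest)])
    if (length(hits) > 1) return("ambiguous")
    if (hits == spp[i]) "correct" else "incorrect"
  }, character(1))
}

bcm_default <- best_close_match_sketch(d, spp)
bcm_strict <- best_close_match_sketch(d, spp, threshold = 0.003)
print(bcm_default)
print(bcm_strict)
```

In this toy data the fifth specimen is equidistant from an sp_A and an sp_B specimen, so it is reported as "ambiguous" at the default threshold; tightening the threshold turns most results into "no id".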

nearNeighbour finds the closest individual and returns whether its name is the same as that of the query (TRUE) or different (FALSE). If names = TRUE, the name of the closest individual is returned. Ties are decided by majority rule.
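A nearest-neighbour check with majority-rule tie breaking might look roughly like the following base R sketch. The helper near_neighbour_sketch and the toy data are hypothetical, and how a dead heat in the majority vote is resolved (here, the alphabetically first name) is an assumption rather than documented behaviour.

```r
# Toy symmetric distance matrix for four specimens (values are invented).
d <- matrix(c(
  0.000, 0.010, 0.005, 0.050,
  0.010, 0.000, 0.020, 0.050,
  0.005, 0.020, 0.000, 0.030,
  0.050, 0.050, 0.030, 0.000
), nrow = 4, byrow = TRUE)
spp <- c("sp_A", "sp_A", "sp_B", "sp_B")

# Hypothetical helper: nearest-neighbour identification for each specimen.
near_neighbour_sketch <- function(d, spp, names = FALSE) {
  d <- as.matrix(d)
  diag(d) <- NA  # a query is never compared with itself
  nearest_name <- vapply(seq_len(nrow(d)), function(i) {
    # Species of every individual tied at the minimum distance.
    tied <- spp[which(d[i, ] == min(d[i, ], na.rm = TRUE))]
    counts <- table(tied)
    # Majority rule; which.max() keeps the first name on a dead heat.
    base::names(counts)[which.max(counts)]
  }, character(1))
  if (names) nearest_name else nearest_name == spp
}

nn <- near_neighbour_sketch(d, spp)
nn_names <- near_neighbour_sketch(d, spp, names = TRUE)
print(nn)
print(nn_names)
```

No threshold is involved here, so every specimen receives an answer even when its nearest neighbour is distant.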

threshID conducts a threshold-based analysis, similar to that conducted by the "Identify Specimen" tool provided by the Barcode of Life Database (http://www.boldsystems.org/index.php/IDS_OpenIdEngine). It is more inclusive than bestCloseMatch, considering all sequences within the given threshold rather than only the closest one. If names = TRUE, a list is returned containing the names of all species represented by specimens within the threshold.
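The threshold-based rule can be illustrated with a short base R sketch. It is not the package's code: thresh_id_sketch and the toy data are invented for the example, and counting a distance exactly equal to the threshold as "within" it is an assumption.

```r
# Toy symmetric distance matrix for five specimens (values are invented).
d <- matrix(c(
  0.000, 0.004, 0.020, 0.030, 0.200,
  0.004, 0.000, 0.008, 0.025, 0.200,
  0.020, 0.008, 0.000, 0.003, 0.200,
  0.030, 0.025, 0.003, 0.000, 0.200,
  0.200, 0.200, 0.200, 0.200, 0.000
), nrow = 5, byrow = TRUE)
spp <- c("sp_A", "sp_A", "sp_B", "sp_B", "sp_C")

# Hypothetical helper: threshold-based status for each specimen.
thresh_id_sketch <- function(d, spp, threshold = 0.01) {
  d <- as.matrix(d)
  diag(d) <- NA  # a query is never compared with itself
  vapply(seq_len(nrow(d)), function(i) {
    # Species of every individual within the threshold, not just the closest.
    hits <- unique(spp[which(d[i, ] <= threshold)])
    if (length(hits) == 0) return("no id")
    if (length(hits) > 1) return("ambiguous")
    if (hits == spp[i]) "correct" else "incorrect"
  }, character(1))
}

tid <- thresh_id_sketch(d, spp)
print(tid)
```

In this toy data the second specimen has a conspecific as its single closest neighbour, yet it is reported as "ambiguous" because an sp_B specimen also falls within the threshold. That is the sense in which the threshold rule is more inclusive than a closest-match rule.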

These functions are not recommended as identification tools, though they can be used as such when names = TRUE.

Value

bestCloseMatch and threshID return a character vector giving the identification status of each individual.

"correct"

The name of the closest match is the same

"incorrect"

The name of the closest match is different

"ambiguous"

More than one species is the closest match (bestCloseMatch), or is within the given threshold (threshID)

"no id"

No species are within the threshold distance

nearNeighbour returns a logical vector or (if names = TRUE) the name of the nearest individual.
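Because the result is a plain character vector, it can be tabulated with base R to summarise identification success. The sketch below uses a made-up status vector in place of real output; the variable names are illustrative only.

```r
# Made-up status vector standing in for the output of bestCloseMatch or threshID.
status <- c("correct", "correct", "ambiguous", "no id", "incorrect", "correct")

# Fix the factor levels so that outcomes with zero occurrences are still listed.
outcomes <- c("correct", "incorrect", "ambiguous", "no id")
counts <- table(factor(status, levels = outcomes))
proportions <- counts / length(status)

print(counts)
print(round(proportions, 2))
```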

Author(s)

Samuel Brown <[email protected]>

References

Meier, R., Shiyang, K., Vaidya, G., & Ng, P. (2006). DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Systematic Biology, 55(5), 715-728.

See Also

nearNeighbour, threshID, dist.dna, sppVector

Examples

library(spider)

# Anoteropsis: species names are the first two "_"-separated fields of each sequence label
data(anoteropsis)
anoDist <- ape::dist.dna(anoteropsis)
anoSpp <- sapply(strsplit(dimnames(anoteropsis)[[1]], split = "_"),
    function(x) paste(x[1], x[2], sep = "_"))

# Compare the three analyses at the default and at stricter thresholds
bestCloseMatch(anoDist, anoSpp)
bestCloseMatch(anoDist, anoSpp, threshold = 0.005)
nearNeighbour(anoDist, anoSpp)
nearNeighbour(anoDist, anoSpp, names = TRUE)
threshID(anoDist, anoSpp)
threshID(anoDist, anoSpp, threshold = 0.003)

# Dolomedes: species names are the first five characters of each sequence label
data(dolomedes)
doloDist <- ape::dist.dna(dolomedes)
doloSpp <- substr(dimnames(dolomedes)[[1]], 1, 5)

bestCloseMatch(doloDist, doloSpp)
bestCloseMatch(doloDist, doloSpp, threshold = 0.005)
nearNeighbour(doloDist, doloSpp)
nearNeighbour(doloDist, doloSpp, names = TRUE)
threshID(doloDist, doloSpp)
threshID(doloDist, doloSpp, threshold = 0.003)

spider documentation built on Feb. 17, 2018, 1:02 a.m.