classify: classify


View source: R/classify.R

Description

Functions to perform classification by local similarity threshold.

Usage

classify(dmat, groups, dvect, method = "mutinfo", minScore = 0.45,
         doffset = 0.5, dStart = NA, maxDepth = 10, minGroupSize = 2,
         objNames = names(dvect), keep.data = TRUE, ..., verbose =
         FALSE)

classifyIter(dmat, groupTab, dvect, dStart = NA, multiple = FALSE,
             keep.data = TRUE, ..., verbose = FALSE)

classifier(dmat, groups, dvect, method = 'mutinfo', minScore = 0.45,
           doffset = 0.5, dStart = NA, minGroupSize = 2,
           objNames = names(dvect), keep.data = TRUE, ..., verbose = FALSE,
           depth = 1)

pull(dmat, groups, index)

pullTab(dmat, groupTab, index)

Arguments

dmat

Square matrix of pairwise distances.

groups

Object coercible to a factor identifying group membership of objects corresponding to either edge of dmat.

groupTab

A data.frame representing a taxonomy, with columns in increasing order of specificity from left to right (i.e., Kingdom -> Species). Column names are used to name taxonomic ranks. Rows correspond to margins of dmat.

dvect

Numeric vector of distances from the query sequence to each reference object, in the same order as the margins of dmat.

method

The method for calculating the threshold; only 'mutinfo' is currently implemented.

minScore

Threshold value for the match score to define a match.

doffset

Offset used in the denominator of the expression to calculate match score to penalize very small groups of reference objects.

dStart

Initial value of the distance threshold D.

multiple

If TRUE, classification stops at the first rank that yields at least one match; if FALSE, classification continues until exactly one match is identified.

maxDepth

Maximum number of iterations that will be attempted to perform classification.

minGroupSize

The minimal number of members comprising at least one group required to attempt classification.

objNames

Optional character identifiers for objects corresponding to margin of dmat.

keep.data

Populates thresh$distances (see findThreshold) if TRUE.

verbose

Terminal output is produced if TRUE.

index

An integer specifying the element of dmat to remove and treat as the query.

...

see Details

depth

specifies iteration number (not meant to be user-defined)
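
The inputs described above can be assembled with base R alone. The following is a minimal sketch, assuming the iris data set supplies the reference objects; the two-rank taxonomy in groupTab (Genus, Species) is invented for illustration, and the consistency checks are not part of clst.

```r
## Build dmat, groups and dvect from the iris data (base R only).
data(iris)
dmat <- as.matrix(dist(iris[, 1:4], method = "euclidean"))
groups <- iris$Species

## Treat the first object as the query; the rest form the reference set.
ind <- 1
ref_dmat <- dmat[-ind, -ind]
ref_groups <- groups[-ind]
dvect <- dmat[ind, -ind]

## Hypothetical two-rank taxonomy for classifyIter:
## least specific rank in the leftmost column.
groupTab <- data.frame(
  Genus = rep("Iris", length(ref_groups)),
  Species = as.character(ref_groups),
  stringsAsFactors = FALSE
)

## Every input must line up with the margins of the distance matrix.
stopifnot(
  nrow(ref_dmat) == ncol(ref_dmat),
  length(ref_groups) == nrow(ref_dmat),
  length(dvect) == nrow(ref_dmat),
  nrow(groupTab) == nrow(ref_dmat)
)
```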

Details

classify performs iterative classification. See the vignette for package clst for a description of the classification algorithm.

classifier performs non-iterative classification, and is typically not called directly by the user.

The functions pull and pullTab remove a single element of dmat so that it can be classified against the remaining elements. The value of these two functions (a list) can be passed directly to classify or classifyIter (see Examples).
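
To make the shape of that list concrete, here is a hedged sketch of a pull-like helper written in base R. The name pull_sketch is hypothetical, and the element names are an assumption inferred from the arguments of classify; the actual implementation in clst may differ in detail.

```r
## Hypothetical stand-in for pull(); not the package's own code.
pull_sketch <- function(dmat, groups, index) {
  list(
    dmat = dmat[-index, -index],   # reference-to-reference distances
    groups = groups[-index],       # group labels of the references
    dvect = dmat[index, -index]    # query-to-reference distances
  )
}

data(iris)
dmat <- as.matrix(dist(iris[, 1:4], method = "euclidean"))
args <- pull_sketch(dmat, iris$Species, 51)
str(args, max.level = 1)
```

Because the element names match the argument names of classify, a list of this shape is suitable for a call such as do.call(classify, args).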

Value

classify and classifyIter return x, a list of lists, one for each iteration of the classifier. Each sub-list contains the following named elements:

depth

An integer indicating the number of the iteration (where x[[i]]$depth == i)

tally

a data.frame with one row for each group of reference objects. Columns below and above contain counts of reference objects with distance values less than or greater than D, respectively; score contains the match score S; match is 1 if S ≥ minScore, 0 otherwise. The remaining columns give the minimum, median, and maximum distances to all members of the indicated group.

details

a list of two matrices, named "below" and "above", itemizing each object with index i in the reference set with distances below or above the distance threshold D, respectively. Columns include index, the index i; dist, the distance between the object and the query; and group, indicating the classification of the object.

matches

Character vector naming groups to which query object belongs.

thresh

object returned by findThreshold

params

a list of input arguments and their values

input

list containing copies of dvect and groups
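
The per-group counts and distance summaries reported in tally can be approximated in base R, which may help when interpreting the output. This is only a sketch under stated assumptions: D is an arbitrary value chosen here rather than one computed by findThreshold, ties at D are counted as "above", the column names min, median and max are hypothetical, and the match score S is omitted because its formula is not given on this page.

```r
data(iris)
dmat <- as.matrix(dist(iris[, 1:4], method = "euclidean"))

## Use the first object as the query and the rest as references.
ind <- 1
groups <- iris$Species[-ind]
dvect <- dmat[ind, -ind]

D <- 1.5  # arbitrary threshold, for illustration only

## One row per group: counts on either side of D plus distance summaries.
tally_sketch <- data.frame(
  below = as.vector(tapply(dvect < D, groups, sum)),
  above = as.vector(tapply(dvect >= D, groups, sum)),
  min = as.vector(tapply(dvect, groups, min)),
  median = as.vector(tapply(dvect, groups, median)),
  max = as.vector(tapply(dvect, groups, max)),
  row.names = levels(groups)
)
print(tally_sketch)
```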

Author(s)

Noah Hoffman

See Also

findThreshold

Examples

## illustrate classification using the Iris data set
data(iris)
dmat <- as.matrix(dist(iris[,1:4], method="euclidean"))
groups <- iris$Species

## remove one element from the data set and perform classification using
## the remaining elements as the reference set
ind <- 1
cat(paste('class of "unknown" sample is Iris',groups[ind]),fill=TRUE)
cc <- classify(dmat[-ind,-ind], groups[-ind], dvect=dmat[ind, -ind])
printClst(cc)

## this operation can be performed conveniently using the `pull` function
ind <- 51
cat(paste('class of "unknown" sample is Iris',groups[ind]),fill=TRUE)
cc <- do.call(classify, pull(dmat, groups, ind)) 
printClst(cc)
str(cc)

clst documentation built on Nov. 8, 2020, 5:41 p.m.