multinomClassify: Classifying with a Multinomial model

View source: R/multinom.R

multinomClassify    R Documentation

Description

Classifying sequences by a trained Multinomial model.

Usage

multinomClassify(
  sequence,
  multinom.prob,
  post.prob = FALSE,
  prior = FALSE,
  full.post.prob = FALSE
)

Arguments

sequence

Character vector of sequences to classify.

multinom.prob

A matrix of multinomial probabilities, see multinomTrain.

post.prob

Logical indicating if posterior log-probabilities should be returned.

prior

Logical indicating which priors to use in the classification: flat priors if FALSE (default), empirical priors if TRUE.

full.post.prob

Logical indicating if full posterior probability matrix should be returned.

Details

The classification step of the multinomial method (Vinje et al., 2015) consists of counting the K-mers in each sequence and computing the posterior probability of each taxon given the trained model. The predicted taxon for an input sequence is the one with the maximum posterior probability for that sequence.
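The steps described above can be sketched in base R on toy data. This is only an illustration of the general idea, not the package's implementation: the helper count_kmers, the toy sequences, the taxon labels, the pseudo-count of 1, K = 2 and the layout of the log-probability matrix are all assumptions made for the example.

```r
# Illustrative sketch only; all names and data here are hypothetical.
bases <- c("A", "C", "G", "T")
K <- 2                                            # toy word length
kmers <- as.vector(outer(bases, bases, paste0))   # all 16 possible 2-mers

# Count overlapping K-mers in one sequence (assumes nchar(s) >= K).
count_kmers <- function(s) {
  n <- nchar(s) - K + 1
  words <- substring(s, seq_len(n), seq_len(n) + K - 1)
  as.integer(table(factor(words, levels = kmers)))
}

# Toy "training": sum K-mer counts per taxon, add a pseudo-count of 1,
# normalise each row to probabilities and take logs.
train_seq <- c("ACACACACAC", "CACACACA", "GTGTGTGTGT", "TGTGTGTG")
train_tax <- c("taxonA", "taxonA", "taxonB", "taxonB")
counts <- t(sapply(train_seq, count_kmers))       # sequences x K-mers
tax_counts <- rowsum(counts, group = train_tax) + 1
log_prob <- log(tax_counts / rowSums(tax_counts)) # taxa x K-mers

# Classification: count K-mers in the new sequences, compute the
# log-posterior for each taxon (flat priors), and pick the maximum.
new_seq <- c("ACACAC", "TGTGTG")
new_counts <- t(sapply(new_seq, count_kmers))
log_post <- new_counts %*% t(log_prob)            # sequences x taxa
predicted <- colnames(log_post)[max.col(log_post, ties.method = "first")]

# With empirical priors, the log prior of each taxon would be added
# to its column before taking the maximum (again an assumption).
prior <- as.numeric(table(train_tax)) / length(train_tax)
log_post_prior <- sweep(log_post, 2, log(prior), "+")
```

Working on the log scale turns the product of K-mer probabilities into a sum, which avoids numerical underflow for long sequences.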

By setting post.prob = TRUE you will get the posterior log-probability of the best and the second-best taxon for each sequence. These may be used for evaluating the certainty of the classifications.
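One possible way to use these two log-probabilities is to look at the gap between them: a small gap suggests the best and second-best taxa are nearly equally likely. The sketch below assumes a data.frame with the documented columns taxon, post_prob and post_prob_2; the values are made up for illustration, and the threshold is an arbitrary assumption, not a recommendation from the package.

```r
# Hypothetical result in the documented three-column shape;
# the numbers are invented for illustration only.
res <- data.frame(
  taxon       = c("taxonA", "taxonB", "taxonC"),
  post_prob   = c(-1200.5, -980.2, -1510.0),
  post_prob_2 = c(-1350.0, -981.0, -1512.5)
)

# Gap between best and second-best log-probability (always >= 0).
res$margin <- res$post_prob - res$post_prob_2

# Flag classifications whose gap falls below an arbitrary threshold.
threshold <- 5
res$uncertain <- res$margin < threshold
print(res)
```

A suitable threshold would have to be calibrated on data with known taxonomy, since log-probabilities scale with sequence length.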

The classification is parallelized through RcppParallel employing Intel TBB and TinyThread. By default all available processing cores are used. This can be changed using the function setParallel.

Value

If post.prob = FALSE, a character vector of predicted taxa is returned.

If post.prob = TRUE, a data.frame with three columns is returned:

  • taxon. The predicted taxa, one for each sequence in sequence.

  • post_prob. The posterior log-probability of the assigned taxon.

  • post_prob_2. The largest posterior log-probability of the other taxa.

Author(s)

Kristian Hovde Liland and Lars Snipen.

References

Vinje, H, Liland, KH, Almøy, T, Snipen, L. (2015). Comparing K-mer based methods for improved classification of 16S sequences. BMC Bioinformatics, 16:205.

See Also

KmerCount, multinomTrain.

Examples

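# Load the example data and pull out sequences and taxon labels
data("small.16S")
seq <- small.16S$Sequence
# The taxon label is the second space-separated token of each header
tax <- sapply(strsplit(small.16S$Header, split = " "), function(x) x[2])
## Not run: 
# Train the multinomial model on the full sequences
trn <- multinomTrain(seq, tax)
# Extract the regions matched by the 515f/806rB primer pair
primer.515f <- "GTGYCAGCMGCCGCGGTAA"
primer.806rB <- "GGACTACNVGGGTWTCTAAT"
reads <- amplicon(seq, primer.515f, primer.806rB)
# Classify the non-empty reads only
predicted <- multinomClassify(unlist(reads[nchar(reads) > 0]), trn)
print(predicted)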

## End(Not run)


larssnip/microclass documentation built on Nov. 1, 2023, 2:39 p.m.