multinomTrain: Training multinomial model

multinomTrain    R Documentation

Training multinomial model

Description

Trains the multinomial K-mer method on sequence data.

Usage

multinomTrain(sequence, taxon, K = 5, col.names = FALSE, n.pseudo = 1)

Arguments

sequence

Character vector of sequences.

taxon

Character vector of taxon labels for each sequence.

K

Word length (integer).

col.names

Logical indicating if column names (K-mers) should be added to the trained model matrix.

n.pseudo

Number of pseudo-counts to use (a positive numeric, need not be an integer). In the special case n.pseudo = -1, only word counts are returned, not log-probabilities.

Details

The training step of the multinomial method (Vinje et al., 2015) consists of counting K-mers in all sequences and computing their multinomial probabilities for each taxon. n.pseudo pseudo-counts are added equally to all K-mers before the probabilities are estimated. The optimal choice of n.pseudo depends on K and on the training data set.
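
The computation described above can be sketched in base R. This is only an illustration, not the package's implementation. The helper names all_kmers, count_kmers and train_sketch are hypothetical, sequences are assumed to contain only A, C, G and T, and "added equally" is assumed to mean that n.pseudo is added to every K-mer count of every taxon.

```r
# Hypothetical helper: all 4^K words over A, C, G, T, sorted alphabetically.
all_kmers <- function(K) {
  words <- ""
  for (i in seq_len(K)) {
    words <- as.vector(outer(words, c("A", "C", "G", "T"), paste0))
  }
  sort(words)
}

# Hypothetical helper: overlapping K-mer counts for one sequence.
# Words containing other symbols are silently ignored.
count_kmers <- function(s, words, K) {
  n <- nchar(s)
  if (n < K) return(numeric(length(words)))
  starts <- seq_len(n - K + 1)
  kmers <- substring(s, starts, starts + K - 1)
  as.numeric(table(factor(kmers, levels = words)))
}

# Sketch of the training step: count, pool by taxon, smooth, normalise.
train_sketch <- function(sequence, taxon, K = 5, n.pseudo = 1) {
  words <- all_kmers(K)
  # One row of counts per sequence
  per_seq <- t(vapply(sequence, function(s) count_kmers(s, words, K),
                      numeric(length(words)), USE.NAMES = FALSE))
  # Pool the counts of all sequences belonging to the same taxon
  counts <- rowsum(per_seq, group = taxon)
  colnames(counts) <- words
  # Assumption: n.pseudo is added to each K-mer count before normalising
  smoothed <- counts + n.pseudo
  smoothed / rowSums(smoothed)
}

# Toy data, invented for illustration only
sequence <- c("ACGTAC", "ACGGAC", "TTTTGA")
taxon <- c("a", "a", "b")
model <- train_sketch(sequence, taxon, K = 2, n.pseudo = 1)
rowSums(model)
```

With n.pseudo > 0, every entry of the sketch's result is strictly positive, so a K-mer that never occurs in a taxon's training sequences still gets a small non-zero probability.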

Adding the actual K-mers as column names (col.names = TRUE) will slow down the computations.

The relative taxon frequencies in the taxon input are also computed and returned as an attribute to the probability matrix.

Value

A matrix with the multinomial probabilities, one row for each taxon and one column for each K-mer. Each row sums to 1.0. No probability is 0 when n.pseudo > 0.

The matrix has an attribute "prior" that contains the relative taxon frequencies. It can be read with attr(x, "prior"), where x is the returned matrix.
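
Relative taxon frequencies and the way an attribute travels with a matrix can be illustrated in base R. The matrix below is a stand-in filled with placeholder values, not output from multinomTrain. The sketch also assumes that the frequencies are ordered like the rows of the matrix, which the text above does not state.

```r
# Toy taxon labels, invented for illustration only
taxon <- c("a", "a", "b", "c")

# Relative frequency of each taxon among the training labels
prior <- as.numeric(table(taxon) / length(taxon))

# Stand-in for a trained model: 3 taxa, 4 placeholder columns
model <- matrix(0.25, nrow = 3, ncol = 4,
                dimnames = list(sort(unique(taxon)), NULL))

# Attach the frequencies as an attribute, then read them back
attr(model, "prior") <- prior
attr(model, "prior")
```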

Author(s)

Kristian Hovde Liland and Lars Snipen.

References

Vinje, H, Liland, KH, Almøy, T, Snipen, L. (2015). Comparing K-mer based methods for improved classification of 16S sequences. BMC Bioinformatics, 16:205.

See Also

KmerCount, multinomClassify.

Examples

# See examples for multinomClassify


larssnip/microclass documentation built on Nov. 1, 2023, 2:39 p.m.