encoding: Encode and decode profile HMMs in raw byte format.
In insect: Informatic Sequence Classification Trees

encoding

R Documentation

Encode and decode profile HMMs in raw byte format.

Description

These functions are used to compress and decompress profile hidden Markov models for DNA to improve memory efficiency.

Usage

encodePHMM(x)

decodePHMM(z)

Arguments

`x`	an object of class "PHMM".
`z`	a raw vector in the encodePHMM schema.

Details

Profile HMMs used in tree-based classification usually include many parameters, so large trees with many PHMMs can occupy a lot of memory. A basic encoding system was therefore devised to store the emission and transition probabilities in raw-byte format to three (nearly four) decimal places. This does not seem to significantly affect the accuracy of likelihood scoring, and it has a moderate impact on classification speed, but it can reduce the memory allocation requirements for large trees by up to 95 percent.
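
The idea of trading precision for memory can be illustrated with a minimal base-R sketch. This is not the encodePHMM schema, whose byte layout is not described here. The helper names quantize_probs and dequantize_probs are hypothetical, and the choice of two bytes per value with a scale factor of 10000 is an assumption made purely for illustration.

```r
## Illustrative only: NOT the actual encodePHMM schema.
## Hypothetical helpers that store each probability as a 2-byte fixed-point code.
quantize_probs <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  code <- as.integer(round(p * 10000))   # integer code in 0..10000
  ## interleave high and low bytes: hi1, lo1, hi2, lo2, ...
  as.raw(c(rbind(code %/% 256L, code %% 256L)))
}

dequantize_probs <- function(z) {
  stopifnot(is.raw(z), length(z) %% 2 == 0)
  b <- matrix(as.integer(z), nrow = 2)   # row 1 = high bytes, row 2 = low bytes
  (b[1, ] * 256L + b[2, ]) / 10000
}

p  <- c(0.25, 0.123456, 0.000049, 1)
z  <- quantize_probs(p)
p2 <- dequantize_probs(z)

max(abs(p - p2))   # rounding error introduced by the 4-decimal-place grid
length(z)          # bytes used by the raw vector (doubles use 8 bytes each)
```

In this sketch each value occupies two bytes instead of the eight used by a double, at the cost of a rounding error of at most half a unit in the fourth decimal place. The savings reported for encodePHMM come from its own scheme and may differ.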

Value

encodePHMM returns a raw vector. decodePHMM returns an object of class "PHMM" (see Durbin et al. (1998) and the aphid package for more details on profile hidden Markov models).

Author(s)

Shaun Wilkinson

References

Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom.

Examples


  ## load the package that provides learn(), decodePHMM() and the whale data
  library(insect)
  ## generate a simple classification tree with two child nodes
  data(whales)
  data(whale_taxonomy)
  tree <- learn(whales, db = whale_taxonomy, recursive = FALSE)
  ## extract the omnibus profile HMM from the root node
  PHMM0 <- decodePHMM(attr(tree, "model"))
  ## extract the profile HMM from the first child node
  PHMM1 <- decodePHMM(attr(tree[[1]], "model"))

insect documentation built on June 8, 2025, 10:37 a.m.