map: Optimized profile HMM construction.
In aphid: Analysis with Profile Hidden Markov Models

View source: R/derivePHMM.R

map	R Documentation

Optimized profile HMM construction.

Description

Assigns match and insert states to alignment columns using the maximum a posteriori algorithm outlined in Durbin et al (1998) chapter 5.7.

Usage

map(
  x,
  seqweights = NULL,
  residues = NULL,
  gap = "-",
  endchar = "?",
  pseudocounts = "background",
  lambda = 0,
  qa = NULL,
  qe = NULL,
  cpp = TRUE
)

Arguments

`x`	a matrix of aligned sequences. Accepted modes are "character" and "raw" (the latter being used for "DNAbin" and "AAbin" objects).
`seqweights`	either NULL (default; all sequences are given weights of 1) or a numeric vector with one element per sequence (i.e. per row of `x`) representing the sequence weights used to derive the model.
`residues`	either NULL (default; emitted residues are automatically detected from the sequences), a case sensitive character vector specifying the residue alphabet, or one of the character strings "RNA", "DNA", "AA", "AMINO". Note that the default option can be slow for large lists of character vectors. Furthermore, the default setting `residues = NULL` will not detect rare residues that are not present in the sequences, and thus will not assign them emission probabilities in the model. Specifying the residue alphabet is therefore recommended unless x is a "DNAbin" or "AAbin" object.
`gap`	the character used to represent gaps in the alignment matrix (if applicable). Ignored for `"DNAbin"` or `"AAbin"` objects. Defaults to "-" otherwise.
`endchar`	the character used to represent unknown residues in the alignment matrix (if applicable). Ignored for `"DNAbin"` or `"AAbin"` objects. Defaults to "?" otherwise.
`pseudocounts`	character string, either "background", "Laplace" or "none". Used to account for the possible absence of certain transition and/or emission types in the input sequences. If `pseudocounts = "background"` (default), pseudocounts are calculated from the background transition and emission frequencies in the sequences. If `pseudocounts = "Laplace"`, one of each possible transition and emission type is added to the transition and emission counts. If `pseudocounts = "none"`, no pseudocounts are added (not generally recommended, since low frequency transition/emission types may be excluded from the model). Alternatively this argument can be a two-element list containing a matrix of transition pseudocounts as its first element and a matrix of emission pseudocounts as its second.
`lambda`	penalty parameter used to favour models with fewer match states. Equivalent to the log of the prior probability of marking each column (Durbin et al 1998, chapter 5.7).
`qa`	an optional named 9-element vector of background transition probabilities with `names(qa) = c("DD", "DM", "DI", "MD", "MM", "MI", "ID", "IM", "II")`, where M, I and D represent match, insert and delete states, respectively. If `NULL`, background transition probabilities are estimated from the sequences.
`qe`	an optional named vector of background emission probabilities the same length as the residue alphabet (e.g. 4 for nucleotides and 20 for amino acids) and with corresponding names (e.g. `c("A", "T", "G", "C")` for DNA). If `qe = NULL`, background emission probabilities are automatically derived from the sequences.
`cpp`	logical, indicates whether the dynamic programming matrix should be filled using compiled C++ functions (default; many times faster). The FALSE option is primarily retained for debugging and experimentation.
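
As a rough illustration of the argument formats described above, the following sketch builds a toy character alignment, a sequence weight vector and a named background emission vector in base R. The alignment, the uniform `qe` values and the object names are assumptions made for illustration only; the call to `map` is guarded so that the block also runs when aphid is not installed.

```r
# Toy DNA alignment as a character matrix (rows = sequences, columns = sites).
# The data are made up for illustration.
x <- rbind(
  c("A", "C", "-", "G", "T"),
  c("A", "C", "-", "G", "T"),
  c("A", "-", "T", "G", "T"),
  c("A", "C", "-", "-", "T")
)
rownames(x) <- paste0("seq", 1:4)

# One weight per sequence; a vector of ones mirrors the documented default.
seqweights <- rep(1, nrow(x))

# Explicit residue alphabet and a uniform background emission vector
# whose names correspond to that alphabet (illustrative values).
residues <- c("A", "C", "G", "T")
qe <- setNames(rep(1 / length(residues), length(residues)), residues)

# Only attempt the call if the aphid package is available.
if (requireNamespace("aphid", quietly = TRUE)) {
  marks <- aphid::map(x, seqweights = seqweights, residues = residues,
                      gap = "-", pseudocounts = "Laplace", qe = qe)
  print(marks)
}
```

Supplying `residues` explicitly, as here, follows the recommendation in the `residues` entry for inputs that are not "DNAbin" or "AAbin" objects.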

Details

See Durbin et al (1998) chapter 5.7 for details of the maximum a posteriori algorithm for optimal match and insert state assignment.
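
For orientation, the dynamic programming recurrence presented in that chapter can be summarised as below. This is a sketch from the textbook's notation rather than a description of the package internals: for an alignment with L columns, S_j is the log probability of the best model up to and including column j given that column j is marked as a match column, T_ij is the log probability of the transitions between marked columns i and j, M_j is the log probability of the match emissions in column j, I_{i+1,j-1} is the log probability of the insert emissions in the columns between them, and λ is the penalty described under `lambda`.

```latex
\begin{aligned}
&\text{Initialisation:} && S_0 = 0, \quad M_{L+1} = 0 \\
&\text{Recurrence } (j = 1, \dots, L+1)\text{:} &&
  S_j = \max_{0 \le i < j} \left[ S_i + T_{ij} + M_j + I_{i+1,\,j-1} + \lambda \right] \\
& && \sigma_j = \operatorname*{arg\,max}_{0 \le i < j} \left[ S_i + T_{ij} + M_j + I_{i+1,\,j-1} + \lambda \right] \\
&\text{Traceback:} && j = \sigma_{L+1}; \ \text{while } j > 0\text{: mark column } j \text{ as a match column, then set } j = \sigma_j
\end{aligned}
```

Because λ is added once for every marked column, more negative values favour models with fewer match states.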

Value

a logical vector with length = ncol(x) indicating the columns to be assigned as match states (TRUE) and those assigned as inserts (FALSE).
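
A minimal base R sketch of how such a logical vector might be used downstream is shown here. The alignment and the `is_match` vector are hypothetical stand-ins written by hand, not output computed by `map`.

```r
# Hypothetical alignment and a hand-written logical vector standing in for
# the value returned by map(); the values are illustrative only.
x <- rbind(
  c("A", "C", "-", "G", "T"),
  c("A", "C", "-", "G", "T"),
  c("A", "-", "T", "G", "T")
)
is_match <- c(TRUE, TRUE, FALSE, TRUE, TRUE)

# The vector is expected to have one element per alignment column.
stopifnot(length(is_match) == ncol(x))

match_cols  <- which(is_match)    # indices of columns assigned as match states
insert_cols <- which(!is_match)   # indices of columns assigned as inserts

# Alignment restricted to the match columns, kept as a matrix.
x_match <- x[, is_match, drop = FALSE]

# Number of match columns.
n_match <- sum(is_match)
```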

Author(s)

Shaun Wilkinson

References

Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom.

Examples

## Maximum a posteriori assignment of match states to the small
## alignment example in Figure 5.3, Durbin et al (1998)
data(globins)
map(globins)

aphid documentation built on Dec. 5, 2022, 9:06 a.m.
