generate: Generate random sequences from a model.

View source: R/generate.R

generateR Documentation

Generate random sequences from a model.

Description

The generate function outputs a random sequence from a HMM or PHMM.

Usage

generate(x, size, ...)

## S3 method for class 'HMM'
generate(x, size, logspace = "autodetect", random = TRUE, ...)

## S3 method for class 'PHMM'
generate(
  x,
  size,
  logspace = "autodetect",
  gap = "-",
  random = TRUE,
  DNA = FALSE,
  AA = FALSE,
  ...
)

Arguments

x

an object of class 'HMM' or 'PHMM'.

size

a non-negative integer representing the length of the output sequence if x is a "HMM" object with zero probability of transitioning to the begin/end state, or the maximum length of the output sequence otherwise (this acts as a safeguard against overflow).

...

additional arguments to be passed between methods.

logspace

logical indicating whether the emission and transition probabilities of x are logged. If logspace = "autodetect" (the default setting), the function will automatically detect if the probabilities are logged, returning an error if inconsistencies are found. Note that choosing the latter option increases the computational overhead; therefore specifying TRUE or FALSE can reduce the running time.

random

logical indicating whether residues should be emitted randomly with probabilities defined by the emission probabilities in the model (TRUE; default), or deterministically, whereby each residue is emitted and each transition taken based on the maximum emission/transition probability in the current state.

gap

the character used to represent gaps (delete states) in the output sequence (only applicable for PHMM objects).

DNA

logical indicating whether the returned sequence should be a "DNAbin" object. Only applicable if the matrix of emission probabilities in the model has four residues corresponding to the nucleotide alphabet (A, T, G, and C).

AA

logical indicating whether the returned sequence should be a "AAbin" object. Only applicable if the matrix of emission probabilities in the model has 20 residues corresponding to the amino acid alphabet.

Details

This simple function generates a single sequence from a HMM or profile HMM by recursively simulating a path through the model. The function is fairly slow in its current state, but a faster C++ function may be made available in a future version depending on demand.

Value

a named vector giving the sequence of residues emitted by the model, with the "names" attribute representing the hidden states.

Author(s)

Shaun Wilkinson

References

Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom.

Examples

  ## Generate a random sequence from a standard HMM
  ## The dishonest casino example from Durbin et al (1998) chapter 3.2
  states <- c("Begin", "Fair", "Loaded")
  residues <- paste(1:6)
  ### Define the transition probability matrix
  A <- matrix(c(0, 0, 0, 0.99, 0.95, 0.1, 0.01, 0.05, 0.9), nrow = 3)
  dimnames(A) <- list(from = states, to = states)
  ### Define the emission probability matrix
  E <- matrix(c(rep(1/6, 6), rep(1/10, 5), 1/2), nrow = 2, byrow = TRUE)
  dimnames(E) <- list(states = states[-1], residues = residues)
  ### Build and plot the HMM object
  x <- structure(list(A = A, E = E), class = "HMM")
  plot(x, main = "Dishonest casino HMM")
  ### Generate a random sequence from the model
  generate(x, size = 300)
  ##
  ## Generate a random sequence from a profile HMM:
  ## Small globin alignment data from Durbin et al (1998) Figure 5.3
  data(globins)
  ### Derive a profile hidden Markov model from the alignment
  globins.PHMM <- derivePHMM(globins, residues = "AMINO", seqweights = NULL)
  plot(globins.PHMM, main = "Profile hidden Markov model for globins")
  ### Simulate a random sequence from the model
  suppressWarnings(RNGversion("3.5.0"))
  set.seed(999)
  simulation <- generate(globins.PHMM, size = 20)
  simulation ## "F" "S" "A" "N" "N" "D" "W" "E"
  ### Names attribute indicates that all residues came from "match" states

aphid documentation built on Dec. 5, 2022, 9:06 a.m.