Description Usage Arguments Details Value Note Author(s) References See Also Examples

Given read counts and HMM parameters (optimized by `nbh_em`

), derive the sequence of hidden states that maximizes the joint likelihood of observed and latent data.

1 |

`count` |
A vector of integers, conceptaully representative of the read counts within bins of chromosome. |

`TRANS` |
Optimized transition probability matrix, a squared matrix of probabilities ( |

`alpha` |
Optimized shape parameter of the NB as a vector of positive values with length equal to that of beta and the row/column of TRANS. |

`beta` |
Optimized inverse scale parameter of the NB as a vector of positive values with length equal to that of beta and the row/column of TRANS. |

Given a K-state HMM with NB emission (NBH), the goal is to find the latent states corresponding to the observed data that maximize the joint likelihood *ln p(X, Z) = ln p(x_1, …, x_N, z_1, …, z_N)*. The optimal solution is obtained via Viterbi algorithm, which essentially belongs to the more general framework of Dynamic Programming.

Briefly, starting from the second node of the Markov chain, we select state of the first node that maximizes *ln p(x_1, x_2, z_2 | z_1)* for *every* state of *z_2*. Then, we move on to the next node and the next until reaching to the last node. In the end, we make choice for the state of the last node that together leads to the maximum *ln p(X, Z)*. Finally, we backtrack to find the choices of states in all of the intermeidate nodes to form the final solution.

A list containing:

`class` |
ML sequence of latent states |

`logl` |
Log-likelihood corresponding to the latents states |

The function is expected to run after learning the model parameters of HMM using `nbh_em`

and (optionally) disambiguating the multihits using `nbh_vit`

. However, nothing prevents user from running it with a random set of HMM parameters. Also, note that Viterbi algorithm finds the most probable *sequence of states*, which is not the same as maximizing the posterior probabilities for all the individual latent variables. For instance, a observed data point may be classified as from state 2 in the most probable chain in spite its marginal posterior probability for state 2 is zero.

Yue Li

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition (Vol. 77, pp. 257-286). Presented at the Proceedings of the IEEE. doi:10.1109/5.18626

Christopher Bishop. Pattern recognition and machine learning. Number 605-631 in Information Science and Statisitcs. Springer Science, 2006.

X. L. Meng, D. B. Rubin, Maximum likelihood estimation via the ECM algorithm: A general framework, Biometrika, 80(2):267-278 (1993).

J. A. Fessler, A. O. Hero, Space-alternating generalized expectation-maximization algorithm, IEEE Tr. on Signal Processing, 42(10):2664 -2677 (1994).

Capp\'e, O. (2001). H2M : A set of MATLAB/OCTAVE functions for the EM estimation of mixtures and hidden Markov models. (http://perso.telecom-paristech.fr/cappe/h2m/)

` nbh_init, nbh, nbh.GRanges, nbh_em,nbm_em`

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | ```
# Simulate data
TRANS_s <- matrix(c(0.9, 0.1, 0.3, 0.7), nrow=2, byrow=TRUE)
alpha_s <- c(2, 4)
beta_s <- c(1, 0.25)
Total <- 100
x <- nbh_gen(TRANS_s, alpha_s, beta_s, Total);
count <- x$count
label <- x$label
Total <- length(count)
# dummy initialization
TRANS0 <- matrix(rep(0.5,4), 2)
alpha0 <- c(1, 20)
beta0 <- c(1, 1)
NIT_MAX <- 50
TOL <- 1e-100
nbh <- nbh_em(count, TRANS0, alpha0, beta0, NIT_MAX, TOL)
map.accuracy <- length(which(max.col(nbh$postprob) == label))/Total
vit <- nbh_vit(count, nbh$TRANS, nbh$alpha, nbh$beta)
vit.accuracy <- length(which(vit$class == label))/Total
# Plots
par(mfrow=c(2,2), cex.lab=1.2, cex.main=1.2)
plot(count, col="blue", type="l", main=sprintf("A. Simulated Data (Total = %i)",Total))
plot(as.numeric(nbh$logl), xlab="EM Iteration", ylab="Log-Likelihood",
main="B. Log-Likelihood via EM");grid()
# Marginal postprob
plot(nbh$postprob[,2], col="blue", type="l", ylim = c(0,1),
ylab="Marginal Posteriror or True State")
points(label-1, col="red")
title(main = sprintf("C. MAP Prediciton Accuracy = %.2f%s", 100 * map.accuracy, "%"))
# Viterbi states
plot(vit$class - 1, col="dark green", type="l", ylim = c(0,1),
ylab="Viterbi or True State")
points(label-1, col="red")
title(main = sprintf("D. Viterbi Prediciton Accuracy = %.2f%s", 100 * vit.accuracy, "%"))
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.