nbm_em: Expectation conditional maximization of likelihood for...

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Given an input read count vector of integers, the function optimzes the parameters for the negative binomial mixture model of K components using expectation conditional maximization.

Usage

1
nbm_em(count, alpha, beta, wght, NBM_NIT_MAX = 250, NBM_TOL = 0.01)

Arguments

count

A vector of integers, conceptaully representing the read counts within bins of chromosome.

alpha

Initial values for α_k for all K NB.

beta

Initial values for β_k for all K NB.

wght

Initial values for π_k for all K NB.

NBM_NIT_MAX

Maximum number of EM iterations (Default: 250).

NBM_TOL

Threshold as fraction of increase in likelihood (given the current NBM parameters) comparing with the likelihood from the last iteration. EM for the NBM stops when the improvement is below the threshold (Default: 0.01).

Details

Given a K-NBM, the goal is to maximize the likelihood function with respect to the parameters comprising of α_k and β_k for the K NB components and the mixing coefficients π_k, which are the priors p(z=k). Because there is no analytical solution for the maximum likelihood (ML) estimators of the above quantities, a modified EM procedures called Expectation Conditional Maximization is employed (Meng and Rubin, 1994).

In E-step, the posterior probability is evaluated using NB density functions with initialized α_k, β_k, and π_k. In the CM step, π_k is evaluated first followed by Newton updates of α_k and β_k. EM iteration terminates when the percetnage of increase of log likelihood drop below NBM_TOL, which is deterministic since EM is guaranteed to converge. For more details, please see the manuscript of RIPSeeker.

Value

A list containing:

alpha

alpha_k for all K components of NB.

beta

beta_k for all K components of NB.

wght

pi_k for all K components of NB.

logl

Log likelihood in each EM iteration.

postprob

Posterior probabilities for each observed data point in the last EM iteration.

Author(s)

Yue Li

References

Bishop, Christopher. Pattern recognition and machine learning. Number 605-631 in Information Science and Statisitcs. Springer Science, 2006.

X. L. Meng, D. B. Rubin, Maximum likelihood estimation via the ECM algorithm: A general framework, Biometrika, 80(2):267-278 (1993).

J. A. Fessler, A. O. Hero, Space-alternating generalized expectation-maximization algorithm, IEEE Tr. on Signal Processing, 42(10):2664 -2677 (1994).

Capp\'e, O. (2001). H2M : A set of MATLAB/OCTAVE functions for the EM estimation of mixtures and hidden Markov models. (http://perso.telecom-paristech.fr/cappe/h2m/)

See Also

nbh_init, nbh, nbh.GRanges, nbh_em

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# Simulate data
TRANS_s <- matrix(c(0.9, 0.1, 0.3, 0.7), nrow=2, byrow=TRUE)
alpha_s <- c(2, 4)
beta_s  <- c(1, 0.25)
Total <- 1000
x <- nbh_gen(TRANS_s, alpha_s, beta_s, Total);

N <- 2

cnt <- x$count
label <- x$label

Total <- length(cnt)

# dummy initialization
wght0 <- c(0.5,0.5)

alpha0 <- c(1, 20)

beta0 <- c(1, 1)

NIT_MAX <- 50
TOL <- 1e-100

# initialize param with nbm

nbm <- nbm_em(cnt, alpha0, beta0, wght0, NIT_MAX, TOL)

map.accuracy <- length(which(max.col(nbm$postprob) == label))/Total

print(map.accuracy)

yueli-compbio/RIPSeeker documentation built on May 8, 2019, 2:34 a.m.