Description Usage Arguments Details See Also Examples
EM algorithm to estimate the parameters alpha, beta, p1, ..., pk-1 of a negative binomial mixture model with k components.
1 2 | EM_smartstart(theta.null, x.syn, x.non, cvec, iter1, iter2, epsilon1 = NULL,
epsilon2 = NULL, ui, ci, full.output = F)
|
theta.null |
Starting values for the parameters p1, ..., pk-1 in EM1. |
x.syn |
Vector of synonymous mutation counts per gene. |
x.non |
Vector of non-synonymous mutation counts per gene. |
cvec |
Parameters c1, ..., ck. |
iter1 |
Maximum number of iterations for EM1. |
iter2 |
Maximum number of iterations for EM2. |
epsilon1 |
Stopping criterion for EM1. If NULL, the maximum number of iterations is used. |
epsilon2 |
Stopping criterion for EM2. If NULL, the maximum number of iterations is used. |
ui |
Constraints on the parameters. See details. |
ci |
Constraints on the parameters. See details. |
full.output |
logical. If T, the results of the ML estimation and EM1 are output. Otherwise, only the output of EM2 is output (in the same format as by the function |
The parameters alpha and beta are estimated on the synonymous mutations by ML. Then, the parameters p1, ..., pk-1 are estimated by an EM algorithm on the non-synonymous mutations using the estimated alpha and beta as known parameters (EM1). Finally, alpha, beta and p1, ..., pk-1 are used as starting values in another EM algorithm, where all of them are re-estimated (EM2).
For the estimation of alpha and beta on the synonymous mutations, the function MASS::fitdistr
is used.
In the M step of both EM algorithms, constrained optimization is conducted with the function constrOptim
. ui
and ci
are defined accordingly (see constrOptim): ui %*% theta - ci >= 0
. It is important that ui and ci are specified such that p is a probability vector, see examples. It is not allowed to specify constraints that affect (alpha and beta) and (p1, ..., pk-1) jointly.
As a stopping criterion, the difference in log-likelihood can be used. To make sure that there is always at least a short trajectory for visual checks, the criterion is only used after at least 5 iterations.
The function EM_smartstart_apply can be used to run the algorithm on a list of datasets (e. g. using mclapply for parallel execution).
MASS::fitdistr
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | ### Example with 2 components ###
# Simulate dataset of synonymous and non-synonymous mutations
c1 = 0.5; c2 = 10
p1 = 0.2; p2 = 0.8
alpha = 10
beta = 5
mutations = rnegbinmix_syn_nonsyn(n=3000, alpha=alpha, beta=beta, c=c(c1, c2), p = c(p1, p2))
# EM algorithm
# Constraints: ui %*% theta - ci >= 0
# alpha > 0, beta > 0, 0 <= p1 <=1
ui = rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1), c(0, 0, -1))
ci = c(0, 0, 0, -1)
# EM algorithm
EMtest = EM_smartstart(theta.null = 0.5, x.syn = mutations$Syn, x.non = mutations$Non, cvec=c(c1, c2), iter1=10, iter2=10, epsilon2 = 10^(-8), ui = ui, ci = ci)
matplot(EMtest$theta, t="b")
plot(EMtest$loglikelihood, t="b")
### Example with 4 components ###
c1 = 0.1; c2 = 1; c3 = 10; c4 = 100;
cvec = c(c1, c2, c3, c4)
p1 = 0.2; p2 = 0.6; p3 = 0.1; p4 = 0.1;
pvec = c(p1, p2, p3, p4)
alpha = 1.72396
beta = 89.019
mutations = rnegbinmix_syn_nonsyn(n=3000, alpha=alpha, beta=beta, c=cvec, p = pvec)
# EM algorithm
# Constraints: ui %*% theta - ci >= 0
# alpha > 0, beta > 0, 0 <= p1 <=1
ui = rbind(diag(1, 5),
cbind(rep(0, 3), rep(0, 3), diag(-1, 3)),
c(0, 0, rep(-1, 3)))
ci = c(rep(0, 5), rep(-1, 4))
# EM algorithm
EMtest = EM_smartstart(theta.null = c(0.25, 0.25, 0.25), x.syn = mutations$Syn, x.non = mutations$Non, cvec=cvec, iter1=10, iter2=50, epsilon2 = 10^(-8), ui = ui, ci=ci, full.output=T)
matplot(EMtest$EM_final$theta, t="b")
plot(EMtest$EM_final$loglikelihood, t="b")
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.