EM_proportions: EM algorithm for the negative binomial mixture model


Description

EM algorithm to estimate the parameters p1, ..., pk-1 of a negative binomial mixture model with k components.
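As a reading aid, a minimal sketch of the corresponding mixture log-likelihood is given below. It assumes the usual gamma-Poisson parameterisation, in which component j is negative binomial with size alpha and probability beta/(beta + c_j); this parameterisation is an assumption made here for illustration only, not something stated on this page.

# Sketch of the k-component mixture log-likelihood (assumed parameterisation)
loglik_mix <- function(x, p, alpha, beta, cvec) {
  pfull <- c(p, 1 - sum(p))   # p holds p1, ..., pk-1; the last weight is implied
  dens <- rep(0, length(x))
  for (j in seq_along(cvec)) {
    dens <- dens + pfull[j] * dnbinom(x, size = alpha, prob = beta / (beta + cvec[j]))
  }
  sum(log(dens))
}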

Usage

EM_proportions(theta.null, x.non, alpha, beta, cvec, iter, epsilon = NULL, ui,
  ci)

Arguments

theta.null

Starting value, vector consisting of p1, ..., pk-1.

x.non

Vector of non-synonymous mutation counts per gene.

alpha

Fixed parameter alpha of the negative binomial mixture model (see Details).

beta

Fixed parameter beta of the negative binomial mixture model (see Details).

cvec

Parameter vector c1, ..., ck.

iter

Maximum number of iterations.

epsilon

Log-likelihood difference stopping criterion. If NULL, the algorithm is run for iter iterations. Otherwise, it is run until the increase in the log-likelihood is smaller than epsilon, but at most for iter iterations.

ui

Linear constraints for theta. See details and examples.

ci

Linear constraints for theta. See details and examples.

Details

The parameters alpha, beta, c1, ..., ck are held fixed. alpha and beta can, for example, be obtained from the synonymous mutations.
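One possible way to do this is sketched below. It is not part of the package: it uses MASS::fitdistr and assumes that the synonymous counts follow a negative binomial distribution with size alpha and mean alpha/beta (the gamma-Poisson parameterisation with c = 1); alpha.hat and beta.hat are illustrative names.

# Fix alpha and beta from synonymous counts x.syn (illustrative only)
fit <- MASS::fitdistr(x.syn, "negative binomial")
alpha.hat <- unname(fit$estimate["size"])
beta.hat  <- unname(alpha.hat / fit$estimate["mu"])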

In the M step, constrained optimization is carried out with the function constrOptim. ui and ci are passed on accordingly (see constrOptim) and encode the constraints ui %*% theta - ci >= 0. It is important to specify ui and ci such that p is a probability vector; see the Examples.
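For a general number of components k, constraint matrices of the kind used in the Examples can be generated mechanically. A small helper along these lines (the name make_simplex_constraints is illustrative, not part of the package) reproduces the ui and ci of the two-component example and a reduced version of the three-component example:

# p1, ..., pk-1 >= 0 and p1 + ... + pk-1 <= 1; the explicit upper bounds
# pj <= 1 listed in the three-component example below are implied and omitted.
make_simplex_constraints <- function(k) {
  m <- k - 1                          # length of theta
  ui <- rbind(diag(m), rep(-1, m))    # pj >= 0 for each j; -(p1 + ... + pm) >= -1
  ci <- c(rep(0, m), -1)
  list(ui = ui, ci = ci)
}

Note that constrOptim additionally requires the starting value theta.null to lie strictly inside the feasible region.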

As a stopping criterion, the increase in the log-likelihood between iterations can be used (argument epsilon). To make sure that there is always at least a short trajectory for visual checks, the criterion is only applied after at least 5 iterations.
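In pseudocode, the loop skeleton with this stopping rule looks roughly as follows (an illustrative sketch of the logic described above, not the package source; em_step stands for one E step plus M step):

run_em_sketch <- function(em_step, iter, epsilon = NULL) {
  ll <- numeric(iter)
  for (i in seq_len(iter)) {
    ll[i] <- em_step()                 # update parameters, return log-likelihood
    # stopping rule, only applied after at least 5 iterations
    if (!is.null(epsilon) && i > 5 && ll[i] - ll[i - 1] < epsilon) {
      return(ll[seq_len(i)])
    }
  }
  ll
}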

Value

theta

Matrix with the trajectories of the parameter values.

Q

Vector with the trajectory of the Q values.

loglikelihood

Vector with the trajectory of the log likelihood values.

convergence

Vector with the convergence code from the M step in each iteration of the algorithm; see stats::optim (0 indicates successful completion).
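For example, after a run one might inspect the returned object along these lines (EMres as created in the Examples below; treating the rows of theta as iterations, as the trajectory plots suggest):

all(EMres$convergence == 0)          # did every M step report successful convergence?
tail(EMres$loglikelihood, 1)         # final log-likelihood
EMres$theta[nrow(EMres$theta), ]     # parameter estimates in the last iteration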

Author(s)

Johanna Bertl

See Also

rnegbinmix::EM for estimation of all parameters.

Examples

### Example with 2 components ###

# Simulate a dataset of non-synonymous mutations

c1 = 0.5; c2 = 10
p1 = 0.2; p2 = 0.8
alpha = 10
beta = 5

mutations = rnegbinmix(n=3000, alpha=alpha, beta=beta, c=c(c1, c2), p = c(p1, p2))

# EM algorithm

# Constraints: ui %*% theta - ci >= 0
# 0 <= p1 <= 1

ui = matrix(c(1, -1), ncol=1)
ci = c(0, -1)

# EM algorithm
EMres = EM_proportions(theta.null = 0.5, x.non = mutations, alpha = alpha, beta = beta,
  cvec = c(c1, c2), iter = 25, epsilon = 10^(-8), ui = ui, ci = ci)
plot(EMres$theta, t="b")
plot(EMres$loglikelihood, t="b")


### Example with 3 components ###

c1 = 0.01; c2 = 1; c3 = 100
p1 = 0.33; p2 = 0.33; p3 = 0.34
alpha = 10
beta = 5

# Simulate mutations data (synonymous and non-synonymous)

mutations = rnegbinmix_syn_nonsyn(n = 3000, alpha = alpha, beta = beta,
  c = c(c1, c2, c3), p = c(p1, p2, p3))

# Evaluate the Q functions (full model and proportions only)

Qall = Q(theta = c(5, 1, 0.33, 0.33), theta.prime = c(5, 1, 0.33, 0.33),
  x.syn = mutations$Syn, x.non = mutations$Non, c = c(c1, c2, c3))
Qnon = Q_proportions(theta = c(0.33, 0.33), theta.prime = c(0.33, 0.33),
  x.non = mutations$Non, alpha = 10, beta = 5, c = c(c1, c2, c3))

# EM algorithm for the proportions

# Constraints: ui %*% theta - ci >= 0
# 0 <= p1, p2 <= 1, p1 + p2 <= 1

ui = rbind(c(1, 0), c(0, 1),
           c(-1, 0), c(0, -1), c(-1, -1))
ci = c(0, 0, -1, -1, -1)

EM_prop = EM_proportions(theta.null = c(0.33, 0.33), x.non = mutations$Non, alpha = alpha,
  beta = beta, cvec = c(c1, c2, c3), iter = 25, epsilon = 10^(-8), ui = ui, ci = ci)
matplot(EM_prop$theta, t="b")
plot(EM_prop$loglikelihood, t="b")

# EM algorithm for all parameters

# Constraints: ui %*% theta - ci >= 0
# alpha > 0, beta > 0, 0 <= p1, p2 <= 1, p1 + p2 <= 1

ui = rbind(c(1, 0, 0, 0), c(0, 1, 0, 0), c(0, 0, 1, 0), c(0, 0, 0, 1),
           c(0, 0, -1, 0), c(0, 0, 0, -1), c(0, 0, -1, -1))
ci = c(0, 0, 0, 0, -1, -1, -1)

EMres = EM(theta.null = c(5, 1, 0.33, 0.33), x.syn = mutations$Syn, x.non = mutations$Non,
  cvec = c(c1, c2, c3), iter = 25, epsilon = 10^(-8), ui = ui, ci = ci)
matplot(EMres$theta, t="b")
plot(EMres$loglikelihood, t="b")
