EM: EM algorithm for the negative binomial mixture model

Description Usage Arguments Details Value Author(s) See Also Examples

Description

EM algorithm to estimate the parameters of a negative binomial mixture model with k components.

Usage

1
EM(theta.null, x.syn, x.non, cvec, iter, epsilon = NULL, ui, ci)

Arguments

theta.null

Starting value, vector consisting of alpha, beta, p1, ..., pk-1.

x.syn

Vector of synonymous mutation counts per gene.

x.non

Vector of non-synonymous mutation counts per gene.

cvec

Parameter vector c1, ..., ck.

iter

Maximum number of iterations.

epsilon

Log-likelihood difference stopping criterion. If NULL, the algorithm is run for iter iterations. Otherwise, it is run until the increase in the log-likelihood is smaller than epsilon, but at most for iter iterations.

ui

Linear constraints for theta. See details and examples.

ci

Linear constraints for theta. See details and examples.

Details

The parameters alpha, beta and p1, ..., pk-1 are estimated. pk = 1 - p1 - ... -pk-1. c1, ..., ck are fixed.

In the M step, constrained optimization is conducted with the function constrOptim. ui and ci are defined accordingly (see constrOptim): ui %*% theta - ci >= 0. It is important that ui and ci are specified such that p is a probability vector, see examples.

As a stopping criterion, the difference in log-likelihood can be used. To make sure that there is always at least a short trajectory for visual checks, the criterion is only used after at least 5 iterations.

Value

theta

Matrix with the trajectories of the parameter values.

Q

Vector with the trajectory of the Q values.

loglikelihood

Vector with the trajectory of the log likelihood values.

convergence

Vector with the convergence code for each iteration of the M-step of the algorithm, see stats::optim (0 indicates successful completion).

Author(s)

Johanna Bertl

See Also

rnegbinmix::EM_random for a wrapper function with randomly drawn starting values.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
### Example with 2 components ###

# Simulate dataset of synonymous and non-synonymous mutations

c1 = 0.5; c2 = 10
p1 = 0.2; p2 = 0.8
alpha = 10
beta = 5

mutations = rnegbinmix_syn_nonsyn(n=3000, alpha=alpha, beta=beta, c=c(c1, c2), p = c(p1, p2))

# function Q
Qtest = Q(theta = c(5, 1, 0.5), theta.prime = c(5, 1, 0.5), x.syn = mutations$Syn, x.non = mutations$Non, cvec = c(c1, c2))

# EM algorithm

# Constraints: ui %*% theta - ci >= 0
# alpha > 0, beta > 0, 0 <= p1 <=1

ui = rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1), c(0, 0, -1))
ci = c(0, 0, 0, -1)

# EM algorithm
EMres = EM(theta.null = c(5, 1, 0.5), x.syn = mutations$Syn, x.non = mutations$Non, cvec=c(c1, c2), iter=25, epsilon = 10^(-8), ui = ui, ci=ci)
matplot(EMres$theta, t="b")
plot(EMres$loglikelihood, t="b")


### Example with 2 components and non-integer values ###
# Usually, the mutation counts are integer values. If they have been corrected, e. g. for gene length, they might not be integers.

# EM algorithm
EMres_nonint = EM(theta.null = c(5, 1, 0.5), x.syn = mutations$Syn + 0.25, x.non = mutations$Non + 0.25, cvec=c(c1, c2), iter=25, epsilon = 10^(-8), ui = ui, ci=ci)
matplot(EMres_nonint$theta, t="b")
plot(EMres_nonint$loglikelihood, t="b")


### Example with 3 components ####

c1 = 0.01; c2 = 1; c3 = 100
p1 = 0.33; p2 = 0.33; p3 = 0.34
alpha = 10
beta = 5

mutations = rnegbinmix_syn_nonsyn(n=3000, alpha=alpha, beta=beta, c=c(c1, c2, c3), p = c(p1, p2, p3))

# Q

Qtest = Q(theta = c(5, 1, 0.33, 0.33), theta.prime = c(5, 1, 0.33, 0.33), x.syn = mutations$Syn, x.non = mutations$Non, c = c(c1, c2, c3))

# EM algorithm

# Constraints: ui %*% theta - ci >= 0
# alpha > 0, beta > 0, 0 <= p1, p2 <=1, p1 + p2 <= 1

ui = rbind(c(1, 0, 0, 0), c(0, 1, 0, 0), c(0, 0, 1, 0), c(0, 0, 0, 1),
 c(0, 0, -1, 0), c(0, 0, 0, -1), c(0, 0, -1, -1))
ci = c(0, 0, 0, 0, -1, -1, -1)

EMres = EM(theta.null = c(5, 1, 0.33, 0.33), x.syn = mutations$Syn, x.non = mutations$Non, cvec=c(c1, c2, c3), iter=25, epsilon = 10^(-8), ui = ui, ci=ci)
matplot(EMres$theta, t="b")
plot(EMres$loglikelihood, t="b")

johannabertl/SelectionMix documentation built on May 3, 2019, 4:03 p.m.