EM_smartstart: EM algorithm for the negative binomial mixture model with...

Description Usage Arguments Details See Also Examples

Description

EM algorithm to estimate the parameters alpha, beta, p1, ..., pk-1 of a negative binomial mixture model with k components.

Usage

1
2
EM_smartstart(theta.null, x.syn, x.non, cvec, iter1, iter2, epsilon1 = NULL,
  epsilon2 = NULL, ui, ci, full.output = F)

Arguments

theta.null

Starting values for the parameters p1, ..., pk-1 in EM1.

x.syn

Vector of synonymous mutation counts per gene.

x.non

Vector of non-synonymous mutation counts per gene.

cvec

Parameters c1, ..., ck.

iter1

Maximum number of iterations for EM1.

iter2

Maximum number of iterations for EM2.

epsilon1

Stopping criterion for EM1. If NULL, the maximum number of iterations is used.

epsilon2

Stopping criterion for EM2. If NULL, the maximum number of iterations is used.

ui

Constraints on the parameters. See details.

ci

Constraints on the parameters. See details.

full.output

logical. If T, the results of the ML estimation and EM1 are output. Otherwise, only the output of EM2 is output (in the same format as by the function EM).

Details

The parameters alpha and beta are estimated on the synonymous mutations by ML. Then, the parameters p1, ..., pk-1 are estimated by an EM algorithm on the non-synonymous mutations using the estimated alpha and beta as known parameters (EM1). Finally, alpha, beta and p1, ..., pk-1 are used as starting values in another EM algorithm, where all of them are re-estimated (EM2).

For the estimation of alpha and beta on the synonymous mutations, the function MASS::fitdistr is used.

In the M step of both EM algorithms, constrained optimization is conducted with the function constrOptim. ui and ci are defined accordingly (see constrOptim): ui %*% theta - ci >= 0. It is important that ui and ci are specified such that p is a probability vector, see examples. It is not allowed to specify constraints that affect (alpha and beta) and (p1, ..., pk-1) jointly.

As a stopping criterion, the difference in log-likelihood can be used. To make sure that there is always at least a short trajectory for visual checks, the criterion is only used after at least 5 iterations.

The function EM_smartstart_apply can be used to run the algorithm on a list of datasets (e. g. using mclapply for parallel execution).

See Also

MASS::fitdistr

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
### Example with 2 components ###

# Simulate dataset of synonymous and non-synonymous mutations

c1 = 0.5; c2 = 10
p1 = 0.2; p2 = 0.8
alpha = 10
beta = 5

mutations = rnegbinmix_syn_nonsyn(n=3000, alpha=alpha, beta=beta, c=c(c1, c2), p = c(p1, p2))

# EM algorithm

# Constraints: ui %*% theta - ci >= 0
# alpha > 0, beta > 0, 0 <= p1 <=1

ui = rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1), c(0, 0, -1))
ci = c(0, 0, 0, -1)

# EM algorithm
EMtest = EM_smartstart(theta.null = 0.5, x.syn = mutations$Syn, x.non = mutations$Non, cvec=c(c1, c2), iter1=10, iter2=10, epsilon2 = 10^(-8), ui = ui, ci = ci)
matplot(EMtest$theta, t="b")
plot(EMtest$loglikelihood, t="b")



### Example with 4 components ###

c1 = 0.1; c2 = 1; c3 = 10; c4 = 100;
cvec = c(c1, c2, c3, c4)
p1 = 0.2; p2 = 0.6; p3 = 0.1; p4 = 0.1;
pvec = c(p1, p2, p3, p4)
alpha = 1.72396
beta = 89.019

mutations = rnegbinmix_syn_nonsyn(n=3000, alpha=alpha, beta=beta, c=cvec, p = pvec)


# EM algorithm

# Constraints: ui %*% theta - ci >= 0
# alpha > 0, beta > 0, 0 <= p1 <=1

ui = rbind(diag(1, 5),
     cbind(rep(0, 3), rep(0, 3), diag(-1, 3)),
     c(0, 0, rep(-1, 3)))

ci = c(rep(0, 5), rep(-1, 4))

# EM algorithm
EMtest = EM_smartstart(theta.null = c(0.25, 0.25, 0.25), x.syn = mutations$Syn, x.non = mutations$Non, cvec=cvec, iter1=10, iter2=50, epsilon2 = 10^(-8), ui = ui, ci=ci, full.output=T)
matplot(EMtest$EM_final$theta, t="b")
plot(EMtest$EM_final$loglikelihood, t="b")

johannabertl/SelectionMix documentation built on May 3, 2019, 4:03 p.m.