EM and MLE Estimation of Univariate Normal Mixtures
Description
These functions estimate the parameters of a univariate (finite) normal
mixture using the EM algorithm or Likelihood Maximimization via
optim(.., method = "BFGS")
.
Usage
1 2 3 4 5 
Arguments
x 
numeric: the data for which the parameters are to be estimated. 
m 
integer or factor: If 
name 
character, passed to 
sd.min 
number: the minimal value that the normal components'
standard deviations ( 
maxiter 
integer: maximum number of EM iterations. 
tol 
numeric: EM iterations stop if relative changes of the
loglikelihood are smaller than 
trace 
integer (or logical) specifying if the iterations should be
traced and how much output should be produced. The default,

Details
Estimation of univariate mixtures can be very sensitive to
initialization. By default, norMixEM
and norMixLME
cut
the data into m groups of approximately equal
size. See examples below for other initialization possibilities.
The EM algorithm consists in repeated application of E and M steps
until convergence. Mainly for didactical reasons, we also provide the
functions estep.nm
, mstep.nm
, and
emstep.nm
.
The MLE, Maximum Likelihood Estimator, maximizes the likelihood using
optim
, using the same advantageous parametrization as
llnorMix
.
Value
An object of class norMix
.
Author(s)
EM: Friedrich Leisch, originally; Martin Maechler vectorized it in
m, added trace
etc.
MLE: M.Maechler
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35  ex < norMix(mu = c(1,2,5), sig2 = c(1, 0.5, 3))
plot(ex, col="gray", p.norm=FALSE)
x < rnorMix(100, ex)
lines(density(x))
rug(x)
## EM estimation may fail depending on random sample
ex1 < norMixEM(x, 3, trace=2) #> warning (sometimes)
ex1
plot(ex1)
## initialization by cut() into intervals of equal length:
ex2 < norMixEM(x, cut(x, 3))
ex2
## initialization by kmeans():
k3 < kmeans(x, 3)$cluster
ex3 < norMixEM(x, k3)
ex3
## Now, MLE instead of EM:
exM < norMixMLE(x, k3, tol = 1e12, trace=4)
exM
## real data
data(faithful)
plot(density(faithful$waiting, bw = "SJ"), ylim=c(0,0.044))
rug(faithful$waiting)
(nmF < norMixEM(faithful$waiting, 2))
lines(nmF, col=2)
## are three components better?
nmF3 < norMixEM(faithful$waiting, 3, maxiter = 200)
lines(nmF3, col="forestgreen")

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker. Vote for new features on Trello.