norMixFit: EM and MLE Estimation of Univariate Normal Mixtures

norMixFit R Documentation

EM and MLE Estimation of Univariate Normal Mixtures

Description

These functions estimate the parameters of a univariate (finite) normal mixture, either with the EM algorithm or by likelihood maximization via optim(.., method = "BFGS").

Usage

norMixEM(x, m, name = NULL, sd.min = 1e-07 * diff(range(x))/m,
         trafo = c("clr1", "logit"),
         maxiter = 100, tol = sqrt(.Machine$double.eps), trace = 1)

norMixMLE(x, m, name = NULL, 
          trafo = c("clr1", "logit"),
          maxiter = 100, tol = sqrt(.Machine$double.eps), trace = 2)

Arguments

x

numeric: the data for which the parameters are to be estimated.

m

integer or factor: if m has length 1, it specifies the number of mixture components; otherwise it is taken to be a vector of initial cluster assignments (see Details below).

name

character, passed to norMix. The default, NULL, uses match.call().

sd.min

number: the minimal value that the normal components' standard deviations (sd) are allowed to take. A warning is printed if some of the final sd's are at this boundary.

trafo

character string specifying the transformation of the m-vector of component weights w (mathematical notation in norMix: \pi_j, j=1,\dots,m) to an (m-1)-dimensional unconstrained parameter vector in our parametrization. See nM2par for details.

maxiter

integer: maximum number of EM iterations.

tol

numeric: EM iterations stop if relative changes of the log-likelihood are smaller than tol.

trace

integer (or logical) specifying whether the iterations should be traced and how much output should be produced. For norMixEM, the default, 1, prints a final one-line summary, whereas trace = 2 produces one line of output per iteration.

Details

Estimation of univariate mixtures can be very sensitive to initialization. By default, norMixEM and norMixMLE cut the data into m groups of approximately equal size. See examples below for other initialization possibilities.
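One way to obtain m groups of approximately equal size is to split on the ranks of the data. The following is a minimal base-R sketch under that assumption; it is not necessarily how norMixEM or norMixMLE form their default groups, and the names x, m and init are chosen for illustration only.

```r
## Illustrative only: equal-size initial groups built from ranks
set.seed(1)
x <- c(rnorm(40, mean = -1), rnorm(60, mean = 3))
m <- 3

## ranks 1..n are mapped to group labels 1..m in blocks of about n/m
r    <- rank(x, ties.method = "first")
init <- ceiling(m * r / length(x))

table(init)  # group sizes differ by at most one
```

Because m may also be a vector of initial cluster assignments, a vector such as init could then be supplied in place of the number of components, e.g. norMixEM(x, init).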

The EM algorithm consists of repeated application of E- and M-steps until convergence. Mainly for didactic reasons, we also provide the functions estep.nm, mstep.nm, and emstep.nm.
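To illustrate the alternation of E- and M-steps, here is a minimal base-R sketch for a univariate normal mixture. The helpers e_step, m_step and loglik are hypothetical names written for this illustration; they are not the package's estep.nm, mstep.nm or emstep.nm, whose interfaces may differ. The stopping rule (relative change of the log-likelihood below tol) is one plausible reading of the tol argument, and no lower bound such as sd.min is enforced here.

```r
## Hypothetical helpers for illustration (not the nor1mix functions)
e_step <- function(x, mu, sigma, w) {
  ## n x m matrix of weighted component densities
  dens <- sapply(seq_along(mu), function(j) w[j] * dnorm(x, mu[j], sigma[j]))
  dens / rowSums(dens)            # posterior membership probabilities
}
m_step <- function(x, z) {
  nj    <- colSums(z)             # effective number of points per component
  mu    <- colSums(z * x) / nj    # weighted means
  sigma <- sqrt(colSums(z * outer(x, mu, "-")^2) / nj)
  list(mu = mu, sigma = sigma, w = nj / length(x))
}
loglik <- function(x, mu, sigma, w) {
  dens <- sapply(seq_along(mu), function(j) w[j] * dnorm(x, mu[j], sigma[j]))
  sum(log(rowSums(dens)))
}

set.seed(42)
x <- c(rnorm(150, 0, 1), rnorm(100, 5, 1))
mu <- c(-1, 6); sigma <- c(1, 1); w <- c(0.5, 0.5)   # crude starting values
tol <- sqrt(.Machine$double.eps)

ll <- loglik(x, mu, sigma, w)
ll_trace <- ll
for (iter in 1:100) {
  z <- e_step(x, mu, sigma, w)
  p <- m_step(x, z)
  mu <- p$mu; sigma <- p$sigma; w <- p$w
  ll_new   <- loglik(x, mu, sigma, w)
  ll_trace <- c(ll_trace, ll_new)
  if (abs(ll_new - ll) < tol * abs(ll)) break        # assumed stopping rule
  ll <- ll_new
}
round(c(mu = mu, sigma = sigma, w = w), 3)
```

Each pass should not decrease the log-likelihood, so ll_trace gives a simple check that the iteration behaves as expected.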

The MLE (maximum likelihood estimator) maximizes the likelihood via optim, using the same advantageous parametrization as llnorMix.
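The idea of direct likelihood maximization over an unconstrained parameter vector can be sketched in base R as follows. The parametrization used here (log standard deviations, and weights obtained from m-1 free values by a softmax with the first value fixed at 0) is an assumption made for this illustration; it is not claimed to match the "clr1" or "logit" transformations of nM2par or the internals of llnorMix. The function negll is a hypothetical name.

```r
## Hypothetical negative log-likelihood on an unconstrained scale
negll <- function(par, x, m) {
  mu    <- par[1:m]
  sigma <- exp(par[m + 1:m])                      # log-sd keeps sd positive
  eta   <- c(0, par[2 * m + seq_len(m - 1)])      # m-1 free weight parameters
  w     <- exp(eta) / sum(exp(eta))               # weights sum to one
  ## log of weighted component densities, n x m
  lcomp <- sapply(1:m, function(j)
    log(w[j]) + dnorm(x, mu[j], sigma[j], log = TRUE))
  mx <- apply(lcomp, 1, max)                      # log-sum-exp for stability
  -sum(mx + log(rowSums(exp(lcomp - mx))))
}

set.seed(42)
x <- c(rnorm(150, 0, 1), rnorm(100, 5, 1))
start <- c(-1, 6,  0, 0,  0)   # mu1, mu2, log(sd1), log(sd2), eta2

fit <- optim(start, function(p) negll(p, x, m = 2), method = "BFGS")

est <- list(mu    = fit$par[1:2],
            sigma = exp(fit$par[3:4]),
            w     = { e <- exp(c(0, fit$par[5])); e / sum(e) })
est
```

Working on the unconstrained scale lets a general-purpose optimizer such as BFGS run without box constraints on the standard deviations or the weights.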

Value

An object of class norMix.

Author(s)

EM: Friedrich Leisch, originally; Martin Maechler vectorized it in m, added trace etc.

MLE: Martin Maechler

Examples

## use (mu, sigma)
ex  <- norMix(mu = c(-1,2,5), sigma = c(1, 1/sqrt(2), sqrt(3)))
tools::assertWarning(verbose=TRUE,
           ## *deprecated* (using 'sig2' will *NOT* work in future!)
           ex. <- norMix(mu = c(-1,2,5), sig2 = c(1, 0.5, 3))
       )
stopifnot(all.equal(ex, ex.))
plot(ex, col="gray", p.norm=FALSE)

x <- rnorMix(100, ex)
lines(density(x))
rug(x)

## EM estimation may fail depending on random sample
ex1 <- norMixEM(x, 3, trace=2) #-> warning (sometimes)
ex1
plot(ex1)

## initialization by cut() into intervals of equal length:
ex2 <- norMixEM(x, cut(x, 3))
ex2

## initialization by kmeans():
k3 <- kmeans(x, 3)$cluster
ex3 <- norMixEM(x, k3)
ex3

## Now, MLE instead of EM:
exM <- norMixMLE(x, k3, tol = 1e-12, trace=4)
exM

## real data
data(faithful)
plot(density(faithful$waiting, bw = "SJ"), ylim=c(0,0.044))
rug(faithful$waiting)

(nmF <- norMixEM(faithful$waiting, 2))
lines(nmF, col=2)
## are three components better?
nmF3 <- norMixEM(faithful$waiting, 3, maxiter = 200)
lines(nmF3, col="forestgreen")

nor1mix documentation built on May 29, 2024, 8:28 a.m.