# init.EM: Initialization and EM Algorithm in EMCluster: EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution

## Description

These functions perform initializations (including em.EM and RndEM) followed by the EM iterations for model-based clustering of finite mixtures of multivariate Gaussian distributions with unstructured dispersion, in both unsupervised and semi-supervised clustering.

## Usage

```r
init.EM(x, nclass = 1, lab = NULL, EMC = .EMC,
        stable.solution = TRUE, min.n = NULL, min.n.iter = 10,
        method = c("em.EM", "Rnd.EM"))
em.EM(x, nclass = 1, lab = NULL, EMC = .EMC,
      stable.solution = TRUE, min.n = NULL, min.n.iter = 10)
rand.EM(x, nclass = 1, lab = NULL, EMC = .EMC.Rnd,
        stable.solution = TRUE, min.n = NULL, min.n.iter = 10)
exhaust.EM(x, nclass = 1, lab = NULL,
           EMC = .EMControl(short.iter = 1, short.eps = Inf),
           method = c("em.EM", "Rnd.EM"),
           stable.solution = TRUE, min.n = NULL, min.n.iter = 10)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | the data matrix, of dimension n * p. |
| `nclass` | the desired number of clusters, K. |
| `lab` | labels for semi-supervised clustering, length n. |
| `EMC` | the control object for the EM iterations. |
| `stable.solution` | whether to return a stable solution. |
| `min.n` | restriction for a stable solution: the minimum number of observations in every final cluster. |
| `min.n.iter` | restriction for a stable solution: the minimum number of iterations when trying for a stable solution. |
| `method` | the initialization method. |

## Details

`init.EM` calls either `em.EM` (when `method = "em.EM"`) or `rand.EM` (when `method = "Rnd.EM"`).

`em.EM` has two steps: a short-EM stage with a loose convergence tolerance, controlled by `.EMC$short.eps`, which tries several random initializations, controlled by `.EMC$short.iter`; and a long-EM stage that starts from the best short-EM result (in terms of log likelihood) and runs to convergence with a tight tolerance, controlled by `.EMC$EM.eps`.
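For illustration, the short-EM and long-EM tolerances above can be set through `.EMControl()`; the data below are simulated (not part of the package) and the tolerance values are arbitrary choices, not recommendations:

```r
library(EMCluster)
set.seed(1234)

## Toy data: two separated Gaussian clusters in 2 dimensions (hypothetical).
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

## Loose short-EM tolerance, many random starts; tight long-EM tolerance.
my.EMC <- .EMControl(short.iter = 100, short.eps = 1e-2,
                     EM.iter = 1000, EM.eps = 1e-6)
ret <- em.EM(x, nclass = 2, EMC = my.EMC)
ret$llhdval   # log likelihood of the converged solution
```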

`rand.EM` also has two steps: it first draws several random initializations, controlled by `.EMC$short.iter`, then starts from the best of these (in terms of log likelihood) and runs to convergence.

The `lab` argument is only for semi-supervised clustering; it contains pre-assigned indices between 1 and K for the labeled observations. Observations with index 0 are unlabeled and are clustered by the EM algorithm, which assigns their indices in the result. See `demo(allinit_ss, 'EMCluster')` for details.
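A minimal semi-supervised sketch following the `lab` convention above (the data are simulated for illustration; only the coding of `lab` comes from the text):

```r
library(EMCluster)
set.seed(1234)

## Hypothetical data: two Gaussian clusters, 50 observations each.
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

## Pre-label the first 5 observations of each cluster; 0 means unlabeled.
lab <- rep(0, nrow(x))
lab[1:5] <- 1
lab[51:55] <- 2

ret <- init.EM(x, nclass = 2, lab = lab, method = "em.EM")
table(ret$class)   # final cluster assignments, including formerly unlabeled points
```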

`exhaust.EM` also calls `init.EM`, with a different `EMC`, and performs the EM algorithm `exhaust.iter` times from different initializations; the best result is returned.
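For example, assuming `exhaust.iter` is set through `.EMControl` as in the usage above (simulated data; the value 10 is an arbitrary illustration):

```r
library(EMCluster)
set.seed(1234)
x <- matrix(rnorm(200), ncol = 2)   # hypothetical 100 x 2 data

## Run 10 complete EM passes from different initials and keep the best.
ret <- exhaust.EM(x, nclass = 2,
                  EMC = .EMControl(short.iter = 1, short.eps = Inf,
                                   exhaust.iter = 10))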

## Value

These functions return an object `emobj` of class `emret`, which can be used in post-processing or in other functions such as `e.step`, `m.step`, `assign.class`, `em.ic`, and `dmixmvn`.
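As a sketch of downstream use of the returned `emobj` (data simulated for illustration):

```r
library(EMCluster)
set.seed(1234)
x <- matrix(rnorm(200), ncol = 2)   # hypothetical 100 x 2 data
emobj <- init.EM(x, nclass = 2, method = "em.EM")

em.ic(x, emobj = emobj)          # information criteria for model selection
dens <- dmixmvn(x, emobj)        # mixture density at each observation
emobj <- assign.class(x, emobj)  # per-observation class assignments
```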

## Author(s)

Wei-Chen Chen [email protected] and Ranjan Maitra.

## See Also

`emcluster`, `.EMControl`.

## Examples

```r
## Not run:
library(EMCluster, quietly = TRUE)
set.seed(1234)

x <- da1$da

ret.em <- init.EM(x, nclass = 10, method = "em.EM")
ret.Rnd <- init.EM(x, nclass = 10, method = "Rnd.EM", EMC = .EMC.Rnd)

emobj <- simple.init(x, nclass = 10)
ret.init <- emcluster(x, emobj, assign.class = TRUE)

par(mfrow = c(2, 2))
plotem(ret.em, x)
plotem(ret.Rnd, x)
plotem(ret.init, x)
## End(Not run)
```

### Example output

```
Loading required package: MASS
```