# mixfit: Fit of Mixture Densities to Samples In RBesT: R Bayesian Evidence Synthesis Tools

## Description

Expectation-Maximization (EM) based fitting of parametric mixture densities to numerical samples. This provides a convenient approach to approximate MCMC samples with a parametric mixture distribution.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13``` ```mixfit(sample, type = c("norm", "beta", "gamma"), thin, ...) ## Default S3 method: mixfit(sample, type = c("norm", "beta", "gamma"), thin, ...) ## S3 method for class 'gMAP' mixfit(sample, type, thin, ...) ## S3 method for class 'gMAPpred' mixfit(sample, type, thin, ...) ## S3 method for class 'array' mixfit(sample, type, thin, ...) ```

## Arguments

 `sample` Sample to be fitted. `type` Mixture density to use. Can be either norm, beta or gamma. `thin` Thinning applied to the sample. See description for default behavior. `...` Parameters passed to the low-level EM fitting functions. Parameter `Nc` is mandatory.

## Details

Parameters of EM fitting functions

Nc

Number of mixture components. Required parameter.

mix_init

Initial mixture density. If missing (default) then a k-nearest-neighbor algorithm is used to find an initial mixture density.

Ninit

Number of data points used for initialization. Defaults to 50.

verbose

If set to `TRUE` the function will inform about fitting process

maxIter

Maximal number of iterations. Defaults to 500.

tol

Defines a convergence criteria as an upper bound for the change in the log-likelihood, i.e. once the derivative (with respect to iterations) of the log-likelihood falls below `tol`, the function declares convergence and stops.

eps

Must be a triplet of numbers which set the desired accuracy of the inferred parameters per mixture component. See below for a description of the parameters used during EM. EM is stopped once a running mean of the absolute difference between the last successive `Neps` estimates is below the given `eps` for all parameters. Defaults to 5E-3 for each parameter.

Neps

Number of iterations used for the running mean of parameter estimates to test for convergence. Defaults to 5.

constrain_gt1

Logical value controlling if the Beta EM constrains all parameters a & b to be greater than 1. By default constraints are turned on (new since 1.6-0).

By default the EM convergence is declared when the desired accuracy of the parameters has been reached over the last `Neps` estimates. If `tol` and `Neps` is specified, then whatever criterion is met first will stop the EM.

The parameters per component k used internally during fitting are for the different EM procedures:

normal

logit(w_k), μ_k, \log(σ_k)

beta

logit(w_k), \log(a_k), \log(b_k)

constrained beta

logit(w_k), \log(a_k-1), \log(b_k-1)

gamma

logit(w_k), \log(α_k), \log(β_k)

Note: Whenever no `mix_init` argument is given, the EM fitting routines assume that the data vector is given in random order. If in the unlikely event that the EM gets caught in a local extremum, then random reordering of the data vector may alleviate the issue.

## Value

A mixture object according the requested `type` is returned. The object has additional information attached, i.e. the log-likelihood can be queried and diagnostic plots can be generated. See links below.

## Methods (by class)

• `default`: Performs an EM fit for the given sample. Thinning is applied only if thin is specified.

• `gMAP`: Fits the default predictive distribution from a gMAP analysis. Automatically obtains the predictive distribution of the intercept only case on the response scale mixture from the `gMAP` object. For the binomial case a beta mixture, for the gaussian case a normal mixture and for the Poisson case a gamma mixture will be used. In the gaussian case, the resulting normal mixture will set the reference scale to the estimated sigma in `gMAP` call.

• `gMAPpred`: Fits a mixture density for each prediction from the `gMAP` prediction.

• `array`: Fits a mixture density for an MCMC sample. It is recommended to provide a thinning argument which roughly yields independent draws (i.e. use `acf` to identify a thinning lag with small auto-correlation). The input array is expected to have 3 dimensions which are nested as iterations, chains, and draws.

## References

Dempster A.P., Laird N.M., Rubin D.B. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B 1977; 39 (1): 1-38.

Other EM: `plot.EM()`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21``` ```bmix <- mixbeta(rob=c(0.2, 1, 1), inf=c(0.8, 10, 2)) bsamp <- rmix(bmix, 1000) bfit <- mixfit(bsamp, type="beta", Nc=2) # diagnostic plots can easily by generated from the EM fit with bfit.check <- plot(bfit) names(bfit.check) # check convergence of parameters bfit.check\$mix bfit.check\$mixdens bfit.check\$mixecdf # obtain the log-likelihood logLik(bfit) # or AIC AIC(bfit) ```