mix.lrt: Estimation of a Mixture Complexity Based on Likelihood Ratio...
In mixComp: Estimation of Order of Mixture Distributions

Description Usage Arguments Details Value See Also Examples

Estimation of a mixture complexity as well as its component weights and parameters based on comparing the likelihood ratio test statistic (LRTS) to a bootstrapped quantile.

1	mix.lrt(obj, j.max = 10, B = 100, quantile = 0.95, control = c(trace = 0), ...)

`obj`	object of class `datMix`.
`j.max`	integer, giving the maximal complexity to be considered.
`B`	integer, specifying the number of bootstrap replicates.
`quantile`	numeric between 0 and 1 specifying the bootstrap quantile to which the observed LRTS will be compared.
`control`	control list of optimization parameters, see `solnp`.
`...`	further arguments passed to the `boot` function.

Define the complexity of a finite mixture F as the smallest integer p, such that its pdf/pmf f can be written as

f(x) = w_1*g(x;θ _1) + … + w_p*g(x;θ _p).

To estimate p, mix.lrt sequentially tests p = j versus p = j+1 for j = 1,2, …, by finding the maximum likelihood estimator (MLE) for the density of a mixture with j and j+1 components and calculating the corresponding likelihood ratio test statistic (LRTS). Next, a parametric bootstrap procedure is used to generate B samples of size n from a j-component mixture given the previously calculated MLE. For each of the bootstrap samples, the MLEs corresponding to densities of mixtures with j and j+1 components are calculated, as well as the LRTS. The null hypothesis H_0: p = j is rejected and j increased by 1 if the LRTS based on the original data is larger than the chosen quantile of its bootstrapped counterparts. Otherwise, j is returned as the complexity estimate. The MLEs are calculated via the MLE.function attribute (of the datMix object obj) for j = 1, if it is supplied. For all other j (and also for j = 1 in case MLE.function = NULL) the solver solnp is used to calculate the minimum of the negative log-likelihood. The initial values supplied to the solver are calculated as follows: the data is clustered into j groups by the function clara and the data corresponding to each group is given to MLE.function (if supplied to the datMix object, otherwise numerical optimization is used here as well). The size of the groups is taken as initial component weights and the MLE's are taken as initial component parameter estimates.

Object of class paramEst with the following attributes:

`dat`	data based on which the complexity is estimated.
`dist`	character string stating the (abbreviated) name of the component distribution, such that the function `ddist` evaluates its density/ mass function and `rdist` generates random variates.
`ndistparams`	integer specifying the number of parameters identifying the component distribution, i.e. if θ is in R^d then `ndistparams` = d.
`formals.dist`	string vector, specifying the names of the formal arguments identifying the distribution `dist` and used in `ddist` and `rdist`, e.g. for a gaussian mixture (`dist = norm`) amounts to `mean` and `sd`, as these are the formal arguments used by `dnorm` and `rnorm`.
`discrete`	logical indicating whether the underlying mixture distribution is discrete.
`mle.fct`	attribute `MLE.function` of `obj`.
`pars`	Say the complexity estimate is equal to some j. Then `pars` is a numeric vector of size (d+1)j-1* specifying the component weight and parameter estimates, given as (w_1, ... w_{j-1}, θ 1_1, ... θ 1_j, θ 2_1, ... θ d_j).
`values`	numeric vector of function values gone through during optimization at iteration j, the last entry being the value at the optimum.
`convergence`	integer indicating whether the solver has converged (0) or not (1 or 2) at iteration j.

solnp for the solver, datMix for the creation of the datMix object.

### generating 'Mix' object
normLocMix <- Mix("norm", discrete = FALSE, w = c(0.3, 0.4, 0.3), mean = c(10, 13, 17),
                  sd = c(1, 1, 1))

### generating 'rMix' from 'Mix' object (with 1000 observations)
set.seed(0)
normLocRMix <- rMix(1000, normLocMix)


### generating 'datMix' from 'R' object

## generate list of parameter bounds

norm.bound.list <- list("mean" = c(-Inf, Inf), "sd" = c(0, Inf))

## generate MLE functions

# for "mean"
MLE.norm.mean <- function(dat) mean(dat)
# for "sd" (the sd function uses (n-1) as denominator)
MLE.norm.sd <- function(dat){
sqrt((length(dat) - 1) / length(dat)) * sd(dat)
}
# combining the functions to a list
MLE.norm.list <- list("MLE.norm.mean" = MLE.norm.mean,
                      "MLE.norm.sd" = MLE.norm.sd)

## generating 'datMix' object
normLoc.dM <- RtoDat(normLocRMix, theta.bound.list = norm.bound.list,
                     MLE.function = MLE.norm.list)

### complexity and parameter estimation

set.seed(0)
res <- mix.lrt(normLoc.dM, B = 30)
plot(res)