# mix.lrt: Estimate a Mixture's Complexity Based on Likelihood Ratio... In anjaweigel/mixComp_package: Estimation of the Order of Finite Mixture Distributions

## Description

Estimation of a mixture's the complexity as well as its component weights and parameters based on comparing the likelihood ratio test statistic (LRTS) to a bootstrapped quantile.

## Usage

 ```1 2``` ```mix.lrt(obj, j.max = 10, B = 100, quantile = 0.95, control = c(trace = 0), ...) ```

## Arguments

 `obj` object of class `datMix`. `j.max` integer giving the maximal complexity to be considered. `B` integer specifying the number of bootstrap replicates. `quantile` numeric between 0 and 1 specifying the bootstrap quantile to which the observed LRTS will be compared. `control` control list of optimization parameters, see `solnp`. `...` further arguments passed to the `boot` function.

## Details

Define the complexity of a finite mixture F as the smallest integer p, such that its pdf/pmf f can be written as

f(x) = w_1*g(x;θ _1) + … + w_p*g(x;θ _p).

To estimate p, `mix.lrt` sequentially tests p = j versus p = j+1 for j = 1,2, …, by finding the maximum likelihood estimator (MLE) for the density of a mixture with j and j+1 components and calculating the corresponding likelihood ratio test statistic (LRTS). Next, a parametric bootstrap procedure is used to generate `B` samples of size n from a j-component mixture given the previously calculated MLE. For each of the bootstrap samples, the MLEs corresponding to densities of mixtures with j and j+1 components are calculated, as well as the LRTS. The null hypothesis H_0: p = j is rejected and j increased by 1 if the LRTS based on the original data is larger than the chosen `quantile` of its bootstrapped counterparts. Otherwise, j is returned as the complexity estimate.

The MLEs are calculated via the `MLE.function` attribute (of the `datMix` object `obj`) for j = 1, if it is supplied. For all other j (and also for j = 1 in case `MLE.function = NULL`) the solver `solnp` is used to calculate the minimum of the negative log likelihood. The initial values supplied to the solver are calculated as follows: the data is clustered into j groups by the function `clara` and the data corresponding to each group is given to `MLE.function` (if supplied to the `datMix` object, otherwise numerical optimization is used here as well). The size of the groups is taken as initial component weights and the MLE's are taken as initial component parameter estimates.

## Value

Object of class `paramEst` with the following attributes:

 `dat` data based on which the complexity is estimated. `dist` character string stating the (abbreviated) name of the component distribution, such that the function `ddist` evaluates its density function and `rdist` generates random numbers. `ndistparams` integer specifying the number of parameters identifying the component distribution, i.e. if θ \subseteq R^d then `ndistparams` = d. `formals.dist` string vector specifying the names of the formal arguments identifying the distribution `dist` and used in `ddist` and `rdist`, e.g. for a gaussian mixture (`dist = norm`) amounts to `mean` and `sd`, as these are the formal arguments used by `dnorm` and `rnorm`. `discrete` logical indicating whether the underlying mixture distribution is discrete. `mle.fct` attribute `MLE.function` of `obj`. `pars` Say the complexity estimate is equal to some j. Then `pars` is a numeric vector of size (d+1)*j-1 specifying the component weight and parameter estimates, given as (w_1, ... w_{j-1}, θ 1_1, ... θ 1_j, θ 2_1, ... θ d_j). `values` numeric vector of function values gone through during optimization at iteration j, the last entry being the value at the optimum. `convergence` indicates whether the solver has converged (0) or not (1 or 2) at iteration j.

`solnp` for the solver, `datMix` for the creation of the `datMix` object.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41``` ```### generating 'Mix' object normLocMix <- Mix("norm", w = c(0.3, 0.4, 0.3), mean = c(10, 13, 17), sd = c(1, 1, 1)) ### generating 'rMix' from 'Mix' object (with 1000 observations) set.seed(0) normLocRMix <- rMix(1000, normLocMix) ### generating 'datMix' from 'R' object ## generate list of parameter bounds norm.bound.list <- vector(mode = "list", length = 2) names(norm.bound.list) <- c("mean", "sd") norm.bound.list\$mean <- c(-Inf, Inf) norm.bound.list\$sd <- c(0, Inf) ## generate MLE functions # for "mean" MLE.norm.mean <- function(dat) mean(dat) # for "sd" (the sd function uses (n-1) as denominator) MLE.norm.sd <- function(dat){ sqrt((length(dat) - 1) / length(dat)) * sd(dat) } # combining the functions to a list MLE.norm.list <- list("MLE.norm.mean" = MLE.norm.mean, "MLE.norm.sd" = MLE.norm.sd) ## generating 'datMix' object normLoc.dM <- RtoDat(normLocRMix, theta.bound.list = norm.bound.list, MLE.function = MLE.norm.list) ### complexity and parameter estimation ## Not run: set.seed(0) res <- mix.lrt(normLoc.dM, B = 30) plot(res) ## End(Not run) ```