mix.lrt: Estimation of a Mixture Complexity Based on Likelihood Ratio...

Description Usage Arguments Details Value See Also Examples

View source: R/5_lrt_utils.R

Description

Estimation of a mixture complexity as well as its component weights and parameters based on comparing the likelihood ratio test statistic (LRTS) to a bootstrapped quantile.

Usage

1
mix.lrt(obj, j.max = 10, B = 100, quantile = 0.95, control = c(trace = 0), ...)

Arguments

obj

object of class datMix.

j.max

integer, giving the maximal complexity to be considered.

B

integer, specifying the number of bootstrap replicates.

quantile

numeric between 0 and 1 specifying the bootstrap quantile to which the observed LRTS will be compared.

control

control list of optimization parameters, see solnp.

...

further arguments passed to the boot function.

Details

Define the complexity of a finite mixture F as the smallest integer p, such that its pdf/pmf f can be written as

f(x) = w_1*g(x;θ _1) + … + w_p*g(x;θ _p).

To estimate p, mix.lrt sequentially tests p = j versus p = j+1 for j = 1,2, …, by finding the maximum likelihood estimator (MLE) for the density of a mixture with j and j+1 components and calculating the corresponding likelihood ratio test statistic (LRTS). Next, a parametric bootstrap procedure is used to generate B samples of size n from a j-component mixture given the previously calculated MLE. For each of the bootstrap samples, the MLEs corresponding to densities of mixtures with j and j+1 components are calculated, as well as the LRTS. The null hypothesis H_0: p = j is rejected and j increased by 1 if the LRTS based on the original data is larger than the chosen quantile of its bootstrapped counterparts. Otherwise, j is returned as the complexity estimate. The MLEs are calculated via the MLE.function attribute (of the datMix object obj) for j = 1, if it is supplied. For all other j (and also for j = 1 in case MLE.function = NULL) the solver solnp is used to calculate the minimum of the negative log-likelihood. The initial values supplied to the solver are calculated as follows: the data is clustered into j groups by the function clara and the data corresponding to each group is given to MLE.function (if supplied to the datMix object, otherwise numerical optimization is used here as well). The size of the groups is taken as initial component weights and the MLE's are taken as initial component parameter estimates.

Value

Object of class paramEst with the following attributes:

dat

data based on which the complexity is estimated.

dist

character string stating the (abbreviated) name of the component distribution, such that the function ddist evaluates its density/ mass function and rdist generates random variates.

ndistparams

integer specifying the number of parameters identifying the component distribution, i.e. if θ is in R^d then ndistparams = d.

formals.dist

string vector, specifying the names of the formal arguments identifying the distribution dist and used in ddist and rdist, e.g. for a gaussian mixture (dist = norm) amounts to mean and sd, as these are the formal arguments used by dnorm and rnorm.

discrete

logical indicating whether the underlying mixture distribution is discrete.

mle.fct

attribute MLE.function of obj.

pars

Say the complexity estimate is equal to some j. Then pars is a numeric vector of size (d+1)*j-1 specifying the component weight and parameter estimates, given as

(w_1, ... w_{j-1}, θ 1_1, ... θ 1_j, θ 2_1, ... θ d_j).

values

numeric vector of function values gone through during optimization at iteration j, the last entry being the value at the optimum.

convergence

integer indicating whether the solver has converged (0) or not (1 or 2) at iteration j.

See Also

solnp for the solver, datMix for the creation of the datMix object.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
### generating 'Mix' object
normLocMix <- Mix("norm", discrete = FALSE, w = c(0.3, 0.4, 0.3), mean = c(10, 13, 17),
                  sd = c(1, 1, 1))

### generating 'rMix' from 'Mix' object (with 1000 observations)
set.seed(0)
normLocRMix <- rMix(1000, normLocMix)


### generating 'datMix' from 'R' object

## generate list of parameter bounds

norm.bound.list <- list("mean" = c(-Inf, Inf), "sd" = c(0, Inf))

## generate MLE functions

# for "mean"
MLE.norm.mean <- function(dat) mean(dat)
# for "sd" (the sd function uses (n-1) as denominator)
MLE.norm.sd <- function(dat){
sqrt((length(dat) - 1) / length(dat)) * sd(dat)
}
# combining the functions to a list
MLE.norm.list <- list("MLE.norm.mean" = MLE.norm.mean,
                      "MLE.norm.sd" = MLE.norm.sd)

## generating 'datMix' object
normLoc.dM <- RtoDat(normLocRMix, theta.bound.list = norm.bound.list,
                     MLE.function = MLE.norm.list)

### complexity and parameter estimation

set.seed(0)
res <- mix.lrt(normLoc.dM, B = 30)
plot(res)

mixComp documentation built on Feb. 25, 2021, 5:07 p.m.