# nonparam.Hankel: Estimate Mixture Complexity Based on Hankel Matrix In anjaweigel/mixComp_package: Estimation of the Order of Finite Mixture Distributions

## Description

Estimation of a mixture's complexity based on estimating the determinant of the Hankel matrix of the moments of the mixing distribution. The estimated determinants can be scaled and/or penalized.

## Usage

 ```1 2 3 4 5 6 7 8 9``` ```nonparamHankel(obj, j.max = 10, pen.function = NULL, scaled = FALSE, B = 1000, ...) ## S3 method for class 'hankDet' print(x, ...) ## S3 method for class 'hankDet' plot(x, type = "b", xlab = "j", ylab = NULL, mar = NULL, ylim = c(min(0, min(obj)), max(obj)), ...) ```

## Arguments

 `obj` object of class `datMix`. `j.max` integer specifying the maximal number of components to be considered. `pen.function` a function with arguments `j` and `n` specifying the penalty added to the determinant value given sample size n and the currently assumed complexity j. If left empty no penalty will be added. If non-empty and `scaled` is `TRUE`, the penalty function will be added after the determinants are scaled. `scaled` logical specifying whether the vector of estimated determinants should be scaled. `B` integer specifying the number of bootstrap replicates used for scaling of the determinants. Ignored if `scaled` is `FALSE`. `x` object of class `hankDet`. `type` character denoting type of plot, see, e.g. `lines`. Defaults to `"b"`. `xlab,ylab` labels for the x and y axis with defaults (the default for `ylab` is created within the function, if no value is supplied). `mar` numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot, see `par`. `ylim` range of y values to use. `...` in `nonparamHankel()`:further arguments passed to the `boot` function if `scaled` is `TRUE`. in `plot.hankDet()`:further arguments passed to `plot`. in `print.hankDet()`:further arguments passed to `print`.

## Details

Define the complexity of a finite mixture F as the smallest integer p, such that its pdf/pmf f can be written as

f(x) = w_1*g(x;θ _1) + … + w_p*g(x;θ _p).

`nonparamHankel` estimates p by iteratively increasing the assumed complexity j and calculating the determinant of the (j+1)x(j+1) Hankel matrix made up of the first 2j raw moments of the mixing distribution. As shown by Dacunha-Castelle & Gassiat (1997), once the correct complexity is reached (i.e. for all j >= p), this determinant is zero.

This suggests an estimation procedure for p based on initially finding a consistent estimator of the moments of the mixing distribution and then choosing the estimator estim_p as the value of j which yields a sufficiently small value of the determinant. Since the estimated determinant is close to 0 for all j >= p, this could lead to choosing estim_p rather larger than the true value. The function therefore returns all estimated determinant values corresponding to complexities up to `j.max`, so that the user can pick the lowest j generating a sufficiently small determinant. In addition, the function allows the inclusion of a penalty term as a function of the sample size `n` and the currently assumed complexity `j` which will be added to the determinant value (by supplying `pen.function`), and/or scaling of the determinants (by setting `scaled = TRUE`). For scaling, a nonparametric bootstrap is used to calculate the covariance of the estimated determinants, with `B` being the size of the bootstrap sample. The inverse of the square root of this covariance matrix (i.e. the matrix S^(-1) such that \$A = SS\$, where A is the covariance matrix) is then multiplied with the estimated determinant vector to get the scaled determinant vector.

For a thorough discussion of the methods that can be used for the estimation of the moments see the details section of `datMix`.

## Value

The vector of estimated determinants (optionally scaled and/or penalized), given back as an object of class `hankDet` with the following attributes:

 `scaled` logical indicating whether the determinants are scaled. `pen ` logical indicating whether a penalty was added to the determinants. `dist ` character string stating the (abbreviated) name of the component distribution, such that the function `ddist` evaluates its density function and `rdist` generates random numbers.

## References

D. Dacunha-Castelle and E. Gassiat, "The estimation of the order of a mixture model", Bernoulli, Volume 3, Number 3, 279-299, 1997.

`paramHankel` for a similar approach which estimates the component weights and parameters on top of the complexity, `datMix` for the creation of the `datMix` object.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29``` ```## create 'Mix' object geomMix <- Mix("geom", w = c(0.1, 0.6, 0.3), prob = c(0.8, 0.2, 0.4)) ## create random data based on 'Mix' object (gives back 'rMix' object) set.seed(1) geomRMix <- rMix(1000, obj = geomMix) ## create 'datMix' object for estimation # explicit function giving the estimate for the j^th moment of the # mixing distribution, needed for Hankel.method "explicit" explicit.fct.geom <- function(dat, j){ 1 - ecdf(dat)(j - 1) } ## generating 'datMix' object geom.dM <- RtoDat(geomRMix, Hankel.method = "explicit", Hankel.function = explicit.fct.geom) ## function for penalization pen <- function(j, n){ (j*log(n))/(sqrt(n)) } ## estimate determinants set.seed(1) geomdets_pen <- nonparamHankel(geom.dM, pen.function = pen, j.max = 5) plot(geomdets_pen, main = "Three component geometric mixture") ```