nonparamHankel: Estimation of Mixture Complexity Based on Hankel Matrix

Description Usage Arguments Details Value References See Also Examples

View source: R/1_hankel_np_utils.R

Description

Estimation of mixture complexity based on estimating the determinant of the Hankel matrix of the moments of the mixing distribution. The estimated determinants can be scaled and/or penalized.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
nonparamHankel(obj, j.max = 10, pen.function = NULL, scaled = FALSE, B = 1000, ...)

## S3 method for class 'hankDet'
print(x, ...)

## S3 method for class 'hankDet'
plot(
  x,
  type = "b",
  xlab = "j",
  ylab = NULL,
  mar = NULL,
  ylim = c(min(0, min(obj)), max(obj)),
  ...
)

Arguments

obj

object of class datMix.

j.max

integer specifying the maximal number of components to be considered.

pen.function

function with arguments j and n specifying the penalty added to the determinant value in the objective function, given sample size n and the assumed complexity at current iteration j. If left empty, no penalty will be added. If non-empty and scaled is TRUE, the penalty function will be added after the determinants are scaled.

scaled

logical flag specifying whether the vector of estimated determinants should be scaled.

B

integer specifying the number of bootstrap replicates used for scaling of the determinants. Ignored if scaled is FALSE.

...
in nonparamHankel():

further arguments passed to the boot function if scaled is TRUE.

in plot.hankDet():

further arguments passed to plot.

in print.hankDet():

further arguments passed to print.

x

object of class hankDet.

type

character denoting type of plot, see, e.g. lines. Defaults to "b".

xlab, ylab

labels for the x and y axis with defaults (the default for ylab is created within the function, if no value is supplied).

mar

numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot, see par.

ylim

range of y values to use.

Details

Define the complexity of a finite mixture F as the smallest integer p, such that its pdf/pmf f can be written as

f(x) = w_1*g(x;θ _1) + … + w_p*g(x;θ _p).

nonparamHankel estimates p by iteratively increasing the assumed complexity j and calculating the determinant of the (j+1) x (j+1) Hankel matrix made up of the first 2j raw moments of the mixing distribution. As shown by Dacunha-Castelle & Gassiat (1997), once the correct complexity is reached (i.e. for all j >= p), this determinant is zero. This suggests an estimation procedure for p based on initially finding a consistent estimator of the moments of the mixing distribution and then choosing the estimator estim_p as the value of j which yields a sufficiently small value of the determinant. Since the estimated determinant is close to 0 for all j >= p, this could lead to choosing estim_p rather larger than the true value. The function therefore returns all estimated determinant values corresponding to complexities up to j.max, so that the user can pick the lowest j generating a sufficiently small determinant. In addition, the function allows the inclusion of a penalty term as a function of the sample size n and the currently assumed complexity j which will be added to the determinant value (by supplying pen.function), and/or scaling of the determinants (by setting scaled = TRUE). For scaling, a nonparametric bootstrap is used to calculate the covariance of the estimated determinants, with B being the size of the bootstrap sample. The inverse of the square root of this covariance matrix (i.e. the matrix S^{(-1)} such that A = SS (see sqrtm), where A is the covariance matrix) is then multiplied with the estimated determinant vector to get the scaled determinant vector. Note that in the case of the scaled version the penalty function chosen should be multiplied by √{n} before it is entered as pen.function: let S* denote a j_m x j_m covariance matrix of the determinants calculated for the bth bootstrap sample (b=1,...,B and j=1,...,j_m). Then S* goes to S/n as B,n go to infinity. Write

S*^{-1/2}=√{n}*\hat{S}^{-1/2}.

Define the rescaled vector

(y_1,...,y_{j_m})^T = √{n}*\hat{S}^{-1/2}(\hat{d}_1,...,\hat{d}_{j_m})^T.

Then the creterion to be minimized becomes

|y_j| + pen.function*√{n}.

See further sections for examples. For a thorough discussion of the methods that can be used for the estimation of the moments see the details section of datMix.

Value

Vector of estimated determinants (optionally scaled and/or penalized) as an object of class hankDet with the following attributes:

scaled

logical flag indicating whether the determinants are scaled.

pen

logical flag indicating whether a penalty was added to the determinants.

dist

character string stating the (abbreviated) name of the component distribution, such that the function ddist evaluates its density function and rdist generates random numbers.

References

D. Dacunha-Castelle and E. Gassiat, "The estimation of the order of a mixture model", Bernoulli, Volume 3, Number 3, 279-299, 1997.

See Also

paramHankel for a similar approach which additionally estimates the component weights and parameters, datMix for construction of a datMix object.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
## create 'Mix' object
geomMix <- Mix("geom", discrete = TRUE, w = c(0.1, 0.6, 0.3), prob = c(0.8, 0.2, 0.4))

## create random data based on 'Mix' object (gives back 'rMix' object)
set.seed(1)
geomRMix <- rMix(1000, obj = geomMix)

## create 'datMix' object for estimation

# explicit function giving the estimate for the j^th moment of the
# mixing distribution, needed for Hankel.method "explicit"

explicit.fct.geom <- function(dat, j){
 1 - ecdf(dat)(j - 1)
}

## generating 'datMix' object
geom.dM <- RtoDat(geomRMix, Hankel.method = "explicit",
                 Hankel.function = explicit.fct.geom)

## function for penalization w/o scaling
pen <- function(j, n){
  (j*log(n))/(sqrt(n))
}

## estimate determinants w/o scaling
set.seed(1)
geomdets_pen <- nonparamHankel(geom.dM, pen.function = pen, j.max = 5,
                               scaled = FALSE)
plot(geomdets_pen, main = "Three component geometric mixture")


## function for penalization with scaling
pen <- function(j, n){
  j*log(n)
}

## estimate determinants using the same penalty with scaling
geomdets_pen <- nonparamHankel(geom.dM, pen.function = pen, j.max = 5,
                               scaled = TRUE)
plot(geomdets_pen, main = "Three component geometric mixture")

mixComp documentation built on Feb. 25, 2021, 5:07 p.m.