datMix: Create Object for Which to Estimate the Mixture Complexity


View source: R/0_Mix_utils.R

Description

Function to generate a datMix object to be passed to other mixComp functions used for estimating the mixture complexity.

Usage

datMix(dat, dist, theta.bound.list = NULL, MLE.function = NULL, 
       Hankel.method = NULL, Hankel.function = NULL)

is.datMix(x)
       
## S3 method for class 'datMix'
print(x, ...)

Arguments

dat

a numeric vector containing the observations from the mixture model.

dist

a character string giving the (abbreviated) name of the component distribution, such that the function ddist evaluates its density function and rdist generates random numbers. For example, to create a Gaussian mixture, dist has to be specified as "norm" rather than "normal", "gaussian", etc., so that the package can find the functions dnorm and rnorm.

theta.bound.list

a named list specifying the lower and upper bounds for the component parameters. The names of the list elements have to match the names of the formal arguments of the functions ddist and rdist exactly. For a Gaussian mixture, the list elements have to be named mean and sd, as these are the formal arguments used by rnorm and dnorm. Has to be supplied if a method that estimates the component weights and parameters is to be used.

MLE.function

function (or list of functions) which takes the data as input and returns the maximum likelihood estimator for the parameter(s) of a one-component mixture (i.e. the standard MLE of the component distribution dist). If the component distribution has more than one parameter, a list of functions has to be supplied, and the order of the MLE functions has to match the order of the component parameters in theta.bound.list (e.g. for a normal mixture, if the first entry of theta.bound.list contains the bounds of the mean, then the first entry of MLE.function has to be the MLE of the mean). If this argument is supplied and the datMix object is handed over to a complexity estimation procedure that relies on optimizing a likelihood function, MLE.function will be used for the single-component case. If the objective function is either not a likelihood or corresponds to a mixture with more than one component, numerical optimization based on Rsolnp's function solnp will be used, but MLE.function will be used to calculate the initial values passed to solnp. Specifying MLE.function is optional; if it is not supplied, for example because the MLE does not exist in closed form, numerical optimization is used to find the relevant MLEs.

Hankel.method

character string in c("explicit", "translation", "scale"), specifying the method of estimating the moments of the mixing distribution used to calculate the relevant Hankel matrix. Has to be specified when using nonparamHankel, paramHankel or paramHankel.scaled. For further details see below.

Hankel.function

function needed for the moment estimation via Hankel.method. This normally depends on Hankel.method as well as dist. For further details see below.

x
in is.datMix():

R object.

in print.datMix():

object of class datMix.

...

further arguments passed to the print method.
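The naming rules for dist, theta.bound.list and MLE.function can be checked in base R before calling datMix. The following is a minimal sketch for a Poisson mixture; the object names pois.bound.list and MLE.pois are illustrative assumptions, and the get() lookup only mimics the ddist/rdist convention described above rather than reproducing the package's internal code.

```r
## abbreviated distribution name, chosen so that "d" + dist and "r" + dist exist
dist <- "pois"
ddist <- get(paste0("d", dist), mode = "function")  # resolves to dpois
rdist <- get(paste0("r", dist), mode = "function")  # resolves to rpois

## bounds for the single component parameter; the element name has to be
## a formal argument of both dpois and rpois
pois.bound.list <- list(lambda = c(0, Inf))
stopifnot(all(names(pois.bound.list) %in% names(formals(ddist))),
          all(names(pois.bound.list) %in% names(formals(rdist))))

## closed-form MLE of lambda for a one-component (plain Poisson) model
MLE.pois <- function(dat) mean(dat)
```

With only one component parameter, a single function (rather than a list of functions) is enough for MLE.function.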

Details

If the datMix object is supposed to be passed to a function that calculates the Hankel matrix of the moments of the mixing distribution (i.e. nonparamHankel, paramHankel or paramHankel.scaled), the arguments Hankel.method and Hankel.function have to be specified. The Hankel.methods that can be used to generate the estimate of the (raw) moments of the mixing distribution and the corresponding Hankel.functions are the following, where j specifies an estimate of the number of components:

"explicit"

For this method, Hankel.function contains a function with arguments called dat and j, explicitly estimating the moments of the mixing distribution from the data and the currently assumed mixture complexity. Note that what Dacunha-Castelle & Gassiat (1997) called the "natural" estimator in their original paper is equivalent to using "explicit" with Hankel.function f_j((1/n) * sum_i(ψ_j(X_i))).

"translation"

This method corresponds to Dacunha-Castelle & Gassiat's (1997) example 3.1. It is applicable if the family of component distributions (G_θ) is given by dG_θ(x) = dG(x-θ), where G is a known probability distribution whose moments can be given explicitly. Hankel.function contains a function of j returning the jth (raw) moment of G.

"scale"

This method corresponds to Dacunha-Castelle & Gassiat's (1997) example 3.2. It is applicable if the family of component distributions (G_θ) is given by dG_θ(x) = dG(x/θ), where G is a known probability distribution whose moments can be given explicitly. Hankel.function contains a function of j returning the jth (raw) moment of G.
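To make the expected shape of Hankel.function more concrete, here are two hedged sketches; the function names explicit.pois and mom.exp are illustrative assumptions. For "explicit" with Poisson components, the jth factorial moment of a Poisson(λ) variable equals λ^j, so the sample mean of the falling factorials X(X-1)...(X-j+1) estimates the jth raw moment of the mixing distribution. For "scale", taking G to be the standard exponential distribution gives jth raw moment j!; how θ maps onto the rate argument of dexp is not stated here and is left as an assumption.

```r
## "explicit": takes arguments named dat and j, as required above
explicit.pois <- function(dat, j) {
  # falling factorial x * (x - 1) * ... * (x - j + 1) for every observation
  falling <- vapply(dat, function(x) prod(x - seq_len(j) + 1), numeric(1))
  mean(falling)
}

## "scale": function of j returning the jth raw moment of G = Exp(1)
mom.exp <- function(j) factorial(j)
```

For "translation", the function has the same signature as mom.exp but returns the moments of the known base distribution of the location family instead.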

If the datMix object is supposed to be passed to a function that estimates the component weights and parameters (i.e. all but nonparamHankel), the argument theta.bound.list has to be specified, and MLE.function will be used in the estimation process if it is supplied (otherwise the MLE is found numerically).

Note that the datMix function will change the random number generator (RNG) state.
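If the surrounding analysis relies on a reproducible random number stream, one option is to save and restore the RNG state around the call. The helper below is a hypothetical base R sketch, not part of mixComp; it works for any expression that advances the RNG.

```r
## hypothetical helper: evaluate expr, then restore the global RNG state
with_preserved_rng <- function(expr) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (has_seed) old_seed <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (has_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  })
  expr  # forced here (lazy evaluation), after the state has been saved
}

## intended use (requires mixComp, shown as a comment only):
## faithful.dM <- with_preserved_rng(datMix(obs, dist = "norm"))
```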

Value

An object of class datMix with the following attributes (for further explanations see above):

dist
discrete

logical indicating whether the underlying mixture distribution is discrete.

theta.bound.list
MLE.function
Hankel.method
Hankel.function

See Also

RtoDat for the conversion of rMix to datMix objects.

Examples

## observations from a (presumed) mixture model
obs <- faithful$waiting

## generate list of parameter bounds (assuming gaussian components)
norm.bound.list <- vector(mode = "list", length = 2)
names(norm.bound.list) <- c("mean", "sd")
norm.bound.list$mean <- c(-Inf, Inf)
norm.bound.list$sd <- c(0, Inf)

## generate MLE functions
# for "mean"
MLE.norm.mean <- function(dat) mean(dat)
# for "sd" (the sd function uses (n-1) as denominator)
MLE.norm.sd <- function(dat){
  sqrt((length(dat) - 1) / length(dat)) * sd(dat)
}
# combining the functions to a list
MLE.norm.list <- list("MLE.norm.mean" = MLE.norm.mean,
                      "MLE.norm.sd" = MLE.norm.sd)

## function giving the j^th raw moment of the standard normal distribution,
## needed for calculation of the Hankel matrix via the "translation" method
## (assuming gaussian components with variance 1)

mom.std.norm <- function(j){
  ifelse(j %% 2 == 0, prod(seq(1, j - 1, by = 2)), 0)
}

        
## generate 'datMix' object
faithful.dM <- datMix(obs, dist = "norm", theta.bound.list = norm.bound.list,
                      MLE.function = MLE.norm.list, Hankel.method = "translation",
                      Hankel.function = mom.std.norm)
                      
## using 'datMix' object to estimate the mixture complexity
set.seed(1)
res <- paramHankel.scaled(faithful.dM)
plot(res)

anjaweigel/mixComp_package documentation built on Sept. 2, 2020, 3:55 p.m.