datMix: Constructor for Objects for Which to Estimate the Mixture...


View source: R/0_Mix_utils.R

Description

Function to generate objects of class datMix to be passed to other mixComp functions used for estimating mixture complexity.

Usage

datMix(dat, dist, discrete = NULL, theta.bound.list = NULL,
       MLE.function = NULL, Hankel.method = NULL, Hankel.function = NULL)

is.datMix(x)

## S3 method for class 'datMix'
print(x, ...)

Arguments

dat

numeric vector containing observations from the mixture model.

dist

character string providing the (abbreviated) name of the component distribution, such that the function ddist evaluates its density function and rdist generates random numbers. The functions for density/mass evaluation and random variate generation are taken from the distributions documented on the distributions help page of the stats package, so the abbreviation has to be specified accordingly. For example, set dist = "norm" to create a Gaussian mixture and dist = "pois" for a Poisson mixture; the mixComp functions will then find dnorm and rnorm, or dpois and rpois, respectively.
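
The naming convention can be illustrated in base R. The following is a minimal sketch of how an abbreviation such as "norm" maps to its density and random number functions; the helper resolve_dist is hypothetical, and this is not necessarily how mixComp performs the lookup internally.

```r
# Hypothetical helper (not part of mixComp): look up the density ("d") and
# random generation ("r") functions that belong to a distribution abbreviation.
resolve_dist <- function(dist) {
  list(ddist = get(paste0("d", dist), mode = "function"),
       rdist = get(paste0("r", dist), mode = "function"))
}

norm.funs <- resolve_dist("norm")   # finds dnorm and rnorm
pois.funs <- resolve_dist("pois")   # finds dpois and rpois

norm.funs$ddist(0)                  # same value as dnorm(0)
pois.funs$ddist(2, lambda = 3)      # same value as dpois(2, lambda = 3)
```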

discrete

logical flag indicating whether the mixture distribution is discrete, required for methods that estimate component weights and parameters.

theta.bound.list

named list specifying the upper and lower bounds for the component parameters. The names of the list elements have to match the names of the formal arguments of the functions ddist and rdist exactly, as given on the distributions help page of the stats package. For a Gaussian mixture, the list elements have to be named mean and sd, as these are the formal arguments used by rnorm and dnorm. Has to be supplied if a method that estimates the component weights and parameters is to be used.
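
As a rough illustration of the naming requirement, the names of a bound list can be compared with the formal arguments of the density and generator functions. The helper check_bound_names below is hypothetical and only sketches the idea; mixComp may validate its input differently.

```r
# Bounds for a Gaussian mixture: names must match the formals of dnorm/rnorm.
norm.bound.list <- list("mean" = c(-Inf, Inf), "sd" = c(0, Inf))

# Hypothetical check (not part of mixComp): every name in the bound list
# must be a formal argument of both the density and the generator.
check_bound_names <- function(bounds, dist) {
  d.args <- names(formals(get(paste0("d", dist), mode = "function")))
  r.args <- names(formals(get(paste0("r", dist), mode = "function")))
  all(names(bounds) %in% d.args) && all(names(bounds) %in% r.args)
}

check_bound_names(norm.bound.list, "norm")             # TRUE
check_bound_names(list("mu" = c(-Inf, Inf)), "norm")   # FALSE: dnorm has no 'mu'
check_bound_names(list("lambda" = c(0, Inf)), "pois")  # TRUE
```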

MLE.function

function (or a list of functions) which takes the data as input and outputs the maximum likelihood estimator for the parameter(s) of the component distribution dist. If the component distribution has more than one parameter, a list of functions has to be supplied, and the order of the MLE functions has to match the order of the component parameters in theta.bound.list (e.g. for a normal mixture, if the first entry of theta.bound.list contains the bounds of the mean, then the first entry of MLE.function has to be the MLE of the mean). If this argument is supplied and the datMix object is handed over to a complexity estimation procedure relying on optimizing over a likelihood function, the MLE.function attribute will be used for the single component case. If the objective function is not a likelihood, or corresponds to a mixture with more than 1 component, numerical optimization based on the function solnp from the Rsolnp package will be used, with MLE.function providing the initial values passed to solnp. Specifying MLE.function is optional. If it is not supplied, for example because the MLE does not exist in closed form, numerical optimization is used to find the relevant MLE.
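
For a one-parameter component distribution a single function suffices. As a minimal sketch, assuming Poisson components, the closed-form MLE of the rate is the sample mean; the name MLE.pois and the simulated data are illustrative assumptions. The snippet also cross-checks the closed form against a numerical maximisation of the log-likelihood.

```r
# Closed-form MLE of the Poisson rate: the sample mean.
MLE.pois <- function(dat) mean(dat)

# Cross-check on simulated data by maximising the log-likelihood numerically.
set.seed(1)
dat <- rpois(200, lambda = 4)
loglik <- function(lambda) sum(dpois(dat, lambda = lambda, log = TRUE))
num.mle <- optimize(loglik, interval = c(0.01, 20), maximum = TRUE)$maximum

# The two estimates should agree up to the tolerance of optimize().
c(closed.form = MLE.pois(dat), numerical = num.mle)
```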

Hankel.method

character string in c("explicit", "translation", "scale"), specifying the method of estimating the moments of the mixing distribution used to calculate the relevant Hankel matrix. Has to be specified when using nonparamHankel, paramHankel or paramHankel.scaled. For further details see below.

Hankel.function

function required for the moment estimation via Hankel.method. This normally depends on Hankel.method as well as dist. For further details see below.

x

in is.datMix(): R object to be tested; is.datMix() returns TRUE if the argument is a datMix object and FALSE otherwise.

in print.datMix(): object of class datMix.

...

further arguments passed to the print method.

Details

If the datMix object is supposed to be passed to a function that calculates the Hankel matrix of the moments of the mixing distribution (i.e. nonparamHankel, paramHankel or paramHankel.scaled), the arguments Hankel.method and Hankel.function have to be specified. The Hankel.methods that can be used to generate the estimate of the (raw) moments of the mixing distribution and the corresponding Hankel.functions are the following, where j specifies an estimate of the number of components:

"explicit"

For this method, Hankel.function contains a function with arguments called dat and j, which explicitly estimates the moments of the mixing distribution from the data and the mixture complexity assumed at the current iteration. Note that what Dacunha-Castelle & Gassiat (1997) called the "natural" estimator in their paper is equivalent to using "explicit" with Hankel.function

f_j((1/n) * ∑_i(ψ_j(X_i))).

"translation"

This method corresponds to Dacunha-Castelle & Gassiat's (1997) example 3.1. It is applicable if the family of component distributions (G_θ) is given by

dG_θ(x) = dG(x-θ),

where G is a known probability distribution, such that its moments can be expressed explicitly. Hankel.function contains a function of j returning the jth (raw) moment of G.

"scale"

This method corresponds to Dacunha-Castelle & Gassiat's (1997) example 3.2. It is applicable if the family of component distributions (G_θ) is given by

dG_θ(x) = dG(x / θ),

where G is a known probability distribution, such that its moments can be expressed explicitly. Hankel.function contains a function of j returning the jth (raw) moment of G.
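
To make the expected shape of Hankel.function concrete, here is a hedged sketch for two of the methods above. For "explicit" it assumes Poisson components, for which E[X(X-1)...(X-j+1)] = lambda^j, so the sample mean of the falling factorial estimates the jth raw moment of the mixing distribution. For "scale" it assumes G is the standard exponential distribution, whose jth raw moment is j!; note that this corresponds to exponential components parametrised by their mean, whereas dexp in R is parametrised by the rate. The function names explicit.pois and mom.std.exp are illustrative.

```r
# "explicit": function of dat and j (assuming Poisson components).
# The mean of x * (x - 1) * ... * (x - j + 1) over the data estimates the
# j-th raw moment of the mixing distribution.
explicit.pois <- function(dat, j) {
  falling <- vapply(dat, function(x) prod(x - seq_len(j) + 1), numeric(1))
  mean(falling)
}

explicit.pois(c(0, 1, 2, 3), j = 2)   # (0 + 0 + 2 + 6) / 4 = 2

# "scale": function of j returning the j-th raw moment of G
# (assuming G is the standard exponential distribution).
mom.std.exp <- function(j) factorial(j)

# Numerical check of the third raw moment of Exp(1), which should be close to 6.
integrate(function(x) x^3 * dexp(x), lower = 0, upper = Inf)$value
mom.std.exp(3)                         # 6
```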

If the datMix object is supposed to be passed to a function that estimates the component weights and parameters (i.e. all but nonparamHankel), the arguments discrete and theta.bound.list have to be specified, and MLE.function will be used in the estimation process if it is supplied (otherwise the MLE is found numerically).

Value

Object of class datMix with the following attributes (for further explanations see above):

dist

character string giving the abbreviated name of the component distribution, such that the function ddist evaluates its density/mass and rdist generates random variates.

discrete

logical flag indicating whether the mixture distribution is discrete.

theta.bound.list

named list specifying the upper and lower bounds for the component parameters.

MLE.function

function which computes the MLE of the component distribution dist.

Hankel.method

character string taking on values "explicit", "translation", or "scale", specifying the method of estimating the moments of the mixing distribution to compute the corresponding Hankel matrix.

Hankel.function

function required for the moment estimation via Hankel.method. See details for more information.
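
Since the components listed above are described as attributes, they can presumably be read with attr(). The sketch below uses a hand-built stand-in object so that it runs without mixComp; the assumption that a datMix object is the data vector carrying these attributes is not confirmed here, and a real object should be created with datMix().

```r
# Stand-in object built by hand for illustration only. A real object would
# come from datMix(); the layout (data vector plus attributes) is an assumption.
mock.dM <- structure(c(79, 54, 74, 62, 85),
                     class = "datMix",
                     dist = "norm",
                     discrete = FALSE,
                     theta.bound.list = list("mean" = c(-Inf, Inf),
                                             "sd" = c(0, Inf)))

attr(mock.dM, "dist")                     # "norm"
attr(mock.dM, "discrete")                 # FALSE
names(attr(mock.dM, "theta.bound.list"))  # "mean" "sd"
inherits(mock.dM, "datMix")               # TRUE
```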

See Also

RtoDat for conversion of rMix to datMix objects.

Examples

## observations from a (presumed) mixture model
obs <- faithful$waiting

## generate list of parameter bounds (assuming gaussian components)
norm.bound.list <- list("mean" = c(-Inf, Inf), "sd" = c(0, Inf))

## generate MLE functions
# for "mean"
MLE.norm.mean <- function(dat) mean(dat)
# for "sd" (the sd function uses (n-1) as denominator)
MLE.norm.sd <- function(dat){
  sqrt((length(dat) - 1) / length(dat)) * sd(dat)
}
# combining the functions to a list
MLE.norm.list <- list("MLE.norm.mean" = MLE.norm.mean,
                      "MLE.norm.sd" = MLE.norm.sd)

## function giving the j^th raw moment of the standard normal distribution,
## needed for calculation of the Hankel matrix via the "translation" method
## (assuming gaussian components with variance 1)

mom.std.norm <- function(j){
  ifelse(j %% 2 == 0, prod(seq(1, j - 1, by = 2)), 0)
}

## generate 'datMix' object
faithful.dM <- datMix(obs, dist = "norm", discrete = FALSE,
                      theta.bound.list = norm.bound.list, MLE.function = MLE.norm.list,
                      Hankel.method = "translation", Hankel.function = mom.std.norm)

## using 'datMix' object to estimate the mixture complexity
set.seed(1)
res <- paramHankel.scaled(faithful.dM)
plot(res)
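
The helper functions defined in this example can be sanity-checked in base R before they are handed to datMix. The following sketch repeats their definitions so that it runs on its own, compares mom.std.norm with numerically integrated moments of the standard normal distribution, and compares MLE.norm.sd with the square root of the mean squared deviation; the name num.mom is an illustrative assumption.

```r
# Definitions repeated from the example above.
mom.std.norm <- function(j){
  ifelse(j %% 2 == 0, prod(seq(1, j - 1, by = 2)), 0)
}
MLE.norm.sd <- function(dat){
  sqrt((length(dat) - 1) / length(dat)) * sd(dat)
}

# Raw moments 1 to 6 of the standard normal distribution.
sapply(1:6, mom.std.norm)   # 0 1 0 3 0 15

# Numerical check of the even moments via integration.
num.mom <- function(j) integrate(function(x) x^j * dnorm(x), -Inf, Inf)$value
sapply(c(2, 4, 6), num.mom) # approximately 1, 3, 15

# The MLE of sd divides by n rather than n - 1.
obs <- faithful$waiting
all.equal(MLE.norm.sd(obs), sqrt(mean((obs - mean(obs))^2)))  # TRUE
```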

mixComp documentation built on Feb. 25, 2021, 5:07 p.m.