Grid search over tol for NPML estimation of (generalized) random effect models

Share:

Description

Performs a grid search to select the parameter tol, which is a tuning parameter for starting point selection of the EM algorithm for NPML estimation (see e.g. Aitkin, Hinde & Francis, 2009, p. 437)

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
tolfind(formula, 
        random = ~1, 
        family = gaussian(), 
        data, 
        k = 4, 
        random.distribution="np",
        offset, 
        weights, 
        na.action, 
        EMmaxit = 500, 
        EMdev.change = 0.001, 
        lambda = 0, 
        damp = TRUE, 
        damp.power = 1, 
        spike.protect = 1, 
        sdev,
        shape, 
        plot.opt = 1, 
        steps = 15, 
        find.in.range = c(0.05, 0.8), 
        verbose = FALSE, 
        noformat = FALSE,
        ...)

Arguments

formula

a formula defining the response and the fixed effects (e.g. y ~ x).

random

a formula defining the random model. Set random=~1 to model overdispersion.

family

conditional distribution of responses: "gaussian", "poisson", "binomial", "Gamma", or "inverse.gaussian" can be set.

data

the data frame (mandatory, even if it is attached to the workspace!).

k

the number of mass points/integration points (supported are up to 600 mass points).

random.distribution

the mixing distribution, Gaussian Quadrature (gq) or NPML (np) can be set.

offset

an optional offset to be included in the model.

weights

optional prior weights for the data.

na.action

a function indicating what should happen when NA's occur, with possible arguments na.omit and na.fail. The default is set by the na.action setting in options() .

EMmaxit

maximum number of EM iterations.

EMdev.change

stops EM algorithm when deviance change falls below this value.

lambda

see the help file for alldist.

damp

switches EM damping on or off.

damp.power

steers degree of damping.

spike.protect

see the help file for alldist. For unequal or smooth component dispersion parameters, the setting spike.protect=1 is strongly recommended.

sdev

optional fixed standard deviation for normal mixture.

shape

optional fixed shape parameter for Gamma and IG mixtures.

plot.opt

For plot.opt=1 the EM trajectories are plotted, for plot.opt=2 the development of the disparity -2logL over iteration number is plotted, for plot.opt=3 both plots are shown, and for plot.opt=0 none of them.

steps

number of grid points for the search of tol.

find.in.range

range for the search of tol.

verbose

If set to FALSE, no printed output is given during execution of alldist or allvc.

noformat

If TRUE, then any formatting of the plots is omitted.

...

further arguments which will be ignored.

Details

The EM algorithm for NPML estimation (Aitkin, 1996) uses the Gauss-Hermite masses and mass points as starting points. The position of the starting points can be concentrated or extended by setting tol smaller or larger than 1, respectively. The tuning parameter tol is, as in GLIM4, responsible for this scaling. A careful selection of tol may be necessary for some data sets. The reason is that NPML has a tendency to get stuck in local maxima, as the log-likelihhod function is not concave for fixed k (Boehning, 1999).

For Gaussian, Gamma, and IG mixtures this R implementation uses by default a damping procedure in the first cycles of the EM algorithm (Einbeck & Hinde, 2006), which stabilizes the algorithm and makes it less sensitive to the optimal choice of tol. Application of tolfind to check that the optimal solution has not been overlooked may nevertheless be advisable.

tolfind works for alldist and allvc. The tolfind function is mainly designed for NPML (random.distribution="np"). It can also be applied to Gaussian Quadrature (random.distribution="gq"), though tol is of little importance for this and primarily influences the speed of convergence.

Value

A list of 5 items:

MinDisparity

the minimal disparity achieved (for which EM converged).

Mintol

the tol value at which this disparity is achieved.

AllDisparities

a vector containing all disparities calculated on the grid.

Alltol

all corresponding tol values making up the grid.

AllEMconverged

a vector of Booleans indicating if EM converged for the particular tol values.

Author(s)

Jochen Einbeck & John Hinde (2006).

References

Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing 6 , 251-262.

Aitkin, M., Francis, B. and Hinde, J. (2009). Statistical Modelling in R. Oxford Statistical Science Series, Oxford, UK.

Boehning, D. (1999). Computer-Assisted Analysis of Mixtures and Applications. Meta-Analysis, Disease Mapping and others. Chapman & Hall / CRC, Boca Raton, FL, USA.

Einbeck, J. & Hinde, J. (2006). A note on NPML estimation for exponential family regression models with unspecified dispersion parameter. Austrian Journal of Statistics 35, 233-243.

See Also

alldist, allvc

Examples

1
2
3
4
5
6
  data(galaxies, package="MASS")
  gal<-as.data.frame(galaxies)
  tolfind(galaxies/1000~1, random=~1, k=5, data=gal, lambda=1, damp=TRUE, 
      find.in.range=c(0,1), steps=10) 
  # Minimal Disparity: 380.1444 at tol= 0.5