glmmGS: Fit Generalized Linear Mixed Models

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Description

glmmGS is used to fit generalized mixed linear models specified by a string-like formula. The MLE estimation of the parameters is performed by using a Gauss-Seidel algorithm to fit the regression coefficients (see Reference), and a Penalized Quasi Likelihood (PQL) approach to fit the covariance components (see Note). The algorithm is highly optimized to fit fixed effect models to data-sets with a large number of observations and with stratified covariates with a large number of levels, and mixed models with stratified covariates with a large number of levels and diagonal covariance structures. Covariance structures defined by non diagonal dense or sparse precision matrices are also allowed (see glmmGS.CovarianceModel).

Usage

1
	glmmGS(formula, family, data = NULL, covariance.models = NULL, control = glmmGS.Control())

Arguments

formula

a formula object providing a symbolic description of the model to be fitted. The formula object must be of the type response ~ linear-predictor. For a binomial response, the response must be defined as (outcome | counts). The linear predictor comprises fixed effects and random effects. Fixed and random effects must be grouped in blocks (the blocks of the Gauss-Seidel algorithm, see Reference) specified by pairs of parentheses. Blocks of covariates can be stratified by a grouping variable. The covariance model inside the random effect block is specified by using the symbol ~ followed by a variable-name of a previously defined glmmGS.CovarianceModel). If an offset off is present in the linear predictor, it must be specified with keyword offset(off) before the specification of the fixed and random effects. The intercepts should always be explicitly specified by the symbol 1; implicit specification of the intercept is not assumed. See Details for information on how non-identifiability introduced by multiple intercepts is handled. In order to minimize the memory usage, none of the variables present in the formula are copied or coerced. In particular: discrete outcomes must be integer vectors; covariates must be vectors and/or matrices of either integer or numeric (i.e., double) type; grouping variables used to specify stratified covariates must be integer vectors containing zero-based consecutive levels.

family

a description of the error distribution. Only two families are currently allowed: binomial and poisson with a canonical link.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which glmmGS is called.

covariance.models

an optional list or environment containing the covariance models referenced in the formula. If not found in covariance.models, the covariance models are taken from environment(formula), typically the environment from which glmmGS is called.

control

a list of parameters for controlling the fitting process. The list is returned by the glmmGS.Control function.

Details

If a fixed intercept is specified (either a global intercept or a stratified intercept), then each stratified random intercept is constrained to have a zero mean. If both global and stratified fixed intercepts are specified, then their fitted values are determined up to a random constant. This is because the Gauss-Seidel algorithm displays best convergence properties by updating each fixed intercept component-wise without imposing any constraints, such as setting the value of the intercept associated with the first level of a factor equal to zero. (Notice that the same does not hold for random intercepts, where removing the overall mean is necessary to boost converge.) A simple post-processing of the fitted value can be used to adjust the offset of the estimated fixed intercepts.

Value

In the current implementation (see Note), the function returns a list of values comprised of:

fixed.effects

a list containing the items: estimates: the estimates of the fixed effect coefficients; standard.errors: the standard errors of the fixed effect coefficient estimates (see Note).

random.effects

a list containing the items: estimates: the estimates of the random effect coefficients; standard.errors: the standard errors of the random effect coefficient estimates (see Note).

covariance.components

a list containing the items: estimates: the estimates of the covariance components; standard.errors: the standard errors of the covariance components.

iterations

the number of iterations.

Note

The distinguishing characteristic of the Gauss-Seidel algorithm is that the full precision matrix of the posterior distribution of the parameters is never computed. Only diagonal blocks of the precision matrix are computed, one for each pair of parentheses in the formula. Furthermore, each precision-block associated with stratified covariates, is subdivided in as many sub-blocks as many levels there are in the grouping factor. As a consequence, the standard errors associated with the coefficient estimates are computed for each block conditionally to the other blocks, and they tend to be smaller than the standard-errors calculated using the full posterior distribution of the parameters.

The glmmGS function is optimized to handle data-sets with a large number of observations and data stratifications with a large number of levels. To avoid unnecessary memory allocation, the return value contains only minimal fitting information. Auxiliary functions to calculate quantities such as block covariance matrices, fitted values, predictors, will be soon released. We strongly recommend not to commit to the current format of the return value, since it will likely change in the next release. A user should write ‘accessor’ functions to parse the return value, so that user-code using glmmGS would only need to change the ‘accessor’ functions in the next release.

Although it is possible to specify as many blocks of random effects as desired, the PQL update of the covariance components is performed one block at a time. This yields to different estimates compared to a joint PQL update of the covariance components. Alternative approaches, more compatible with the component-wise nature of the Gauss-Seidel algorithm, to the PQL approximation of the integration over the random effect coefficients, are under consideration.

Author(s)

Michele Morara, Louise Ryan, Subharup Guha, Christopher Paciorek

References

Guha, S., Ryan, L., Morara M. (2009) Gauss-Seidel Estimation of Generalized Linear Mixed Models With Application to Poisson Modeling of Spatially Varying Disease Rates. Journal of Computational and Graphical Statistics, 4, 810–837.

See Also

glmmGS.CovarianceModel, glmmGS.Control

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
	# Set data dimension: 
	# - 1 million observations, 
	# - 7 'dense' fixed effect covariates,
	# - stratified random effects with 2000 strata (levels)
	nobs <- 1000000;
	ndays <- 7;
	nPostCodes <- 2000;

	# Generate data
	counts <- as.integer(runif(nobs, 1, 10));
	days <- matrix(rnorm(nobs * ndays), nrow = nobs, ncol = ndays);
	seifa <- rnorm(nobs);
	postCodes <- as.factor(as.integer(runif(nobs, 0, nPostCodes)));
	ipostCodes <- as.integer(postCodes) - 1L; # zero-based vector of indices
	
	# Generate coefficients
	offset = runif(nobs, -1, 1);
	intercept = -0.5;
	sd.days <- 0.1;
	beta.days <- rnorm(ndays, sd = sd.days);
	sd.seifa <- 0.1;
	intercept.seifa <- rnorm(nPostCodes, sd = sd.seifa);
	beta.seifa <- rnorm(nPostCodes, sd = sd.seifa);
	
	# Generate linear predictor
	eta <- intercept + days %*% beta.days;
	eta <- eta + intercept.seifa[as.integer(postCodes)] + seifa * beta.seifa[as.integer(postCodes)];

	# Generate response
	y <- as.integer(rbinom(nobs, counts, plogis(eta)));

	# Define identity precision model and control list
	I <- glmmGS.CovarianceModel("identity");
	control <- glmmGS.Control(reltol = 1.e-8, abstol = 1.e-25, maxit = 200);
	
	# Fit model using Gauss-Seidel algorithm with two blocks: 
	# - one fixed effect block: (1 + days);
	# - one stratified random effect block with an identity  
	#   covariance model: (1 + seifa | ipostCodes ~ I).
	formula = (y | counts) ~ offset(offset) + (1 + days) + (1 + seifa | ipostCodes ~ I);
	results <- glmmGS(formula = formula, family = binomial, control = control);

glmmGS documentation built on Sept. 12, 2016, 12:07 p.m.

Related to glmmGS in glmmGS...