Description Usage Arguments Details Value Note Author(s) References See Also Examples
glmmGS is used to fit generalized mixed linear models specified by a string-like formula.
The MLE estimation of the parameters is performed by using a Gauss-Seidel algorithm
to fit the regression coefficients (see Reference), and a Penalized Quasi Likelihood (PQL)
approach to fit the covariance components (see Note). The algorithm is highly optimized to fit
fixed effect models to data-sets with a large number of observations and
with stratified covariates with a large number of levels, and mixed models
with stratified covariates with a large number of levels and diagonal
covariance structures. Covariance structures defined by non diagonal dense
or sparse precision matrices are also allowed (see glmmGS.CovarianceModel
).
1 |
formula |
a formula object providing a symbolic description of the model to be fitted.
The formula object must be of the type response ~ linear-predictor. For a binomial
response, the response must be defined as |
family |
a description of the error distribution. Only two families are currently allowed:
|
data |
an optional data frame, list or environment (or object coercible by as.data.frame
to a data frame) containing the variables in the model.
If not found in |
covariance.models |
an optional list or environment containing the covariance models referenced
in the formula. If not found in |
control |
a list of parameters for controlling the fitting process. The list is returned
by the |
If a fixed intercept is specified (either a global intercept or a stratified intercept), then each stratified random intercept is constrained to have a zero mean. If both global and stratified fixed intercepts are specified, then their fitted values are determined up to a random constant. This is because the Gauss-Seidel algorithm displays best convergence properties by updating each fixed intercept component-wise without imposing any constraints, such as setting the value of the intercept associated with the first level of a factor equal to zero. (Notice that the same does not hold for random intercepts, where removing the overall mean is necessary to boost converge.) A simple post-processing of the fitted value can be used to adjust the offset of the estimated fixed intercepts.
In the current implementation (see Note), the function returns a list of values comprised of:
fixed.effects |
a list containing the items:
|
random.effects |
a list containing the items:
|
covariance.components |
a list containing the items:
|
iterations |
the number of iterations. |
The distinguishing characteristic of the Gauss-Seidel algorithm is that the full precision matrix of the posterior distribution of the parameters is never computed. Only diagonal blocks of the precision matrix are computed, one for each pair of parentheses in the formula. Furthermore, each precision-block associated with stratified covariates, is subdivided in as many sub-blocks as many levels there are in the grouping factor. As a consequence, the standard errors associated with the coefficient estimates are computed for each block conditionally to the other blocks, and they tend to be smaller than the standard-errors calculated using the full posterior distribution of the parameters.
The glmmGS
function is optimized to handle data-sets with a large number of observations
and data stratifications with a large number of levels.
To avoid unnecessary memory allocation, the return value contains only minimal fitting information.
Auxiliary functions to calculate quantities such as block covariance matrices,
fitted values, predictors, will be soon released. We strongly recommend not to commit
to the current format of the return value, since it will likely change in the next release.
A user should write ‘accessor’ functions to parse the return value, so that user-code
using glmmGS
would only need to change the ‘accessor’ functions in the next
release.
Although it is possible to specify as many blocks of random effects as desired, the PQL update of the covariance components is performed one block at a time. This yields to different estimates compared to a joint PQL update of the covariance components. Alternative approaches, more compatible with the component-wise nature of the Gauss-Seidel algorithm, to the PQL approximation of the integration over the random effect coefficients, are under consideration.
Michele Morara, Louise Ryan, Subharup Guha, Christopher Paciorek
Guha, S., Ryan, L., Morara M. (2009) Gauss-Seidel Estimation of Generalized Linear Mixed Models With Application to Poisson Modeling of Spatially Varying Disease Rates. Journal of Computational and Graphical Statistics, 4, 810–837.
glmmGS.CovarianceModel
, glmmGS.Control
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 | # Set data dimension:
# - 1 million observations,
# - 7 'dense' fixed effect covariates,
# - stratified random effects with 2000 strata (levels)
nobs <- 1000000;
ndays <- 7;
nPostCodes <- 2000;
# Generate data
counts <- as.integer(runif(nobs, 1, 10));
days <- matrix(rnorm(nobs * ndays), nrow = nobs, ncol = ndays);
seifa <- rnorm(nobs);
postCodes <- as.factor(as.integer(runif(nobs, 0, nPostCodes)));
ipostCodes <- as.integer(postCodes) - 1L; # zero-based vector of indices
# Generate coefficients
offset = runif(nobs, -1, 1);
intercept = -0.5;
sd.days <- 0.1;
beta.days <- rnorm(ndays, sd = sd.days);
sd.seifa <- 0.1;
intercept.seifa <- rnorm(nPostCodes, sd = sd.seifa);
beta.seifa <- rnorm(nPostCodes, sd = sd.seifa);
# Generate linear predictor
eta <- intercept + days %*% beta.days;
eta <- eta + intercept.seifa[as.integer(postCodes)] + seifa * beta.seifa[as.integer(postCodes)];
# Generate response
y <- as.integer(rbinom(nobs, counts, plogis(eta)));
# Define identity precision model and control list
I <- glmmGS.CovarianceModel("identity");
control <- glmmGS.Control(reltol = 1.e-8, abstol = 1.e-25, maxit = 200);
# Fit model using Gauss-Seidel algorithm with two blocks:
# - one fixed effect block: (1 + days);
# - one stratified random effect block with an identity
# covariance model: (1 + seifa | ipostCodes ~ I).
formula = (y | counts) ~ offset(offset) + (1 + days) + (1 + seifa | ipostCodes ~ I);
results <- glmmGS(formula = formula, family = binomial, control = control);
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.