Thoughts/brain dump on interface design for mixed model fitting software, mostly thinking about R, but trying to draw on ideas and inspirations from other software (Stata, GENSTAT/AS-REML, SAS PROC MIXED ...) The basic problem is that there is a very large, complex set of models that are potentially feasible within a single class of computational machinery, but it's rather hard to design an interface that makes it possible to specify all possible models in a reasonably straightforward but extendable way.
Some of this overlaps with questions about the design of the flexLambda
branch of lme4
, and with Steve's lme4ord
package.
nlme
package uses the basic R formula interface, with separate
formula
and random
arguments for the response+fixed and
random terms respectively. formula
argument follows standard R formula specifications (Wilkinson-Rogers, see ?model.matrix
)random
argument is typically of the form x1+x2+...xn|g1/g2/g3
where the xi
specify covariates (or 1
) and the gi
specify nested grouping variables. (One can also specify a list with different models for each grouping variable, but the grouping variables must still be nested.)random
terms can in principle be specified by using the pdMat
classes, e.g. to construct crossed-random-effect models, but these are underused ...nlme
allows the use of
separate weights
and correlation
terms to specify models
for heteroscedasticity and residual correlation. Several CRAN packages offer additional correlation models that are useful for phylogenetic or geostatistical applications (fields
, ramps
, ape
...)library("nlme") library("ramps") library("ape") ff <- function(x) { s <- apropos(paste0("^",x,"[A-Z]"),ignore.case=FALSE) f <- gsub("package:","",sapply(s,find)) split(s,f) } ff("pd") ff("var") ff("cor")
MCMCglmm
uses a derivative (I think) of the AS-REML/GENSTAT specification.idv
(independent/constant variance), idh
(independent/heterogeneous variance), cor
(unstructured correlation matrix), us
(unstructured)lme4
uses a variant of nlme
's specification that (1) allows terms based on non-nested grouping variables; (2) puts the fixed and random-effect specificati ons into a single formula. Extensions such as allowing different residual variances or different variance-covariance matrices of random effects per (fixed-effect) group can be achieved, somewhat clunkily, by using the dummy()
helper function to construct an indicator variable to multiply by individual levels of interest. It also allows a special ||
operator to specify models with independent slopes and intercepts, but this really isn't as general as it should be; it semantically expands the covariate formula into separate (and hence independent) terms, rather than building a diagonal variance-covariance matrix; hence it doesn't work as (might be) expected for categorical covariatesglmmADMB
allows either nlme
-style (separate fixed & random) or lme4
-style (single formula) specification, but doesn't have any of the extra SAS
gives a wide variety of covariance/correlation structures in table 56.13 ... I think they're applicable to either R-side or G-side effects ...groupedData
objects in lme
do this, but for reasons I don't entirely understand they never really took off. Farther in the future we can imagine direct links to structures like Steve's multitable
package or other relational databasesnlme
-style var/cor/pd classes (especially cor
classes, some of which have a lot of cholesky-factor construction machinery built in)?? would be nice to have some reasonably juicy examples here and show how they're implemented in different models (e.g. something like an extended version of the table on the glmm.wikidot faq or of the table in my Fox et al. GLMM chapter (which compares model statements for four (G)LMMs in lme4
, MCMCglmm
, and glmmADMB
)
Current flexLambda
examples:
logDens~sample+dilut+cs(~(0+sample|Block),het=FALSE) f1 <- Reaction ~ Days + d(~(Days | Subject))
where cs()
stands for compound symmetry. The other currently implemented choices are d()
(diagonal) and ar1d
(AR order 1, 1-dimensional/temporal model). I'm not sure where the het
parameter is interpreted, but presumably it's to specify heterogeneous variances or not.
compound symmetry
heterogeneous residual variance across random-effect group levels
* heterogeneous residual variance across fixed factor levels
Some lme4ord
examples illustrate the basic idea of lme4ord
, but these aren't full runable examples. Here is a simple one that is runable:
## require(lme4ord) ## FIXME
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.