Evaluation of data and formulae in lme4

This note tries to decipher and debug evaluation problems in lme4.

There are many possible ways to specify the formula and data arguments, and some of them cause problems when eval() is run later, during downstream analysis of the fitted models (update, drop1, etc.).

Data

Formulae

Outlook/solutions

We had this mostly sorted out in the previous version/main branch, but we have now broken it somewhat, at least in part because we nested formula evaluation one level deeper than previously.

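There are lots of tests in inst/tests/test-formulaEval.R. In particular, we have five test cases, which differ in how the formula argument is supplied: (1) a literal formula, (2) a stored string coerced with as.formula, (3) a stored formula, (4) a stored string, and (5) a literal string.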

If we needed to, we could forbid cases 4 and 5 (i.e. say that the formula argument must be a formula), but the other three cases should definitely work. In principle, all five of these should also work whether or not a data argument is specified; again, if necessary, we could insist on a data argument ...
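As a rough illustration of why cases 4 and 5 differ from the other three, the sketch below builds the five kinds of formula argument and checks which of them carry an environment. It uses base R only; the names args5 and info are made up for this example.

```r
## Five ways of supplying the "formula" argument (base R only, no model fitting).
modStr  <- "x ~ y + z + (1|r)"
modForm <- as.formula(modStr)

args5 <- list(
  formula               = x ~ y + z + (1|r),
  coerced_stored_string = as.formula(modStr),
  stored_formula        = modForm,
  stored_string         = modStr,
  string                = "x ~ y + z + (1|r)"
)

## A formula object carries an environment; a character string does not,
## so environment() returns NULL for the last two cases.
info <- data.frame(
  class   = vapply(args5, function(f) class(f)[1L], character(1)),
  has_env = vapply(args5, function(f) !is.null(environment(f)), logical(1))
)
print(info)
```

The string cases give the receiving function no formula environment to fall back on, which is why they are the candidates for being forbidden.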

The basic procedure is that within checkFormulaData (which is typically called from [g]lFormula, which is in turn typically called from [g]lmer), we check (1) whether there is a data argument and (2) whether the formula argument has an environment. If there is a data argument, we make the data into an environment and return it to [g]lFormula, to be assigned as the environment of the formula. If the formula has an environment, we return the formula's environment, to be (re)assigned as the formula environment. If neither of these is true, we hope for the best and return parent.frame(2L) as the appropriate environment. This works because we have called [g]lFormula with env=parent.frame(1L): since we are in the call stack checkFormulaData < [g]lFormula < [g]lmer, parent.frame(2L) actually goes up to the calling environment of [g]lmer ...
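The procedure above can be imitated in a few lines of base R. This is only a sketch under stated assumptions, not lme4's code: pickEnv, buildFrame and fitModel are hypothetical stand-ins for checkFormulaData, [g]lFormula and [g]lmer, and checking the data argument before the formula environment is an assumption about precedence.

```r
## Hypothetical stand-ins; they mimic the described logic, not lme4 itself.
pickEnv <- function(formula, data = NULL) {
  if (!is.null(data)) {
    ## (1) data supplied: its columns become an environment
    return(list2env(as.list(data), parent = globalenv()))
  }
  fenv <- environment(formula)      # NULL for a character string
  if (!is.null(fenv)) {
    ## (2) the formula carries its own environment
    return(fenv)
  }
  ## (3) neither: hope for the best
  parent.frame(2L)
}

buildFrame <- function(formula, data = NULL) pickEnv(formula, data)

fitModel <- function(formula, data = NULL) {
  mc <- match.call()
  mc[[1L]] <- quote(buildFrame)
  ## evaluate the inner call in the caller's frame
  eval(mc, parent.frame(1L))
}

user <- function() {
  here <- environment()
  localVar <- 1
  dat <- data.frame(x = 1, y = 2)
  e_str  <- fitModel("y ~ x")              # string, no data
  e_form <- fitModel(y ~ x)                # formula, no data
  e_data <- fitModel("y ~ x", data = dat)  # string plus data
  c(string  = exists("localVar", envir = e_str, inherits = FALSE),
    formula = identical(e_form, here),
    data    = exists("x", envir = e_data, inherits = FALSE) &&
              !exists("dat", envir = e_data, inherits = FALSE))
}
res <- user()
print(res)
```

Because the middle function is evaluated in the caller's frame, parent.frame(2L) in the innermost function skips the outer wrapper and lands in the frame of user(). The last check shows that the environment built from the data holds the columns but not the data frame object itself.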

Calling auxiliary functions such as update or drop1 generally makes things harder, as we may have left important information behind. Both drop1 and update try to re-evaluate the original function call, and there is one case in which this fails. The problem is that we need to re-evaluate in an environment that contains the data argument itself, not just an environment that contains the components of the data.
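The re-evaluation step is not specific to lme4, so it can be illustrated with any model-fitting function. The sketch below uses lm from base R as a stand-in for [g]lmer (an assumption made to keep the example self-contained); fit_local and localData are hypothetical names.

```r
## update() rebuilds the stored call and evaluates it again.
## If the object named in data= is not visible from where update() is
## called, the re-evaluation fails even though the original fit succeeded.
fit_local <- function() {
  localData <- data.frame(x = 1:10, y = (1:10) + rnorm(10))
  lm(y ~ x, data = localData)
}

set.seed(101)
m <- fit_local()
print(getCall(m))   # the stored call still refers to 'localData' by name

## re-evaluated in the frame that calls update(), where 'localData' is unknown
res <- try(update(m, . ~ . - x), silent = TRUE)
print(inherits(res, "try-error"))
```

The stored call records the name of the data argument, not its value, so everything depends on where that name is looked up at re-evaluation time.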

Right now, things mostly work anyway, because we are generally looking in the right place for the data. Re-evaluation fails if we both specify the formula as a string and specify the data argument. This is because we (1) set the formula environment to the data and (2) re-evaluate within the formula environment; but the formula environment doesn't contain the data argument itself, just the contents of the data argument.
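The distinction between an environment that contains the data and one that contains only its contents can be shown directly in base R. In this sketch refit_demo is a hypothetical name, nrow(d) stands in for a stored call that mentions data = d, and the choice of baseenv() as the parent is an assumption made so that d cannot be found by inheritance.

```r
refit_demo <- function() {
  d <- data.frame(x = 1:3, y = c(2, 4, 6))
  ## environment holding the *columns* of d, but not d itself
  denv <- list2env(as.list(d), parent = baseenv())
  ## stand-in for a stored call that refers to the data by name
  cl <- quote(nrow(d))
  list(
    columns = eval(quote(x + y), denv),        # columns are visible
    refit   = try(eval(cl, denv), silent = TRUE)  # 'd' is not
  )
}
out <- refit_demo()
print(out$columns)
print(inherits(out$refit, "try-error"))
```

Expressions that use the column names evaluate without trouble, while anything that refers to the data frame by name fails, which matches the failure described above.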

library("lme4")
set.seed(101)
n <- 20
x <- rbinom(n, 1, 1/2)
y <- rnorm(n)
z <- rnorm(n)
r <- sample(1:5, size=n, replace=TRUE)
d <- data.frame(x,y,z,r)
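## pieces used to build the model formula as a string
## (note: assigning to F masks the built-in shorthand for FALSE)
F <- "z"
rF <- "(1|r)"
modStr <- paste("x ~", "y +", F, "+", rF)  # "x ~ y + z + (1|r)"
modForm <- as.formula(modStr)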
## no data argument: variables are looked up in the calling environment
m_nodata.0 <- glmer( x ~ y + z + (1|r) , family="binomial")
m_nodata.1 <- glmer( as.formula(modStr) , family="binomial")
m_nodata.2 <- glmer( modForm , family="binomial")
m_nodata.3 <- glmer( modStr , family="binomial")
m_nodata.4 <- glmer( "x ~ y + z + (1|r)" , family="binomial")
fnames <- c("formula","coerced_stored_string","stored_formula","stored_string","string")
m_nodata_List <- setNames(list(m_nodata.0,m_nodata.1,m_nodata.2,m_nodata.3,m_nodata.4),fnames)
## try drop1() on each fit, then record which fits fail and with what message
m_nodata_try <- lapply(m_nodata_List,function(x) try(drop1(x),silent=TRUE))
m_nodata_results <- sapply(m_nodata_try,inherits,"try-error")
m_nodata_msg <- sapply(m_nodata_try,function(x) 
   if (!inherits(x,"try-error")) NA else attr(x,"condition")$message)
## data argument specified
m_data.0 <- glmer( x ~ y + z + (1|r) , data=d, family="binomial")
m_data.1 <- glmer( as.formula(modStr) , data=d, family="binomial")
m_data.2 <- glmer( modForm , data=d, family="binomial")
m_data.3 <- glmer( modStr , data=d, family="binomial")
m_data.4 <- glmer( "x ~ y + z + (1|r)" , data=d, family="binomial")
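The fits with a data argument can be checked in the same way as the no-data fits. A minimal sketch follows, using a hypothetical helper checkDrop1 (not part of lme4) and lm fits from base R so that the example runs on its own.

```r
## Hypothetical helper: run drop1() on each model in a named list and
## report which ones fail, together with the error message (NA if none).
checkDrop1 <- function(models) {
  tried  <- lapply(models, function(m) try(drop1(m), silent = TRUE))
  failed <- vapply(tried, inherits, logical(1), what = "try-error")
  msg <- vapply(tried, function(x) {
    if (inherits(x, "try-error")) conditionMessage(attr(x, "condition"))
    else NA_character_
  }, character(1))
  data.frame(failed = failed, message = msg)
}

## stand-in fits with lm, so no extra package is needed
set.seed(101)
dd <- data.frame(x = rnorm(20), z = rnorm(20))
dd$y <- dd$x + rnorm(20)
f <- y ~ x + z
fits <- list(formula        = lm(y ~ x + z, data = dd),
             stored_formula = lm(f, data = dd))
res <- checkDrop1(fits)
print(res)
```

With the objects defined above, the same helper could be applied as checkDrop1(setNames(list(m_data.0, m_data.1, m_data.2, m_data.3, m_data.4), fnames)).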


lme4/lme4 documentation built on June 9, 2024, 7:16 a.m.