good-practice | R Documentation |
Base fitting functions in R will seek variables in the environment where the formula
was defined (i.e., typically in the global environment), if they are not in the data
. This increases the memory size of fit objects (as the formula and attached environment are part of such objects). This also easily leads to errors (see example in the discussion of update.HLfit
). Indeed Chambers (2008, p.221), after describing how the environment is defined, comments that “Where clear and trustworthy software is a priority, I would personally avoid such tricks. Ideally, all the variables in the model frame should come from an explicit, verifiable data source...”. Fitting functions in spaMM try to adhere to such a principle, as they assume by default that all variables from the formula
should be in the data
argument (and then, one never needs to specify “data$
” in the formula
.. spaMM implements this by default by stripping the formula environment from any variable. It is also possible to assign a given environment to the formula, through the control control.HLfit$formula_env
: see Examples.
The variables defining the prior.weights
should also be in the data
. However, the implementation of the prior.weights
argument has limitations that can be overcome by using the more recently introduced weights.formula
argument of spaMM fitting functions (see Examples, where this is also compared with stats::lm
's handling of its weights
argument).
However, variables used in other arguments such as ranFix
are looked up neither in the data nor in the formula environment, but in the calling environment as usual.
Chambers J.M. (2008) Software for data analysis: Programming with R. Springer-Verlag New York
####### Controlling the formula environment
set.seed(123)
d2 <- data.frame(y = seq(10)/2+rnorm(5)[gl(5,2)], x1 = sample(10), grp=gl(5,2), seq10=seq(10))
# Using only variables in the data: basic usage
# HLfit(y ~ x1 + seq10+(1|grp), data = d2)
# is practically equivalent to
HLfit(y ~ x1 + seq10+(1|grp), data = d2,
control.HLfit = list(formula_env=list2env(list(data=d2))))
#
# The 'formula_env' avoids the need for the 'seq10' variable:
HLfit(y ~ x1 + I(seq_len(nrow(data)))+(1|grp), data = d2,
control.HLfit = list(formula_env=list2env(list(data=d2))))
#
# Internal imprementation exploits partial matching of argument names
# so that this can also be in 'control' if 'control.HLfit' is absent:
fitme(y ~ x1 + I(seq_len(nrow(data)))+(1|grp), data = d2,
control = list(formula_env=list2env(list(data=d2))))
####### Prior-weights misery
data("hills", package="MASS")
(fit <- lm(time ~ dist + climb, data = hills, weights=1/dist^2))
# same as
(fit <- fitme(time ~ dist + climb, data = hills, prior.weights=1/dist^2, method="REML"))
# possible calls:
(fit <- fitme(time ~ dist + climb, data = hills, prior.weights=quote(1/dist^2)))
(fit <- fitme(time ~ dist + climb, data = hills, prior.weights= 1/hills$dist^2))
(fit <- fitme(time ~ dist + climb, data = hills, weights.form= ~ 1/dist^2))
(fit <- fitme(time ~ dist + climb, data = hills, weights.form= ~ I(1/dist^2)))
# Also syntactically correct since 'dist' is found in the data:
(fit <- fitme(time ~ dist + climb, data = hills, weights.form= ~ rep(2,length(dist))))
#### Programming with prior weights:
## Different ways of passing prior weights to fitme() from another function:
wrap_as_form <- function(weights.form) {
fitme(time ~ dist + climb, data = hills, weights.form=weights.form)
}
wrap_as_pw <- function(prior.weights) {
fitme(time ~ dist + climb, data = hills, prior.weights=prior.weights)
}
wrap_as_dots <- function(...) {
fitme(time ~ dist + climb, data = hills,...)
}
## Similarly for lm:
wrap_lm_as_dots <- function(...) {
lm(time ~ dist + climb, data = hills, ...)
}
wrap_lm_as_arg <- function(weights) {
lm(time ~ dist + climb, data = hills,weights=weights)
}
## Programming errors with stats::lm():
pw <- rep(1e-6,35) # or even NULL
(fit <- wrap_lm_as_arg(weights=pw)) # catches weights from global envir!
(fit <- lm(time ~ dist + climb, data = hills, weights=pw)) # idem!
(fit <- lm(time ~ dist + climb, data = hills,
weights=hills$pw)) # fails silently - no $pw in 'hills'
(fit <- wrap_lm_as_dots(weights=hills$pw)) # idem!
(fit <- wrap_lm_as_arg(weights=hills$pw)) # idem!
## Safer spaMM results:
try(fit <- wrap_as_pw(prior.weights= pw)) # correctly catches problem
try(fit <- wrap_as_dots(prior.weights=hills$pw)) # correctly catches problem
(fit <- wrap_as_dots(prior.weights=1/dist^2)) # correct
(fit <- wrap_as_dots(prior.weights=quote(1/dist^2))) # correct
## But 'prior.weights' limitations:
try(fit <- wrap_as_pw(prior.weights= 1/hills$dist^2)) # fails (stop)
try(fit <- wrap_as_pw(prior.weights= 1/dist^2)) # fails (stop)
try(fit <- wrap_as_pw(prior.weights= quote(1/dist^2))) # fails (stop)
## Problems all solved by using 'weights.form':
try(fit <- wrap_as_form(weights.form= ~ pw)) # correctly catches problem
(fit <- wrap_as_form(weights.form= ~1/dist^2)) # correct
(fit <- wrap_as_form(weights.form= ~1/hills$dist^2)) # correct
(fit <- wrap_as_dots(weights.form= ~ 1/dist^2)) # correct
rm("pw")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.