predictInterval | R Documentation |
This function provides a way to capture model uncertainty in
predictions from multi-level models fit with lme4
. By drawing a sampling
distribution for the random and the fixed effects and then estimating the fitted
value across that distribution, it is possible to generate a prediction interval
for fitted values that includes all variation in the model except for variation
in the covariance parameters, theta. This is a much faster alternative than
bootstrapping for models fit to medium to large datasets.
predictInterval(
merMod,
newdata,
which = c("full", "fixed", "random", "all"),
level = 0.8,
n.sims = 1000,
stat = c("median", "mean"),
type = c("linear.prediction", "probability"),
include.resid.var = TRUE,
returnSims = FALSE,
seed = NULL,
.parallel = FALSE,
.paropts = NULL,
fix.intercept.variance = FALSE,
ignore.fixed.terms = NULL
)
merMod |
a merMod object from lme4 |
newdata |
a data.frame of new data to predict |
which |
a character specifying what to return, by default it returns the
full interval, but you can also select to return only the fixed variation or
the random component variation. If full is selected the resulting data.frame
will be |
level |
the width of the prediction interval |
n.sims |
number of simulation samples to construct |
stat |
take the median or mean of simulated intervals |
type |
type of prediction to develop |
include.resid.var |
logical, include or exclude the residual variance for linear models |
returnSims |
logical, should all n.sims simulations be returned? |
seed |
numeric, optional argument to set seed for simulations |
.parallel |
logical should parallel computation be used, default is FALSE |
.paropts |
-NOT USED: Caused issue #54- a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing. |
fix.intercept.variance |
logical; should the variance of the intercept term be adjusted downwards to roughly correct for its covariance with the random effects, as if all the random effects are intercept effects? |
ignore.fixed.terms |
a numeric or string vector of indexes or names of fixed effects which should be considered as fully known (zero variance). This can result in under-conservative intervals, but for models with random effects nested inside fixed effects, holding the fixed effects constant intervals may give intervals with closer to nominal coverage than the over-conservative intervals without this option, which ignore negative correlation between the outer (fixed) and inner (random) coefficients. |
To generate a prediction interval, the function first computes a simulated
distribution of all of the parameters in the model. For the random, or grouping,
effects, this is done by sampling from a multivariate normal distribution which
is defined by the BLUP estimate provided by ranef
and the associated
variance-covariance matrix for each observed level of each grouping terms. For
each grouping term, an array is build that has as many rows as there are levels
of the grouping factor, as many columns as there are predictors at that level
(e.g. an intercept and slope), and is stacked as high as there are number of
simulations. These arrays are then multiplied by the new data provided to the
function to produce a matrix of yhat values. The result is a matrix of the simulated
values of the linear predictor for each observation for each simulation. Each
grouping term has such a matrix for each observation. These values can be added
to get the estimate of the fitted value for the random effect terms, and this
can then be added to a matrix of simulated values for the fixed effect level to
come up with n.sims
number of possible yhat values for each observation.
The distribution of simulated values is cut according to the interval requested
by the function. The median or mean value as well as the upper and lower bounds
are then returned. These can be presented either on the linear predictor scale
or on the response scale using the link function in the merMod
.
a data.frame with three columns:
fit
The center of the distribution of predicted values as defined by
the stat
parameter.
lwr
The lower prediction interval bound corresponding to the quantile cut
defined in level
.
upr
The upper prediction interval bound corresponding to the quantile cut
defined in level
.
If returnSims = TRUE, then the individual simulations are attached to this
data.frame in the attribute sim.results
and are stored as a matrix.
merTools
includes the functions subBoot
and thetaExtract
to allow the user to estimate the variability in theta
from a larger
model by bootstrapping the model fit on a subset, to allow faster estimation.
m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
regFit <- predict(m1, newdata = sleepstudy[11, ]) # a single value is returned
intFit <- predictInterval(m1, newdata = sleepstudy[11, ]) # bounded values
# Can do glmer
d1 <- cbpp
d1$y <- d1$incidence / d1$size
gm2 <- glmer(y ~ period + (1 | herd), family = binomial, data = d1,
nAGQ = 9, weights = d1$size)
regFit <- predict(gm2, newdata = d1[1:10, ])
# get probabilities
regFit <- predict(gm2, newdata = d1[1:10, ], type = "response")
intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "probability")
intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "linear.prediction")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.