predictInterval: Predict from merMod objects with a prediction interval

View source: R/merPredict.R

predictIntervalR Documentation

Predict from merMod objects with a prediction interval

Description

This function provides a way to capture model uncertainty in predictions from multi-level models fit with lme4. By drawing a sampling distribution for the random and the fixed effects and then estimating the fitted value across that distribution, it is possible to generate a prediction interval for fitted values that includes all variation in the model except for variation in the covariance parameters, theta. This is a much faster alternative than bootstrapping for models fit to medium to large datasets.

Usage

predictInterval(
  merMod,
  newdata,
  which = c("full", "fixed", "random", "all"),
  level = 0.8,
  n.sims = 1000,
  stat = c("median", "mean"),
  type = c("linear.prediction", "probability"),
  include.resid.var = TRUE,
  returnSims = FALSE,
  seed = NULL,
  .parallel = FALSE,
  .paropts = NULL,
  fix.intercept.variance = FALSE,
  ignore.fixed.terms = NULL
)

Arguments

merMod

a merMod object from lme4

newdata

a data.frame of new data to predict

which

a character specifying what to return, by default it returns the full interval, but you can also select to return only the fixed variation or the random component variation. If full is selected the resulting data.frame will be nrow(newdata) * number of model levels long

level

the width of the prediction interval

n.sims

number of simulation samples to construct

stat

take the median or mean of simulated intervals

type

type of prediction to develop

include.resid.var

logical, include or exclude the residual variance for linear models

returnSims

logical, should all n.sims simulations be returned?

seed

numeric, optional argument to set seed for simulations

.parallel,

logical should parallel computation be used, default is FALSE

.paropts,

-NOT USED: Caused issue #54- a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing.

fix.intercept.variance

logical; should the variance of the intercept term be adjusted downwards to roughly correct for its covariance with the random effects, as if all the random effects are intercept effects?

ignore.fixed.terms

a numeric or string vector of indexes or names of fixed effects which should be considered as fully known (zero variance). This can result in under-conservative intervals, but for models with random effects nested inside fixed effects, holding the fixed effects constant intervals may give intervals with closer to nominal coverage than the over-conservative intervals without this option, which ignore negative correlation between the outer (fixed) and inner (random) coefficients.

Details

To generate a prediction interval, the function first computes a simulated distribution of all of the parameters in the model. For the random, or grouping, effects, this is done by sampling from a multivariate normal distribution which is defined by the BLUP estimate provided by ranef and the associated variance-covariance matrix for each observed level of each grouping terms. For each grouping term, an array is build that has as many rows as there are levels of the grouping factor, as many columns as there are predictors at that level (e.g. an intercept and slope), and is stacked as high as there are number of simulations. These arrays are then multiplied by the new data provided to the function to produce a matrix of yhat values. The result is a matrix of the simulated values of the linear predictor for each observation for each simulation. Each grouping term has such a matrix for each observation. These values can be added to get the estimate of the fitted value for the random effect terms, and this can then be added to a matrix of simulated values for the fixed effect level to come up with n.sims number of possible yhat values for each observation.

The distribution of simulated values is cut according to the interval requested by the function. The median or mean value as well as the upper and lower bounds are then returned. These can be presented either on the linear predictor scale or on the response scale using the link function in the merMod.

Value

a data.frame with three columns:

fit

The center of the distribution of predicted values as defined by the stat parameter.

lwr

The lower prediction interval bound corresponding to the quantile cut defined in level.

upr

The upper prediction interval bound corresponding to the quantile cut defined in level.

If returnSims = TRUE, then the individual simulations are attached to this data.frame in the attribute sim.results and are stored as a matrix.

Note

merTools includes the functions subBoot and thetaExtract to allow the user to estimate the variability in theta from a larger model by bootstrapping the model fit on a subset, to allow faster estimation.

Examples


m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
regFit <- predict(m1, newdata = sleepstudy[11, ]) # a single value is returned
intFit <- predictInterval(m1, newdata = sleepstudy[11, ]) # bounded values
# Can do glmer
d1 <- cbpp
d1$y <- d1$incidence / d1$size
 gm2 <- glmer(y ~ period + (1 | herd), family = binomial, data = d1,
               nAGQ = 9, weights = d1$size)
 regFit <- predict(gm2, newdata = d1[1:10, ])
 # get probabilities
 regFit <- predict(gm2, newdata = d1[1:10, ], type = "response")
 intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "probability")
 intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "linear.prediction")
 

merTools documentation built on March 31, 2023, 8:43 p.m.