# Predict from merMod objects with a prediction interval

### Description

This function provides a way to capture model uncertainty in
predictions from multi-level models fit with `lme4`

. By drawing a sampling
distribution for the random and the fixed effects and then estimating the fitted
value across that distribution, it is possible to generate a prediction interval
for fitted values that includes all variation in the model except for variation
in the covariance paramters, theta. This is a much faster alternative than
bootstrapping for models fit to medium to large datasets.

### Usage

1 2 3 4 |

### Arguments

`merMod` |
a merMod object from lme4 |

`newdata` |
a data.frame of new data to predict |

`level` |
the width of the prediction interval |

`n.sims` |
number of simulation samples to construct |

`stat` |
take the median or mean of simulated intervals |

`type` |
type of prediction to develop |

`include.resid.var` |
logical, include or exclude the residual variance for linear models |

`returnSims` |
logical, should all n.sims simulations be returned? |

`seed` |
numeric, optional argument to set seed for simulations |

`.parallel, ` |
logical should parallel computation be used, default is FALSE |

`.paropts, ` |
a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing. |

### Details

To generate a prediction inteval, the function first computes a simulated
distribution of all of the parameters in the model. For the random, or grouping,
effects, this is done by sampling from a multivariate normal distribution which
is defined by the BLUP estimate provided by `ranef`

and the associated
variance-covariance matrix for each observed level of each grouping terms. For
each grouping term, an array is build that has as many rows as there are levels
of the grouping factor, as many columns as there are predictors at that level
(e.g. an intercept and slope), and is stacked as high as there are number of
simulations. These arrays are then multiplied by the new data provided to the
function to produce a matrix of yhat values. The result is a matrix of the simulated
values of the linear predictor for each observation for each simulation. Each
grouping term has such a matrix for each observation. These values can be added
to get the estimate of the fitted value for the random effect terms, and this
can then be added to a matrix of simulated values for the fixed effect level to
come up with `n.sims`

number of possible yhat values for each observation.

The distribution of simulated values is cut according to the interval requested
by the function. The median or mean value as well as the upper and lower bounds
are then returned. These can be presented either on the linear predictor scale
or on the response scale using the link function in the `merMod`

.

### Value

a data.frame with three columns:

`fit`

The center of the distribution of predicted values as defined by the

`stat`

parameter.`lwr`

The lower confidence interval bound corresponding to the quantile cut defined in

`level`

.`upr`

The upper confidence interval bound corresponding to the quantile cut defined in

`level`

.

If returnSims = TRUE, then the individual simulations are attached to this
data.frame in the attribute `sim.results`

and are stored as a matrix.

### Note

`merTools`

includes the functions `subBoot`

and `thetaExtract`

to allow the user to estimate the variability in `theta`

from a larger
model by bootstrapping the model fit on a subset, to allow faster estimation.

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 | ```
m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
regFit <- predict(m1, newdata = sleepstudy[11, ]) # a single value is returned
intFit <- predictInterval(m1, newdata = sleepstudy[11, ]) # bounded values
# Can do glmer
d1 <- cbpp
d1$y <- d1$incidence / d1$size
gm2 <- glmer(y ~ period + (1 | herd), family = binomial, data = d1,
nAGQ = 9, weights = d1$size)
regFit <- predict(gm2, newdata = d1[1:10, ])
# get probabilities
regFit <- predict(gm2, newdata = d1[1:10, ], type = "response")
intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "probability")
intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "linear.prediction")
``` |