predictions | R Documentation
Outcome predicted by a fitted model on a specified scale for a given combination of values of the predictor variables, such as their observed values, their means, or factor levels (a.k.a. "reference grid"). The tidy() and summary() functions can be used to aggregate the output of predictions(). To learn more, read the predictions vignette, visit the package website, or scroll down this page for a full list of vignettes.
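As a rough illustration of the "reference grid" idea, the base-R sketch below (not marginaleffects code; the model and the grid are hypothetical choices) holds hp at its mean, crosses it with every observed level of cyl using expand.grid(), and calls stats::predict() on each row of the grid. In predictions() itself, the grid is controlled through the newdata argument and datagrid().

```r
# Hypothetical model: fuel efficiency as a function of horsepower and cylinders.
fit <- lm(mpg ~ hp + factor(cyl), data = mtcars)

# Reference grid: hp fixed at its sample mean, crossed with every observed
# level of cyl (4, 6, 8), giving one row per combination.
grid <- expand.grid(
  hp  = mean(mtcars$hp),
  cyl = sort(unique(mtcars$cyl))
)

# Predicted outcome for each row of the grid (response scale for lm).
grid$predicted <- predict(fit, newdata = grid)
print(grid)
```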
Usage:
```r
predictions(
  model,
  newdata = NULL,
  variables = NULL,
  vcov = TRUE,
  conf_level = 0.95,
  type = NULL,
  by = NULL,
  byfun = NULL,
  wts = NULL,
  transform_post = NULL,
  hypothesis = NULL,
  ...
)
```
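Arguments:
Argument | Description |
model | Model object. |
newdata | Grid of predictor values on which to compute predictions. When NULL (the default), predictions are made for every row of the original dataset; a datagrid() call or the string "mean" can be supplied instead (see Examples). |
variables | Named list of predictor values used to compute counterfactual predictions: the entire dataset is replicated for each of the specified values (e.g., variables = list(hp = c(90, 110))). |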
vcov | Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values: |
conf_level | Numeric value between 0 and 1: the confidence level used to build confidence intervals. |
type | String indicating the type (scale) of the predictions used to compute marginal effects or contrasts. This can differ based on the model type, but will typically be a string such as "response", "link", "probs", or "zero". When an unsupported string is entered, the model-specific list of acceptable values is returned in an error message. When |
by | Character vector of variable names over which to compute group-wise estimates. |
byfun | A function such as |
wts | String or numeric: weights to use when computing average contrasts or marginal effects. These weights only affect the averaging in |
transform_post | (experimental) A function applied to unit-level adjusted predictions and confidence intervals just before the function returns results. For Bayesian models, this function is applied to individual draws from the posterior distribution, before computing summaries. |
hypothesis | Specify a hypothesis test or custom contrast using a vector, matrix, string, or string formula. |
... | Additional arguments are passed to the |
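For a linear model, the numeric-vector and matrix forms of hypothesis amount to weighted sums of predictions. The base-R sketch below is an illustration, not the package's implementation; it mirrors the hypothesis = c(1, -1) and matrix examples on this page. It builds the two prediction rows for wt = 2 and wt = 3 (holding drat at its mean is an assumption about the grid), then computes each contrast w'p and its standard error sqrt(w' X V X' w), which is exact here because the predictions are linear in the coefficients.

```r
fit <- lm(mpg ~ wt + drat, data = mtcars)

# Two prediction rows; drat held at its mean (assumed grid).
nd <- data.frame(wt = c(2, 3), drat = mean(mtcars$drat))
X <- model.matrix(delete.response(terms(fit)), nd)
pred <- drop(X %*% coef(fit))   # the two predictions
V <- X %*% vcov(fit) %*% t(X)   # their 2 x 2 covariance matrix

# Weight vector: prediction 1 minus prediction 2 (test of equality).
w <- c(1, -1)
est <- sum(w * pred)
se  <- sqrt(drop(t(w) %*% V %*% w))

# Weight matrix: one custom contrast per column.
lc <- matrix(c(1, -1, 2, 3), ncol = 2)
est_lc <- drop(t(lc) %*% pred)
se_lc  <- sqrt(diag(t(lc) %*% V %*% lc))

print(c(estimate = est, std.error = se))
print(cbind(estimate = est_lc, std.error = se_lc))
```

Because the two rows differ only by one unit of wt, the first contrast reduces to minus the wt coefficient, with that coefficient's standard error.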
Details:
The newdata argument, the tidy() function, and the datagrid() function can be used to control the kind of predictions to report:
Average Predictions
Predictions at the Mean
Predictions at User-Specified values (aka Predictions at Representative values).
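To make these three kinds of predictions concrete, here is a base-R sketch with a hypothetical lm() fit (it does not call predictions() or datagrid()). Averaging unit-level predictions and predicting at the means give the same number here only because the model is linear in hp and wt with no interactions or transformations; for GLMs or interacted models the two generally differ.

```r
fit <- lm(mpg ~ hp + wt, data = mtcars)

# Average prediction: predict for every observed row, then average.
avg_pred <- mean(predict(fit))

# Prediction at the mean: one prediction with each regressor at its mean.
at_mean <- predict(fit, newdata = data.frame(hp = mean(mtcars$hp),
                                             wt = mean(mtcars$wt)))

# Predictions at user-specified (representative) values.
at_values <- predict(fit, newdata = data.frame(hp = c(100, 120), wt = 3))

print(c(average = avg_pred, at_mean = unname(at_mean)))
print(at_values)
```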
When possible, predictions() delegates the computation of confidence intervals to the insight::get_predicted() function, which uses back-transformation to produce adequate confidence intervals on the scale specified by the type argument. When this is not possible, predictions() uses the delta method to compute standard errors around adjusted predictions and builds symmetric confidence intervals. These naive symmetric intervals may not always be appropriate. For instance, they may stretch beyond the bounds of a binary response variable (below 0 or above 1).
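The two strategies described above can be compared by hand for a logistic regression. This base-R sketch is illustrative only: it does not call insight or marginaleffects, and the model am ~ wt and the wt values are hypothetical. It builds one interval on the link scale and back-transforms it with plogis(), and a second, symmetric interval from a delta-method standard error, using the analytic derivative p(1 - p) of the inverse logit for simplicity.

```r
fit <- glm(am ~ wt, family = binomial, data = mtcars)
nd <- data.frame(wt = c(1.5, 3, 5))
z <- qnorm(0.975)  # critical value for a 95% interval (conf_level = 0.95)

# Predictions and standard errors on the link (log-odds) scale.
lp <- predict(fit, newdata = nd, type = "link", se.fit = TRUE)

# (1) Back-transformation: interval on the link scale, bounds mapped
#     through the inverse link.
back <- data.frame(
  predicted = plogis(lp$fit),
  conf.low  = plogis(lp$fit - z * lp$se.fit),
  conf.high = plogis(lp$fit + z * lp$se.fit)
)

# (2) Delta method: SE(p) is approximately p * (1 - p) * SE(eta),
#     and the resulting interval is symmetric around p.
p <- plogis(lp$fit)
se_p <- p * (1 - p) * lp$se.fit
delta <- data.frame(
  predicted = p,
  std.error = se_p,
  conf.low  = p - z * se_p,
  conf.high = p + z * se_p
)

print(back)
print(delta)
```

The back-transformed bounds always stay inside (0, 1). The symmetric delta-method bounds are unconstrained and can fall outside that range when a prediction is close to 0 or 1 and its link-scale standard error is large.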
Value:
A data.frame with one row per observation and the following columns:
rowid: row number of the newdata data frame
type: prediction type, as defined by the type argument
group: (optional) value of the grouped outcome (e.g., categorical outcome models)
predicted: predicted outcome
std.error: standard errors computed by the insight::get_predicted function or, if unavailable, via the marginaleffects delta method functionality
conf.low: lower bound of the confidence interval (or equal-tailed interval for Bayesian models)
conf.high: upper bound of the confidence interval (or equal-tailed interval for Bayesian models)
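The column layout described above can be mimicked with base R for a linear model, which may help when reading the output. This is a hypothetical sketch, not how predictions() computes its columns: it uses stats::predict.lm() with interval = "confidence" (which builds t-based intervals) and copies the results into columns named after the list above. The optional group column is omitted because the outcome is not categorical.

```r
fit <- lm(mpg ~ hp + factor(cyl), data = mtcars)

# Fitted values, standard errors, and 95% confidence bounds for every row
# of the original data (no newdata supplied).
pr <- predict(fit, interval = "confidence", level = 0.95, se.fit = TRUE)

out <- data.frame(
  rowid     = seq_len(nrow(mtcars)),
  type      = "response",
  predicted = pr$fit[, "fit"],
  std.error = pr$se.fit,
  conf.low  = pr$fit[, "lwr"],
  conf.high = pr$fit[, "upr"]
)
head(out)
```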
Vignettes:
Case studies:
Tips and technical notes:
Model-specific arguments:
Some model types allow model-specific arguments to modify the nature of marginal effects, predictions, marginal means, and contrasts.
Package | Class | Argument | Documentation |
brms | brmsfit | ndraws | brms::posterior_predict |
brms | brmsfit | re_formula | brms::posterior_predict |
lme4 | merMod | include_random | insight::get_predicted |
lme4 | merMod | re.form | lme4::predict.merMod |
lme4 | merMod | allow.new.levels | lme4::predict.merMod |
glmmTMB | glmmTMB | re.form | glmmTMB::predict.glmmTMB |
glmmTMB | glmmTMB | allow.new.levels | glmmTMB::predict.glmmTMB |
glmmTMB | glmmTMB | zitype | glmmTMB::predict.glmmTMB |
mgcv | bam | exclude | mgcv::predict.bam |
robustlmm | rlmerMod | re.form | robustlmm::predict.rlmerMod |
robustlmm | rlmerMod | allow.new.levels | robustlmm::predict.rlmerMod |
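Model-specific arguments like those in the table reach the underlying prediction function through R's `...` mechanism. The sketch below shows that general pattern with a hypothetical wrapper, predict_with_extras(), around stats::predict() for an lm() fit. It stands in for the packages listed above (which may not be installed) and is not marginaleffects' internal code.

```r
# Hypothetical wrapper: any argument it does not name is forwarded,
# unchanged, to the model's own predict() method.
predict_with_extras <- function(model, newdata, ...) {
  predict(model, newdata = newdata, ...)
}

fit <- lm(mpg ~ hp, data = mtcars)
nd <- data.frame(hp = c(100, 150))

# No extra arguments: a plain vector of predictions.
p1 <- predict_with_extras(fit, nd)

# `interval` and `level` are not arguments of the wrapper; they reach
# stats::predict.lm() through `...` and change its output to a matrix.
p2 <- predict_with_extras(fit, nd, interval = "confidence", level = 0.9)

print(p1)
print(p2)
```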
Examples:
```r
library(marginaleffects)

# Adjusted predictions for every row of the original dataset
mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)
pred <- predictions(mod)
head(pred)

# Adjusted predictions at user-specified values of the regressors
predictions(mod, newdata = datagrid(hp = c(100, 120), cyl = 4))

m <- lm(mpg ~ hp + drat + factor(cyl) + factor(am), data = mtcars)
predictions(m, newdata = datagrid(FUN_factor = unique, FUN_numeric = median))

# Average adjusted predictions (AAP)
library(dplyr)
mod <- lm(mpg ~ hp * am * vs, mtcars)
pred <- predictions(mod)
summary(pred)

predictions(mod, by = "am")

# Conditional adjusted predictions
plot_cap(mod, condition = "hp")

# Counterfactual predictions with the `variables` argument
# the `mtcars` dataset has 32 rows
mod <- lm(mpg ~ hp + am, data = mtcars)
p <- predictions(mod)
head(p)
nrow(p)

# counterfactual predictions obtained by replicating the entire dataset
# for different values of the predictors
p <- predictions(mod, variables = list(hp = c(90, 110)))
nrow(p)

# hypothesis test: is the prediction in the 1st row equal to the prediction in the 2nd row?
mod <- lm(mpg ~ wt + drat, data = mtcars)

predictions(
  mod,
  newdata = datagrid(wt = 2:3),
  hypothesis = "b1 = b2")

# same hypothesis test using row indices
predictions(
  mod,
  newdata = datagrid(wt = 2:3),
  hypothesis = "b1 - b2 = 0")

# same hypothesis test using numeric vector of weights
predictions(
  mod,
  newdata = datagrid(wt = 2:3),
  hypothesis = c(1, -1))

# two custom contrasts using a matrix of weights
lc <- matrix(c(
  1, -1,
  2, 3),
  ncol = 2)
predictions(
  mod,
  newdata = datagrid(wt = 2:3),
  hypothesis = lc)

# `by` argument
mod <- lm(mpg ~ hp * am * vs, data = mtcars)
predictions(mod, by = c("am", "vs"))

library(nnet)
nom <- multinom(factor(gear) ~ mpg + am * vs, data = mtcars, trace = FALSE)

# first 6 raw predictions
predictions(nom, type = "probs") |> head()

# average predictions
predictions(nom, type = "probs", by = "group") |> summary()

by <- data.frame(
  group = c("3", "4", "5"),
  by = c("3,4", "3,4", "5"))

predictions(nom, type = "probs", by = by)

# sum of predicted probabilities for combined response levels
mod <- multinom(factor(cyl) ~ mpg + am, data = mtcars, trace = FALSE)
by <- data.frame(
  by = c("4,6", "4,6", "8"),
  group = as.character(c(4, 6, 8)))
predictions(mod, newdata = "mean", byfun = sum, by = by)
```
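The by argument in the examples averages unit-level predictions within groups. A base-R analogue of predictions(mod, by = "am") and by = c("am", "vs") is sketched below with stats::aggregate(). It is an illustration under that assumption, not the package's implementation, and it omits standard errors.

```r
fit <- lm(mpg ~ hp * am * vs, data = mtcars)

# Unit-level predictions for every row of the original data.
d <- mtcars
d$predicted <- predict(fit)

# Average predicted outcome within each level of `am`.
by_am <- aggregate(predicted ~ am, data = d, FUN = mean)

# Average within each combination of `am` and `vs`.
by_am_vs <- aggregate(predicted ~ am + vs, data = d, FUN = mean)

print(by_am)
print(by_am_vs)
```

Because this least-squares fit includes am, vs, and their interaction, the group averages of the fitted values reproduce the observed group means of mpg exactly; that is a property of OLS, not of predictions().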