View source: R/estimate_predicted.R
estimate_expectation | R Documentation |
After fitting a model, it is useful to generate model-based estimates of the response variables for different combinations of predictor values. Such estimates can be used to make inferences about relationships between variables, to make predictions about individual cases, or to compare the predicted values against the observed data.
The modelbased package includes 4 related functions that mostly differ in their default arguments (in particular, data and predict):
estimate_prediction(data = NULL, predict = "prediction", ...)
estimate_expectation(data = NULL, predict = "expectation", ...)
estimate_relation(data = "grid", predict = "expectation", ...)
estimate_link(data = "grid", predict = "link", ...)
While they are all based on model-based predictions (using insight::get_predicted()), they differ in the type of predictions they make by default. For instance, estimate_prediction() and estimate_expectation() return predictions for the original data used to fit the model, while estimate_relation() and estimate_link() return predictions on a reference grid (see insight::get_datagrid()). Similarly, estimate_link() returns predictions on the link scale, while the others return predictions on the response scale. Note that the relevance of these differences depends on the model family (for instance, for linear models, estimate_relation() is equivalent to estimate_link(), since there is no difference between the link scale and the response scale).
Note that you can run plot() on the output of these functions to get some visual insights (see the plotting examples).
See the Details section below for more information about the different possibilities.
estimate_expectation(
model,
data = NULL,
by = NULL,
predict = "expectation",
ci = 0.95,
transform = NULL,
keep_iterations = FALSE,
...
)
estimate_link(
model,
data = "grid",
by = NULL,
predict = "link",
ci = 0.95,
transform = NULL,
keep_iterations = FALSE,
...
)
estimate_prediction(
model,
data = NULL,
by = NULL,
predict = "prediction",
ci = 0.95,
transform = NULL,
keep_iterations = FALSE,
...
)
estimate_relation(
model,
data = "grid",
by = NULL,
predict = "expectation",
ci = 0.95,
transform = NULL,
keep_iterations = FALSE,
...
)
model |
A statistical model. |
data |
A data frame with model's predictors to estimate the response. If
|
by |
The predictor variable(s) at which to estimate the response. Other
predictors of the model that are not included here will be set to their mean
value (for numeric predictors), reference level (for factors) or mode (other
types). The |
predict |
This parameter controls what is predicted (and gets internally
passed to |
ci |
Confidence Interval (CI) level. Default to |
transform |
A function applied to predictions and confidence intervals
to (back-) transform results, which can be useful in case the regression
model has a transformed response variable (e.g., |
keep_iterations |
If |
... |
You can add all the additional control arguments from
|
A data frame of predicted values and uncertainty intervals, with class "estimate_predicted". Methods for visualisation_recipe() and plot() are available.
The most important way that various types of response estimates differ is in terms of what quantity is being estimated and the meaning of the uncertainty intervals. The major choices are expected values for uncertainty in the regression line and predicted values for uncertainty in the individual case predictions.
Expected values refer to the fitted regression line, that is, the estimated average response value (the "expectation") for individuals with specific predictor values. For example, in a linear model y = 2 + 3x + 4z + e, the estimated average y for individuals with x = 1 and z = 2 is 2 + 3(1) + 4(2) = 13.
For expected values, uncertainty intervals refer to uncertainty in the estimated conditional average (where might the true regression line actually fall?). Uncertainty intervals for expected values are also called "confidence intervals".
Expected values and their uncertainty intervals are useful for describing the relationship between variables and for describing how precisely a model has been estimated.
For generalized linear models, expected values are reported on one of two scales:
The link scale refers to the scale of the fitted regression line, after transformation by the link function. For example, for a logistic regression (logit binomial) model, the link scale gives expected log-odds. For a log-link Poisson model, the link scale gives the expected log-count.
The response scale refers to the original scale of the response variable (i.e., without any link function transformation). Expected values on the link scale are back-transformed to the original response variable metric (e.g., expected probabilities for binomial models, expected counts for Poisson models).
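The relationship between the two scales can be sketched with base R alone. This does not use modelbased; the logistic model on the built-in mtcars data is an assumed example chosen only for illustration.

```r
# Logistic regression on the built-in mtcars data (illustrative choice)
model <- glm(vs ~ wt, data = mtcars, family = "binomial")

# Link scale: expected log-odds for each observation
log_odds <- predict(model, type = "link")

# Response scale: expected probabilities for each observation
prob <- predict(model, type = "response")

# Applying the inverse link (here the inverse logit, plogis()) to the
# link-scale values recovers the response-scale values
same_values <- all.equal(unname(plogis(log_odds)), unname(prob))
same_values
```

Log-odds can take any real value, whereas the back-transformed probabilities are bounded between 0 and 1.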
In contrast to expected values, predicted values refer to predictions for individual cases. Predicted values are also called "posterior predictions" or "posterior predictive draws".
For predicted values, uncertainty intervals refer to uncertainty in the individual response values for each case (where might any single case actually fall?). Uncertainty intervals for predicted values are also called "prediction intervals" or "posterior predictive intervals".
Predicted values and their uncertainty intervals are useful for forecasting the range of values that might be observed in new data, for making decisions about individual cases, and for checking if model predictions are reasonable ("posterior predictive checks").
Predicted values and intervals are always on the scale of the original response variable (not the link scale).
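For a frequentist linear model, the contrast between the two kinds of interval can be sketched with base R's predict(). This is only an analogy to what the modelbased functions report, not their implementation; the model and the new_cars data frame are assumptions made for illustration.

```r
# Simple linear model on the built-in mtcars data (illustrative choice)
model <- lm(mpg ~ wt, data = mtcars)

# Two hypothetical cars for which to estimate the response
new_cars <- data.frame(wt = c(2.5, 3.5))

# Expected values: uncertainty about the regression line
expected <- predict(model, newdata = new_cars,
                    interval = "confidence", level = 0.95)

# Predicted values: uncertainty about individual cases
predicted <- predict(model, newdata = new_cars,
                     interval = "prediction", level = 0.95)

# The point estimates coincide, but the prediction interval is wider
# because it also includes the residual variability
ci_width <- expected[, "upr"] - expected[, "lwr"]
pi_width <- predicted[, "upr"] - predicted[, "lwr"]
cbind(ci_width, pi_width)
```

In a linear model, both approaches give the same point estimate, so the choice between expected and predicted values mainly affects the width and meaning of the interval.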
modelbased provides 4 functions for generating model-based response estimates and their uncertainty:
estimate_expectation():
Generates expected values (conditional average) on the response scale.
The uncertainty interval is a confidence interval.
By default, values are computed using the data used to fit the model.
estimate_link():
Generates expected values (conditional average) on the link scale.
The uncertainty interval is a confidence interval.
By default, values are computed using a reference grid spanning the observed range of predictor values (see insight::get_datagrid()).
estimate_prediction():
Generates predicted values (for individual cases) on the response scale.
The uncertainty interval is a prediction interval.
By default, values are computed using the data used to fit the model.
estimate_relation():
Like estimate_expectation().
Useful for visualizing a model.
Generates expected values (conditional average) on the response scale.
The uncertainty interval is a confidence interval.
By default, values are computed using a reference grid spanning the observed range of predictor values (see insight::get_datagrid()).
If data = NULL, values are estimated using the data used to fit the model. If data = "grid", values are computed using a reference grid spanning the observed range of predictor values with insight::get_datagrid(). This can be useful for model visualization. The number of predictor values used for each variable can be controlled with the length argument. data can also be a data frame containing columns with names matching the model frame (see insight::get_data()). This can be used to generate model predictions for specific combinations of predictor values.
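The three ways of specifying data can be sketched as follows. The model, the grid length of 5, and the wt values in the custom data frame are assumptions chosen for illustration; the length argument is passed on through ... as described above.

```r
library(modelbased)

# Simple linear model on the built-in mtcars data (illustrative choice)
model <- lm(mpg ~ wt, data = mtcars)

# data = NULL (the default of estimate_expectation()):
# estimates for the observations used to fit the model
on_data <- estimate_expectation(model)

# data = "grid" (the default of estimate_relation()):
# estimates on a reference grid; `length` sets the number of
# values per predictor
on_grid <- estimate_relation(model, length = 5)

# A data frame with columns named like the model's predictors:
# estimates for specific predictor values
custom <- estimate_expectation(model, data = data.frame(wt = c(2, 3, 4)))
custom
```

The grid-based form is typically the one to plot, while the custom data frame is convenient for reporting estimates at a few values of interest.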
These functions are built on top of insight::get_predicted() and correspond to different specifications of its parameters. It may be useful to read its documentation, in particular the description of the predict argument, for additional details on the difference between expected vs. predicted values and link vs. response scales.
Additional control parameters can be used to control results from insight::get_datagrid() (when data = "grid") and from insight::get_predicted() (the function used internally to compute predictions).
For plotting, check the examples in visualisation_recipe(). Also check out the Vignettes and README examples for various examples, tutorials and use cases.
library(modelbased)
# Linear Models
model <- lm(mpg ~ wt, data = mtcars)
# Get expected values with confidence intervals for the data used to fit
# the model (see insight::get_predicted)
estimate_expectation(model)
# Get expected values with confidence intervals on a reference grid
pred <- estimate_relation(model)
pred
# Visualisation (see visualisation_recipe())
plot(pred)
# Standardize predictions
pred <- estimate_relation(lm(mpg ~ wt + am, data = mtcars))
z <- standardize(pred, include_response = FALSE)
z
unstandardize(z, include_response = FALSE)
# Logistic Models
model <- glm(vs ~ wt, data = mtcars, family = "binomial")
estimate_expectation(model)
estimate_relation(model)
# Mixed models
model <- lme4::lmer(mpg ~ wt + (1 | gear), data = mtcars)
estimate_expectation(model)
estimate_relation(model)
# Bayesian models
model <- suppressWarnings(rstanarm::stan_glm(
mpg ~ wt,
data = mtcars, refresh = 0, iter = 200
))
estimate_expectation(model)
estimate_relation(model)