estimate_expectation: Model-based response estimates and uncertainty

View source: R/estimate_predicted.R

estimate_expectationR Documentation

Model-based response estimates and uncertainty

Description

After fitting a model, it is useful generate model-based estimates of the response variables for different combinations of predictor values. Such estimates can be used to make inferences about relationships between variables and to make predictions about individual cases.

Model-based response estimates and uncertainty can be generated for both the conditional average response values (the regression line or expectation) and for predictions about individual cases. See below for details.

Usage

estimate_expectation(
  model,
  data = NULL,
  ci = 0.95,
  keep_iterations = FALSE,
  ...
)

estimate_link(model, data = "grid", ci = 0.95, keep_iterations = FALSE, ...)

estimate_prediction(
  model,
  data = NULL,
  ci = 0.95,
  keep_iterations = FALSE,
  ...
)

estimate_relation(
  model,
  data = "grid",
  ci = 0.95,
  keep_iterations = FALSE,
  ...
)

Arguments

model

A statistical model.

data

A data frame with model's predictors to estimate the response. If NULL, the model's data is used. If "grid", the model matrix is obtained (through insight::get_datagrid()).

ci

Confidence Interval (CI) level. Default to 0.95 (⁠95%⁠).

keep_iterations

If TRUE, will keep all iterations (draws) of bootstrapped or Bayesian models. They will be added as additional columns named ⁠iter_1, iter_2, ...⁠. You can reshape them to a long format by running reshape_iterations().

...

You can add all the additional control arguments from insight::get_datagrid() (used when data = "grid") and insight::get_predicted().

Value

A data frame of predicted values and uncertainty intervals, with class "estimate_predicted". Methods for visualisation_recipe() and plot() are available.

Expected (average) values

The most important way that various types of response estimates differ is in terms of what quantity is being estimated and the meaning of the uncertainty intervals. The major choices are expected values for uncertainty in the regression line and predicted values for uncertainty in the individual case predictions.

Expected values refer to the fitted regression line - the estimated average response value (i.e., the "expectation") for individuals with specific predictor values. For example, in a linear model y = 2 + 3x + 4z + e, the estimated average y for individuals with x = 1 and z = 2 is 11.

For expected values, uncertainty intervals refer to uncertainty in the estimated conditional average (where might the true regression line actually fall)? Uncertainty intervals for expected values are also called "confidence intervals".

Expected values and their uncertainty intervals are useful for describing the relationship between variables and for describing how precisely a model has been estimated.

For generalized linear models, expected values are reported on one of two scales:

  • The link scale refers to scale of the fitted regression line, after transformation by the link function. For example, for a logistic regression (logit binomial) model, the link scale gives expected log-odds. For a log-link Poisson model, the link scale gives the expected log-count.

  • The response scale refers to the original scale of the response variable (i.e., without any link function transformation). Expected values on the link scale are back-transformed to the original response variable metric (e.g., expected probabilities for binomial models, expected counts for Poisson models).

Individual case predictions

In contrast to expected values, predicted values refer to predictions for individual cases. Predicted values are also called "posterior predictions" or "posterior predictive draws".

For predicted values, uncertainty intervals refer to uncertainty in the individual response values for each case (where might any single case actually fall)? Uncertainty intervals for predicted values are also called "prediction intervals" or "posterior predictive intervals".

Predicted values and their uncertainty intervals are useful for forecasting the range of values that might be observed in new data, for making decisions about individual cases, and for checking if model predictions are reasonable ("posterior predictive checks").

Predicted values and intervals are always on the scale of the original response variable (not the link scale).

Functions for estimating predicted values and uncertainty

modelbased provides 4 functions for generating model-based response estimates and their uncertainty:

  • estimate_expectation():

    • Generates expected values (conditional average) on the response scale.

    • The uncertainty interval is a confidence interval.

    • By default, values are computed using the data used to fit the model.

  • estimate_link():

    • Generates expected values (conditional average) on the link scale.

    • The uncertainty interval is a confidence interval.

    • By default, values are computed using a reference grid spanning the observed range of predictor values (see visualisation_matrix()).

  • estimate_prediction():

    • Generates predicted values (for individual cases) on the response scale.

    • The uncertainty interval is a prediction interval.

    • By default, values are computed using the data used to fit the model.

  • estimate_relation():

    • Like estimate_expectation().

    • Useful for visualizing a model.

    • Generates expected values (conditional average) on the response scale.

    • The uncertainty interval is a confidence interval.

    • By default, values are computed using a reference grid spanning the observed range of predictor values (see visualisation_matrix()).

Data for predictions

If the data = NULL, values are estimated using the data used to fit the model. If data = "grid", values are computed using a reference grid spanning the observed range of predictor values with visualisation_matrix(). This can be useful for model visualization. The number of predictor values used for each variable can be controlled with the length argument. data can also be a data frame containing columns with names matching the model frame (see insight::get_data()). This can be used to generate model predictions for specific combinations of predictor values.

Note

These functions are built on top of insight::get_predicted() and correspond to different specifications of its parameters. It may be useful to read its documentation, in particular the description of the predict argument for additional details on the difference between expected vs. predicted values and link vs. response scales.

Additional control parameters can be used to control results from insight::get_datagrid() (when data = "grid") and from insight::get_predicted() (the function used internally to compute predictions).

For plotting, check the examples in visualisation_recipe(). Also check out the Vignettes and README examples for various examples, tutorials and usecases.

Examples


library(modelbased)

# Linear Models
model <- lm(mpg ~ wt, data = mtcars)

# Get predicted and prediction interval (see insight::get_predicted)
estimate_expectation(model)

# Get expected values with confidence interval
pred <- estimate_relation(model)
pred

# Visualisation (see visualisation_recipe())
plot(pred)

# Standardize predictions
pred <- estimate_relation(lm(mpg ~ wt + am, data = mtcars))
z <- standardize(pred, include_response = FALSE)
z
unstandardize(z, include_response = FALSE)

# Logistic Models
model <- glm(vs ~ wt, data = mtcars, family = "binomial")
estimate_expectation(model)
estimate_relation(model)

# Mixed models
model <- lme4::lmer(mpg ~ wt + (1 | gear), data = mtcars)
estimate_expectation(model)
estimate_relation(model)

# Bayesian models

model <- suppressWarnings(rstanarm::stan_glm(
  mpg ~ wt,
  data = mtcars, refresh = 0, iter = 200
))
estimate_expectation(model)
estimate_relation(model)



easystats/estimate documentation built on Dec. 18, 2024, 4:41 p.m.