estimate_mean_response: Estimate the mean response for a given set of covariates.

View source: R/modeling_phrases.R

estimate_mean_response.lm    R Documentation

Description

Provides point estimates and confidence intervals for the mean response of a linear or generalized linear model via bootstrapping or classical theory.

Usage

## S3 method for class 'lm'
estimate_mean_response(
  mean.model,
  confidence.level,
  simulation.replications = 4999,
  assume.constant.variance = TRUE,
  assume.normality = FALSE,
  construct = c("normal-2", "normal-1", "two-point mass"),
  type = c("percentile", "BC"),
  ...
)

## S3 method for class 'glm'
estimate_mean_response(
  mean.model,
  confidence.level,
  simulation.replications = 4999,
  method = c("classical", "parametric", "case-resampling"),
  type = c("percentile", "BC", "bootstrap-t"),
  ...
)

estimate_mean_response(
  mean.model,
  confidence.level,
  simulation.replications = 4999,
  ...
)

Arguments

mean.model

lm or glm model fit defining the model and therefore the parameters of the mean model to be estimated.

confidence.level

scalar between 0 and 1 indicating the confidence level for all confidence intervals constructed. If missing (default), only point estimates are returned.

simulation.replications

scalar indicating the number of samples to draw from the model for the sampling distribution (default = 4999). This will either be the number of bootstrap replications or the number of samples from the classical sampling distribution. This is ignored if confidence.level is not specified.

assume.constant.variance

boolean; if TRUE (default), errors are assumed to have the same variance. If FALSE, each error term is allowed to have a different variance.

assume.normality

boolean; if TRUE, the errors are assumed to follow a Normal distribution. If FALSE (default), this is not assumed.

construct

string defining the type of construct to use when generating from the distribution for the wild bootstrap (see rmammen). If assume.constant.variance = TRUE, this is ignored (default = "normal-2").

type

string defining the type of confidence interval to construct. If "percentile" (default), an equal-tailed percentile interval is constructed. If "BC", the bias-corrected percentile interval is constructed. Currently, the bootstrap-t interval is not supported.

...

additional arguments to be passed to other methods including the list of variables and their values at which to estimate the mean response.

method

string defining the methodology to employ. If "classical" (default), the model is assumed correct and classical large-sample theory is used. If "parametric", a parametric bootstrap is performed. If "case-resampling", a case-resampling bootstrap is performed.

Details

This wrapper provides a single interface for estimating the mean response under various conditions imposed on the model. As with predict, both point and interval estimates of the mean response are available. Unlike predict, interval estimates can be constructed via either bootstrapping or classical theory.
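For comparison, the classical interval that base R's predict produces for a linear model can be obtained as follows. This is only a reference point using stats::predict on the same model as in the Examples section; it is not a statement about how estimate_mean_response computes its result internally.

```r
# Classical (t-based) confidence interval for the mean response via base R.
fit <- lm(mpg ~ 1 + hp, data = mtcars)
new_data <- data.frame(hp = c(120, 130))

classical <- predict(fit, newdata = new_data,
                     interval = "confidence", level = 0.95)

# One row per covariate setting; columns are the point estimate (fit)
# and the lower and upper confidence limits (lwr, upr).
classical
```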

For linear models, the following approaches are implemented:

  • classical: if both homoskedasticity and normality are assumed, the sampling distribution of a standardized statistic is modeled by a t-distribution.

  • parametric bootstrap: if normality can be assumed but homoskedasticity cannot, a parametric bootstrap can be performed in which the variance for each observation is estimated by the square of the corresponding residual (similar to White's correction).

  • residual bootstrap: if homoskedasticity can be assumed, but normality cannot, a residual bootstrap is used to compute standard errors and confidence intervals.

  • wild bootstrap: if neither homoskedasticity nor normality is assumed, a wild bootstrap is used to compute standard errors and confidence intervals.

All methods make additional requirements regarding independence of the error terms and that the model has been correctly specified.
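The residual and wild bootstraps listed above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names (refit, mammen) are hypothetical, raw residuals are resampled without rescaling, and the two-point weights are Mammen's standard two-point distribution, which is assumed (not confirmed) to correspond to construct = "two-point mass".

```r
set.seed(1)
fit <- lm(mpg ~ 1 + hp, data = mtcars)
new_data <- data.frame(hp = c(120, 130))

X     <- model.matrix(fit)        # design matrix of the original fit
qr_X  <- qr(X)                    # factor once, reuse for every refit
X_new <- cbind(1, new_data$hp)    # rows at which the mean response is estimated
y_hat <- fitted(fit)
e     <- residuals(fit)
n     <- length(e)
B     <- 999                      # number of bootstrap replications

# Hypothetical helper: least-squares coefficients for a new response vector.
refit <- function(y_star) qr.coef(qr_X, y_star)

# Residual bootstrap: resample residuals with replacement
# (reasonable when the errors share a common variance).
residual_boot <- replicate(B, {
  y_star <- y_hat + sample(e, n, replace = TRUE)
  drop(X_new %*% refit(y_star))
})

# Hypothetical helper: Mammen's two-point weights (mean 0, variance 1).
mammen <- function(n) {
  s5 <- sqrt(5)
  sample(c(-(s5 - 1) / 2, (s5 + 1) / 2), n, replace = TRUE,
         prob = c((s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)))
}

# Wild bootstrap: each residual stays with its own observation and is
# multiplied by a random weight, which preserves heteroskedasticity.
wild_boot <- replicate(B, {
  y_star <- y_hat + e * mammen(n)
  drop(X_new %*% refit(y_star))
})

# Each result is a 2 x B matrix (one row per covariate setting).
apply(residual_boot, 1, quantile, probs = c(0.025, 0.975))
apply(wild_boot, 1, quantile, probs = c(0.025, 0.975))
```

The only difference between the two schemes is how the bootstrap errors are generated; the refitting and prediction steps are identical.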

For generalized linear models, the following approaches are implemented:

  • classical: if the distributional family is assumed correct, large-sample theory is used to justify modeling the sampling distribution of a standardized statistic using a standard normal distribution.

  • parametric bootstrap: the distributional family is assumed and a parametric bootstrap is performed to compute standard errors and confidence intervals.

  • nonparametric bootstrap: a case resampling bootstrap algorithm is used to estimate standard errors and confidence intervals.

All methods require observations to be independent of one another.
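The two bootstrap schemes for generalized linear models can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the Poisson model for carb is chosen only as an example, and the mean response is evaluated on the response scale.

```r
set.seed(1)
fit <- glm(carb ~ hp, family = poisson, data = mtcars)
new_data <- data.frame(hp = c(120, 130))
B <- 999   # number of bootstrap replications

# Case-resampling (nonparametric) bootstrap: resample whole rows,
# refit the model, and re-estimate the mean response.
case_boot <- replicate(B, {
  rows <- sample(nrow(mtcars), replace = TRUE)
  fit_star <- glm(carb ~ hp, family = poisson, data = mtcars[rows, ])
  predict(fit_star, newdata = new_data, type = "response")
})

# Parametric bootstrap: simulate new responses from the fitted
# Poisson model, keeping the covariates fixed, then refit.
param_boot <- replicate(B, {
  d <- mtcars
  d$carb <- rpois(nrow(d), lambda = fitted(fit))
  fit_star <- glm(carb ~ hp, family = poisson, data = d)
  predict(fit_star, newdata = new_data, type = "response")
})

# Bootstrap standard errors for the two covariate settings.
apply(case_boot, 1, sd)
apply(param_boot, 1, sd)
```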

Confidence intervals constructed via bootstrapping can take on various forms. The percentile interval is constructed by taking the empirical 100\alpha and 100(1-\alpha) percentiles of the bootstrap statistics. If \hat{F} is the empirical distribution function of the bootstrap values, then the 100(1 - 2\alpha)% percentile interval is given by

(\hat{F}^{-1}(\alpha), \hat{F}^{-1}(1-\alpha))
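As a small illustration of the percentile interval, the sketch below bootstraps the sample mean of mtcars$mpg (an arbitrary statistic chosen for the example) and reads off the empirical quantiles. Setting type = 1 in quantile requests the inverse of the empirical distribution function, which matches \hat{F}^{-1} above; whether the package uses this quantile type is an assumption.

```r
set.seed(1)
x <- mtcars$mpg

# Bootstrap distribution of the sample mean.
boot_stats <- replicate(4999, mean(sample(x, replace = TRUE)))

# alpha = 0.025 gives a 100(1 - 2 * alpha)% = 95% interval.
alpha <- 0.025
percentile_interval <- quantile(boot_stats,
                                probs = c(alpha, 1 - alpha),
                                type = 1)
percentile_interval
```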

The bias-corrected (BC) interval corrects for median-bias. It is given by

(\hat{F}^{-1}(\alpha_1), \hat{F}^{-1}(1-\alpha_2))

where

\alpha_1 = \Phi\left(2\hat{z}_0 + \Phi^{-1}(\alpha)\right)

\alpha_2 = 1 - \Phi\left(2\hat{z}_0 + \Phi^{-1}(1-\alpha)\right)

\hat{z}_0 = \Phi^{-1}(\hat{F}(\hat{\beta}))

where \hat{\beta} is the estimate from the original sample. The bootstrap-t interval is based on the bootstrap distribution of

t^{b} = \frac{\hat{\beta}^{b} - \hat{\beta}}{\hat{\sigma}^{b}}

where \hat{\sigma} is the estimate of the standard error of \hat{\beta} and the superscript b denotes a bootstrap sample. Let \hat{G} be the empirical distribution function of the bootstrap standardized statistics given above. Then, the bootstrap-t interval is given by

(\hat{\beta} - \hat{\sigma}\hat{G}^{-1}(1-\alpha), \hat{\beta} - \hat{\sigma}\hat{G}^{-1}(\alpha))
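The BC and bootstrap-t formulas above translate fairly directly into R. The functions below are hypothetical helpers written for illustration, not part of the package. The bootstrap-t sketch uses the sample mean with standard error sd / sqrt(n) as a stand-in statistic, and both use type = 1 quantiles to mimic the inverse empirical distribution function.

```r
# Bias-corrected (BC) percentile interval from bootstrap statistics.
# Note: z0 is infinite if every bootstrap value lies on one side of the estimate.
bc_interval <- function(boot_stats, estimate, alpha = 0.025) {
  z0     <- qnorm(mean(boot_stats <= estimate))         # median-bias correction
  alpha1 <- pnorm(2 * z0 + qnorm(alpha))
  alpha2 <- 1 - pnorm(2 * z0 + qnorm(1 - alpha))
  unname(quantile(boot_stats, probs = c(alpha1, 1 - alpha2), type = 1))
}

# Bootstrap-t interval for the mean of x (stand-in statistic).
bootstrap_t_interval <- function(x, B = 999, alpha = 0.025) {
  n        <- length(x)
  estimate <- mean(x)
  se       <- sd(x) / sqrt(n)
  # Standardized statistic recomputed on each bootstrap sample.
  t_star <- replicate(B, {
    x_star <- sample(x, n, replace = TRUE)
    (mean(x_star) - estimate) / (sd(x_star) / sqrt(n))
  })
  g <- unname(quantile(t_star, probs = c(1 - alpha, alpha), type = 1))
  c(estimate - se * g[1], estimate - se * g[2])
}

set.seed(1)
x <- mtcars$mpg
boot_means <- replicate(999, mean(sample(x, replace = TRUE)))

bc_interval(boot_means, estimate = mean(x))
bootstrap_t_interval(x)
```

When exactly half of the bootstrap values fall at or below the original estimate, \hat{z}_0 = 0 and the BC interval reduces to the percentile interval.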

Value

data.frame containing a table of estimates. The object has an additional attribute, "Sampling Distribution", which is a matrix with simulation.replications rows and one column for each prediction of the mean response. Each column contains a sample from the corresponding model of the sampling distribution. This is useful for plotting the modeled sampling distribution.
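The attribute can be retrieved with attr. The sketch below does not call estimate_mean_response; instead it builds a stand-in object by hand with the documented shape, so that it runs without the package. The column name estimate and the way the draws are generated (classical t-based draws around the predict output) are assumptions made for illustration.

```r
set.seed(1)
fit <- lm(mpg ~ 1 + hp, data = mtcars)
new_data <- data.frame(hp = c(120, 130))
pred <- predict(fit, newdata = new_data, se.fit = TRUE)

# Stand-in for the returned table (hypothetical column names).
estimates <- data.frame(new_data, estimate = pred$fit)

# Stand-in sampling distribution: one column per covariate setting,
# one row per simulation replication.
R <- 4999
attr(estimates, "Sampling Distribution") <- sapply(
  seq_along(pred$fit),
  function(j) pred$fit[j] + pred$se.fit[j] * rt(R, df = pred$df)
)

# Retrieve the matrix and summarize each column.
sampling_dist <- attr(estimates, "Sampling Distribution")
dim(sampling_dist)
apply(sampling_dist, 2, sd)

# A histogram of one column, e.g. hist(sampling_dist[, 1]),
# displays the modeled sampling distribution for that setting.
```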

Methods (by class)

  • estimate_mean_response(lm): Estimates mean response for linear models.

  • estimate_mean_response(glm): Estimates mean response for generalized linear models.

Examples

fit <- lm(mpg ~ 1 + hp, data = mtcars)

# Estimate the mean response for vehicles with 120 and 130 horsepower.
estimate_mean_response(fit,
  confidence.level = 0.95,
  assume.constant.variance = TRUE,
  assume.normality = TRUE,
  hp = c(120, 130))


reyesem/IntroAnalysis documentation built on March 29, 2025, 3:29 p.m.