compare_models: Compute a p-value comparing two nested models.
In reyesem/IntroAnalysis: Functions for introductory statistics using linear models

compare_models.lm

R Documentation

Compute a p-value comparing two nested models.

Description

Tests whether a reduced (nested) model is sufficient for explaining the variability in the response compared to a more complex model.

Usage

## S3 method for class 'lm'
compare_models(
  full.mean.model,
  reduced.mean.model,
  alternative = c("ne", "not equal", "!=", "lt", "less than", "<", "gt", "greater than",
    ">", "at least one differs"),
  simulation.replications = 4999,
  assume.constant.variance = TRUE,
  assume.normality = FALSE,
  construct = c("normal-2", "normal-1", "two-point mass"),
  ...
)

## S3 method for class 'glm'
compare_models(
  full.mean.model,
  reduced.mean.model,
  alternative = c("ne", "not equal", "!=", "lt", "less than", "<", "gt", "greater than",
    ">", "at least one differs"),
  simulation.replications = 4999,
  method = c("classical", "parametric"),
  ...
)

compare_models(
  full.mean.model,
  reduced.mean.model,
  alternative = c("ne", "not equal", "!=", "lt", "less than", "<", "gt", "greater than",
    ">", "at least one differs"),
  simulation.replications = 4999,
  ...
)

Arguments

`full.mean.model`	`lm` or `glm` model object defining the full model.
`reduced.mean.model`	model object of the same type as `full.mean.model` defining the reduced model under the null hypothesis.
`alternative`	characterizes the form of the alternative hypothesis; one of `'not equal'` (`'ne'`, `'!='`), indicating a two-sided alternative; `'less than'` (`'lt'`, `'<'`), indicating a one-sided alternative where the parameter is less than the specified null value; or `'greater than'` (`'gt'`, `'>'`), indicating a one-sided alternative where the parameter is greater than the specified null value. This only applies when the reduced model under the null hypothesis differs from the full model by a single specified parameter. Otherwise, a two-sided test is performed (`'at least one differs'`, the standard ANOVA alternative).
`simulation.replications`	scalar indicating the number of samples to draw from the model for the null distribution (default = 4999). This will be either the number of bootstrap replications or the number of samples from the classical null distribution.
`assume.constant.variance`	boolean; if `TRUE` (default), all errors are assumed to have the same variance. If `FALSE`, each error is allowed to have a different variance.
`assume.normality`	boolean; if `TRUE`, the errors are assumed to follow a Normal distribution. If `FALSE` (default), this is not assumed.
`construct`	string defining the type of construct to use when generating from the distribution for the wild bootstrap (see `rmammen`); default = `"normal-2"`. This is ignored if `assume.constant.variance = TRUE`.
`...`	additional arguments to be passed to other methods.
`method`	string defining the methodology to employ. If `"classical"` (default), the model is assumed correct and classical large-sample theory is used. If `"parametric"`, a parametric bootstrap is performed.
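
To make the role of `construct` more concrete, here is a minimal sketch of drawing wild-bootstrap multipliers. It assumes that `"two-point mass"` refers to Mammen's two-point distribution, which this page does not spell out; `draw_two_point` is a hypothetical helper written for illustration and is not the package's `rmammen`.

```r
# Mammen's two-point distribution: mean 0, variance 1, third moment 1.
root5 <- sqrt(5)
two_point_values <- c(-(root5 - 1) / 2, (root5 + 1) / 2)
two_point_probs  <- c((root5 + 1) / (2 * root5), (root5 - 1) / (2 * root5))

# Hypothetical helper (an assumption, not part of IntroAnalysis):
# draw n multipliers from the two-point distribution above.
draw_two_point <- function(n) {
  sample(two_point_values, size = n, replace = TRUE, prob = two_point_probs)
}

set.seed(1)
fit <- lm(mpg ~ 1 + hp, data = mtcars)
multipliers <- draw_two_point(nobs(fit))

# Each residual is rescaled by its own multiplier, so each observation
# keeps its own error scale (no constant-variance assumption).
perturbed_residuals <- residuals(fit) * multipliers
```

Because every residual is multiplied by an independent draw rather than being shuffled across observations, this construction is only needed when constant variance is not assumed.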

Details

This wrapper provides a single interface for comparing models under various conditions imposed on the model. It is similar to `anova`; however, the p-value can be computed using either classical methods or bootstrapping.

For linear models, the following approaches are implemented:

classical: if both homoskedasticity and normality are assumed, the sampling distribution of a standardized statistic is modeled by an F-distribution.
parametric bootstrap: if normality can be assumed but homoskedasticity cannot, a parametric bootstrap can be performed in which the variance for each observation is estimated by the square of the corresponding residual (similar to White's correction).
residual bootstrap: if homoskedasticity can be assumed, but normality cannot, a residual bootstrap is used to compute the p-value.
wild bootstrap: if neither homoskedasticity nor normality is assumed, a wild bootstrap is used to compute the p-value.

All methods make additional requirements regarding independence of the error terms and that the model has been correctly specified.
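
The residual bootstrap described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper `f_stat`, the choice to resample the full model's residuals, and the `(1 + count) / (replications + 1)` p-value convention are all assumptions made here.

```r
set.seed(42)
full    <- lm(mpg ~ 1 + hp, data = mtcars)
reduced <- lm(mpg ~ 1, data = mtcars)

# Hypothetical helper: F statistic comparing two nested lm fits.
f_stat <- function(full, reduced) {
  rss_full    <- sum(residuals(full)^2)
  rss_reduced <- sum(residuals(reduced)^2)
  df_num      <- df.residual(reduced) - df.residual(full)
  ((rss_reduced - rss_full) / df_num) / (rss_full / df.residual(full))
}

observed <- f_stat(full, reduced)

# Build responses that obey the null: reduced-model fitted values plus
# residuals resampled with replacement (assumes constant variance,
# but not normality).
n_reps         <- 999
null_fitted    <- fitted(reduced)
centered_resid <- residuals(full) - mean(residuals(full))
boot_data      <- mtcars

null_stats <- replicate(n_reps, {
  boot_data$mpg <- null_fitted + sample(centered_resid, replace = TRUE)
  fit_full    <- lm(mpg ~ 1 + hp, data = boot_data)
  fit_reduced <- lm(mpg ~ 1, data = boot_data)
  f_stat(fit_full, fit_reduced)
})

# Proportion of null statistics at least as large as the observed one.
p_boot <- (1 + sum(null_stats >= observed)) / (n_reps + 1)
```

The observed statistic here matches the F value reported by `anova(reduced, full)`; only the reference distribution changes, from the F-distribution to the bootstrap sample.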

For generalized linear models, the following approaches are implemented:

classical: if the distributional family is assumed correct, large sample theory is used to justify modeling the sampling distribution of a standardized statistic using a chi-squared distribution.
parametric bootstrap: the distributional family is assumed and a parametric bootstrap is performed to compute the p-value.

All methods require observations to be independent of one another.
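
The two generalized linear model approaches can be compared side by side in base R. This is a sketch under assumptions: it uses the likelihood-ratio (deviance difference) statistic, which this page does not confirm as the statistic used by `compare_models`, and the replication count and p-value convention are chosen for illustration.

```r
set.seed(7)
full    <- glm(am ~ 1 + wt, family = binomial, data = mtcars)
reduced <- glm(am ~ 1, family = binomial, data = mtcars)

# Classical: compare the deviance difference to a chi-squared distribution.
lrt_stat    <- deviance(reduced) - deviance(full)
df_diff     <- df.residual(reduced) - df.residual(full)
p_classical <- pchisq(lrt_stat, df = df_diff, lower.tail = FALSE)

# Parametric bootstrap: simulate responses from the fitted reduced model
# (the null), refit both models, and recompute the statistic each time.
n_reps    <- 499
sim_y     <- simulate(reduced, nsim = n_reps)
boot_data <- mtcars

null_stats <- vapply(sim_y, function(y_star) {
  boot_data$am <- y_star
  fit_full    <- glm(am ~ 1 + wt, family = binomial, data = boot_data)
  fit_reduced <- glm(am ~ 1, family = binomial, data = boot_data)
  deviance(fit_reduced) - deviance(fit_full)
}, numeric(1))

p_boot <- (1 + sum(null_stats >= lrt_stat)) / (n_reps + 1)
```

Both approaches rely on the distributional family being correct; the bootstrap simply avoids the large-sample chi-squared approximation.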

Value

data.frame containing an ANOVA table comparing the two models. The data.frame has a single attribute, "Null Distribution": a numeric vector of length simulation.replications containing a sample from the modeled null distribution of the test statistic. This is useful for plotting the null distribution.
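
As a sketch of how the attribute might be retrieved and plotted: the object below is a stand-in built by hand so that the code runs without the package. Its column, its simulated values, and `observed_stat` are hypothetical and are not real output of `compare_models`; only the attribute name comes from this page.

```r
set.seed(3)

# Stand-in for the returned data.frame (hypothetical contents).
result <- data.frame(placeholder = NA)
attr(result, "Null Distribution") <- rf(4999, df1 = 1, df2 = 30)
observed_stat <- 4.2  # hypothetical observed test statistic

# Retrieve the null sample by its documented attribute name.
null_dist <- attr(result, "Null Distribution")

# Histogram of the null distribution with the observed statistic marked.
hist(null_dist, breaks = 50,
     main = "Null distribution", xlab = "Test statistic")
abline(v = observed_stat, lty = 2)
```

With a real result, only the `attr()` call and the plotting lines would be needed.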

Methods (by class)

compare_models(lm): Computes p-value comparing nested linear models.
compare_models(glm): Computes p-value comparing nested generalized linear models.

Examples

fit1 <- lm(mpg ~ 1 + hp, data = mtcars)
fit0 <- lm(mpg ~ 1, data = mtcars)

# p-value computed via residual bootstrap
compare_models(fit1, fit0,
  assume.constant.variance = TRUE,
  assume.normality = FALSE)

# classical inference
compare_models(fit1, fit0,
  assume.constant.variance = TRUE,
  assume.normality = TRUE)

reyesem/IntroAnalysis documentation built on March 29, 2025, 3:29 p.m.