compare_models: Compute a p-value comparing two nested models.

View source: R/modeling_phrases.R

compare_models.lm    R Documentation

Compute a p-value comparing two nested models.

Description

Tests whether a reduced (nested) model is sufficient for explaining the variability in the response compared to a more complex model.

Usage

## S3 method for class 'lm'
compare_models(
  full.mean.model,
  reduced.mean.model,
  alternative = c("ne", "not equal", "!=", "lt", "less than", "<", "gt", "greater than",
    ">", "at least one differs"),
  simulation.replications = 4999,
  assume.constant.variance = TRUE,
  assume.normality = FALSE,
  construct = c("normal-2", "normal-1", "two-point mass"),
  ...
)

## S3 method for class 'glm'
compare_models(
  full.mean.model,
  reduced.mean.model,
  alternative = c("ne", "not equal", "!=", "lt", "less than", "<", "gt", "greater than",
    ">", "at least one differs"),
  simulation.replications = 4999,
  method = c("classical", "parametric"),
  ...
)

compare_models(
  full.mean.model,
  reduced.mean.model,
  alternative = c("ne", "not equal", "!=", "lt", "less than", "<", "gt", "greater than",
    ">", "at least one differs"),
  simulation.replications = 4999,
  ...
)

Arguments

full.mean.model

lm or glm model object defining the full model.

reduced.mean.model

model object of the same type as full.mean.model defining the reduced model under the null hypothesis.

alternative

characterizes the form of the alternative hypothesis; one of 'not equal' ('ne', '!='), indicating a two-sided alternative; 'less than' ('lt', '<'), indicating a one-sided alternative where the parameter is less than the specified null value; or 'greater than' ('gt', '>'), indicating a one-sided alternative where the parameter is greater than the specified null value. This only applies when the reduced model under the null hypothesis differs from the full model by a single specified parameter. Otherwise, a two-sided test is performed ('at least one differs', the standard ANOVA alternative).

simulation.replications

scalar indicating the number of samples to draw from the model for the null distribution (default = 4999). This is either the number of bootstrap replications or the number of samples drawn from the classical null distribution.

assume.constant.variance

boolean; if TRUE (default), all errors are assumed to have the same variance. If FALSE, each error is allowed to have a different variance.

assume.normality

boolean; if TRUE, the errors are assumed to follow a Normal distribution. If FALSE (default), this is not assumed.

construct

string defining the construct used when generating random variates for the wild bootstrap (see rmammen). Ignored if assume.constant.variance = TRUE (default = "normal-2").

...

additional arguments to be passed to other methods.

method

string defining the methodology to employ. If "classical" (default), the model is assumed correct and classical large-sample theory is used. If "parametric", a parametric bootstrap is performed.
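For the lm method, the two assumption flags jointly select one of the four approaches listed under Details. The sketch below simply encodes that mapping as a small R function so the combinations are easy to check. The name choose_lm_approach is hypothetical and is not part of IntroAnalysis.

```r
# Hypothetical helper (NOT part of IntroAnalysis): maps the two assumption
# flags of the lm method to the approach described under Details.
choose_lm_approach <- function(assume.constant.variance = TRUE,
                               assume.normality = FALSE) {
  if (assume.constant.variance && assume.normality) {
    "classical"              # F-distribution
  } else if (assume.normality) {
    "parametric bootstrap"   # normal errors, non-constant variance
  } else if (assume.constant.variance) {
    "residual bootstrap"     # constant variance, no normality
  } else {
    "wild bootstrap"         # neither assumption
  }
}

choose_lm_approach()  # returns "residual bootstrap" (the defaults)
```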

Details

This wrapper provides a single interface for comparing models under various conditions imposed on the model. It is similar to anova; however, the p-value can be computed using either classical methods or bootstrapping.
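For nested linear models under the classical assumptions, the comparison is the familiar extra-sum-of-squares F-test that anova reports. The base R sketch below uses the same mtcars models as the Examples, computes the F statistic by hand, and checks it against anova(). It illustrates the classical test in general and is not taken from the package source.

```r
fit1 <- lm(mpg ~ 1 + hp, data = mtcars)  # full model
fit0 <- lm(mpg ~ 1, data = mtcars)       # reduced model (intercept only)

# Extra-sum-of-squares F statistic:
# F = [(RSS0 - RSS1) / (df0 - df1)] / (RSS1 / df1)
rss0 <- deviance(fit0); df0 <- df.residual(fit0)  # deviance() of an lm is its RSS
rss1 <- deviance(fit1); df1 <- df.residual(fit1)
f.obs <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
p.classical <- pf(f.obs, df0 - df1, df1, lower.tail = FALSE)

# The same test as reported by base R's anova()
tab <- anova(fit0, fit1)
c(F = f.obs, p = p.classical)
tab[2, c("F", "Pr(>F)")]
```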

For linear models, the following approaches are implemented:

  • classical: if both homoskedasticity and normality are assumed, the sampling distribution of a standardized statistic is modeled by an F-distribution.

  • parametric bootstrap: if normality can be assumed but homoskedasticity cannot, a parametric bootstrap can be performed in which the variance for each observation is estimated by the square of the corresponding residual (similar to White's correction).

  • residual bootstrap: if homoskedasticity can be assumed, but normality cannot, a residual bootstrap is used to compute the p-value.

  • wild bootstrap: if neither homoskedasticity nor normality is assumed, a wild bootstrap is used to compute the p-value.
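The residual bootstrap in the list above can be sketched in base R as follows, assuming one common variant: bootstrap responses are generated under the null hypothesis as the reduced-model fit plus residuals resampled from the full model, both models are refit, and the F statistic is recomputed. The helper f_stat, the choice of B = 999, and the seed are illustrative assumptions; the package's own resampling details may differ.

```r
# Sketch of a residual bootstrap for nested lm fits (not the IntroAnalysis code)
set.seed(2025)
fit1 <- lm(mpg ~ 1 + hp, data = mtcars)  # full model
fit0 <- lm(mpg ~ 1, data = mtcars)       # reduced model

# Hypothetical helper: F statistic comparing the nested fits for response y
f_stat <- function(y, data = mtcars) {
  data$mpg <- y
  full <- lm(mpg ~ 1 + hp, data = data)
  reduced <- lm(mpg ~ 1, data = data)
  anova(reduced, full)$F[2]
}

f.obs <- f_stat(mtcars$mpg)
B <- 999
res1 <- resid(fit1)  # full-model residuals (mean zero: the model has an intercept)
null.f <- replicate(B, {
  # Generate data under H0: reduced-model fit plus resampled residuals
  y.star <- fitted(fit0) + sample(res1, replace = TRUE)
  f_stat(y.star)
})

# Monte Carlo p-value: (1 + #{F* >= F.obs}) / (B + 1)
p.resid <- mean(c(null.f, f.obs) >= f.obs)
p.resid
```

Resampling residuals as a whole treats them as exchangeable, which is exactly the constant-variance assumption this approach relies on.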

All methods additionally require that the error terms be independent and that the model be correctly specified.
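The two approaches in the list above that drop the constant-variance assumption (parametric and wild bootstrap) keep the observation-specific scale of each residual when generating data under the null. The sketch below is illustrative only: the parametric version draws each error from a Normal distribution with variance equal to the squared residual, and the wild version multiplies each residual by a draw from Mammen's two-point distribution (mean 0, variance 1), which we assume corresponds to the "two-point mass" construct. It reuses the plain F statistic for simplicity; the package may use a different, for example heteroskedasticity-robust, statistic. The names f_stat, r_two_point, and hetero_boot are hypothetical.

```r
# Sketch only (not the IntroAnalysis implementation)
set.seed(2025)
fit1 <- lm(mpg ~ 1 + hp, data = mtcars)  # full model
fit0 <- lm(mpg ~ 1, data = mtcars)       # reduced model

# Hypothetical helper: F statistic comparing the nested fits for response y
f_stat <- function(y, data = mtcars) {
  data$mpg <- y
  anova(lm(mpg ~ 1, data = data), lm(mpg ~ 1 + hp, data = data))$F[2]
}

# Mammen's two-point distribution: mean 0, variance 1
r_two_point <- function(n) {
  a <- -(sqrt(5) - 1) / 2
  b <- (sqrt(5) + 1) / 2
  p.a <- (sqrt(5) + 1) / (2 * sqrt(5))
  sample(c(a, b), n, replace = TRUE, prob = c(p.a, 1 - p.a))
}

hetero_boot <- function(type = c("wild", "parametric"), B = 999) {
  type <- match.arg(type)
  e <- resid(fit1)
  n <- length(e)
  f.obs <- f_stat(mtcars$mpg)
  null.f <- replicate(B, {
    e.star <- if (type == "wild") {
      e * r_two_point(n)               # keeps each residual's own scale
    } else {
      rnorm(n, mean = 0, sd = abs(e))  # variance of obs i estimated by e_i^2
    }
    f_stat(fitted(fit0) + e.star)      # data generated under H0
  })
  list(p.value = mean(c(null.f, f.obs) >= f.obs), null.f = null.f)
}

wild <- hetero_boot("wild")
param <- hetero_boot("parametric")
c(wild = wild$p.value, parametric = param$p.value)
```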

For generalized linear models, the following approaches are implemented:

  • classical: if the distributional family is assumed correct, large sample theory is used to justify modeling the sampling distribution of a standardized statistic using a chi-squared distribution.

  • parametric bootstrap: the distributional family is assumed and a parametric bootstrap is performed to compute the p-value.
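For generalized linear models, one standard choice for the classical chi-squared comparison is the likelihood-ratio (deviance) statistic; the page does not say which standardized statistic the package uses, so treat this as an assumption. The base R sketch below fits a hypothetical Poisson regression of carb on hp from mtcars, computes the deviance drop by hand, and checks it against anova(..., test = "Chisq").

```r
# Hypothetical example: Poisson regression of carburetor count on horsepower
fit1 <- glm(carb ~ 1 + hp, family = poisson, data = mtcars)  # full model
fit0 <- glm(carb ~ 1, family = poisson, data = mtcars)       # reduced model

# Likelihood-ratio statistic = drop in deviance; large-sample chi-squared p-value
lr.obs <- deviance(fit0) - deviance(fit1)
df.diff <- df.residual(fit0) - df.residual(fit1)
p.classical <- pchisq(lr.obs, df = df.diff, lower.tail = FALSE)

tab <- anova(fit0, fit1, test = "Chisq")
c(LR = lr.obs, p = p.classical)
```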

All methods require observations to be independent of one another.
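A parametric bootstrap for the same hypothetical Poisson comparison can be sketched as follows: responses are simulated from the fitted reduced model (the assumed family evaluated at the null-model means), both models are refit, and the deviance drop is recomputed. The helper lr_stat, B = 999, and the seed are illustrative assumptions rather than the package's implementation.

```r
# Sketch of a parametric bootstrap for nested Poisson GLMs (not the IntroAnalysis code)
set.seed(2025)
fit0 <- glm(carb ~ 1, family = poisson, data = mtcars)  # reduced (null) model

# Hypothetical helper: deviance drop from reduced to full model for response y
lr_stat <- function(y, data = mtcars) {
  data$carb <- y
  full <- glm(carb ~ 1 + hp, family = poisson, data = data)
  reduced <- glm(carb ~ 1, family = poisson, data = data)
  deviance(reduced) - deviance(full)
}

lr.obs <- lr_stat(mtcars$carb)
B <- 999
mu0 <- fitted(fit0)  # Poisson means under the null model
null.lr <- replicate(B, lr_stat(rpois(length(mu0), mu0)))

# Monte Carlo p-value: (1 + #{LR* >= LR.obs}) / (B + 1)
p.param <- mean(c(null.lr, lr.obs) >= lr.obs)
p.param
```

Because the simulated responses come from the assumed family, this approach inherits the same distributional assumption as the classical test; it only replaces the large-sample chi-squared approximation.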

Value

data.frame containing an ANOVA table comparing the two models. The data.frame has a single attribute, "Null Distribution": a numeric vector of length simulation.replications containing a sample from the model for the null distribution of the test statistic. This is useful for plotting the null distribution.

Methods (by class)

  • compare_models(lm): Computes p-value comparing nested linear models.

  • compare_models(glm): Computes p-value comparing nested generalized linear models.

See Also

anova

Examples

library(IntroAnalysis)

fit1 <- lm(mpg ~ 1 + hp, data = mtcars)  # full model
fit0 <- lm(mpg ~ 1, data = mtcars)       # reduced model (intercept only)

# p-value computed via residual bootstrap
compare_models(fit1, fit0,
  assume.constant.variance = TRUE,
  assume.normality = FALSE)

# classical inference
compare_models(fit1, fit0,
  assume.constant.variance = TRUE,
  assume.normality = TRUE)



reyesem/IntroAnalysis documentation built on March 29, 2025, 3:29 p.m.