buildModelSeries: Build a series of linear models using automated variable...

buildModelSeries    R Documentation

Build a series of linear models using automated variable selection

Description

Build a series of linear models with stats::lm() using one or more automated variable selection methods implemented in the functions stepVIF() and MASS::stepAIC().

Usage

buildModelSeries(
  formula,
  data,
  vif = FALSE,
  vif.threshold = 10,
  vif.verbose = FALSE,
  aic = FALSE,
  aic.direction = "both",
  aic.trace = FALSE,
  aic.steps = 5000,
  ...
)

buildMS(
  formula,
  data,
  vif = FALSE,
  vif.threshold = 10,
  vif.verbose = FALSE,
  aic = FALSE,
  aic.direction = "both",
  aic.trace = FALSE,
  aic.steps = 5000,
  ...
)

Arguments

formula

A list containing one or several model formulas (a symbolic description of the model to be fitted).

data

Data frame containing the variables in the model formulas.

vif

Logical for performing backward variable selection using the Variance-Inflation Factor (VIF). Defaults to vif = FALSE.

vif.threshold

Numeric value setting the maximum acceptable VIF value. Defaults to vif.threshold = 10.

vif.verbose

Logical for printing iteration results of backward variable selection using the VIF. Defaults to vif.verbose = FALSE.

aic

Logical for performing variable selection using Akaike's Information Criterion (AIC). Defaults to aic = FALSE.

aic.direction

Character string setting the direction of variable selection when using AIC, with options "both" (default), "forward", and "backward".

aic.trace

Logical for printing iteration results of variable selection using the AIC. Defaults to aic.trace = FALSE.

aic.steps

Integer value setting the maximum number of steps to be considered for variable selection using the AIC. Defaults to aic.steps = 5000.

...

Further arguments passed to MASS::stepAIC().

Details

buildModelSeries() was devised to deal with a list of linear model formulas. The main objective is to bring together several functions commonly used when building linear models, such as automated variable selection. In the current implementation, variable selection can be done using stepVIF() or MASS::stepAIC() or both. stepVIF() is a backward variable selection procedure, while MASS::stepAIC() supports backward, forward, and bidirectional variable selection. For more information about these functions, please visit their respective help pages.
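
The two selection stages described above can be sketched with base R and MASS alone. This is a minimal illustration under stated assumptions, not the stepVIF() implementation: the helpers vif_values() and vif_backward() are hypothetical names introduced here, they handle numeric predictors only, and they compute each VIF as 1 / (1 - R^2) from regressing one predictor on the others.

```r
library(MASS)  # provides stepAIC() and the cpus data set

# Hypothetical helper (not part of pedometrics): VIF of each numeric predictor,
# computed as 1 / (1 - R^2) from regressing it on the remaining predictors.
vif_values <- function(data, predictors) {
  vapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Hypothetical helper: backward selection that drops the predictor with the
# largest VIF until every remaining VIF is at or below the threshold.
vif_backward <- function(data, predictors, threshold = 10) {
  while (length(predictors) > 1) {
    v <- vif_values(data, predictors)
    if (max(v) <= threshold) break
    predictors <- setdiff(predictors, names(which.max(v)))
  }
  predictors
}

predictors <- c("syct", "mmin", "mmax", "cach", "chmin", "chmax")

# Stage 1: VIF-based backward selection
kept <- vif_backward(cpus, predictors, threshold = 10)

# Stage 2: AIC-based selection starting from the VIF-filtered model
start <- as.formula(paste("log10(perf) ~", paste(kept, collapse = " + ")))
fit <- lm(start, data = cpus)
fit_aic <- stepAIC(fit, direction = "both", trace = FALSE, steps = 5000)
```

The argument values passed to stepAIC() mirror the defaults of aic.direction, aic.trace, and aic.steps listed under Arguments.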

An important feature of buildModelSeries() is that it records the initial number of candidate predictor variables and observations offered to the model, and adds this information as an attribute to the final selected model. This feature was included because variable selection procedures result in biased (overly optimistic) linear models, and the effective number of degrees of freedom is close to the number of candidate predictor variables initially offered to the model (Harrell, 2001). With the initial number of candidate predictor variables and observations, one can calculate penalized or adjusted measures of model performance. For models built using buildModelSeries(), this can be done using statsModelSeries().
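
As an illustration of why the initial number of candidates matters, the sketch below compares the usual adjusted R^2 of a selected model with one penalized by the number of predictors initially offered. It is only an example of one such adjusted measure and does not reproduce what statsModelSeries() computes; the helper adj_r2() and the attribute names "p_initial" and "n_initial" are hypothetical.

```r
library(MASS)  # provides stepAIC()

# Hypothetical helper: adjusted R^2 for n observations and p predictors.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Offer six numeric candidate predictors, then let stepAIC() prune them.
full <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec, data = mtcars)
selected <- stepAIC(full, direction = "backward", trace = FALSE)

n <- nrow(mtcars)
p_initial <- length(attr(terms(full), "term.labels"))      # candidates offered
p_final <- length(attr(terms(selected), "term.labels"))    # predictors retained

# Record the initial counts on the selected model (attribute names are
# assumptions made for this sketch, not those used by pedometrics).
attr(selected, "p_initial") <- p_initial
attr(selected, "n_initial") <- n

r2 <- summary(selected)$r.squared
naive <- adj_r2(r2, n, p_final)        # same as summary(selected)$adj.r.squared
penalized <- adj_r2(r2, n, p_initial)  # penalized by all candidates offered
```

Because p_initial is never smaller than p_final, the penalized value is never larger than the naive one.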

Some important details should be clear when using buildModelSeries():

  • this function was originally devised to deal with a list of formulas, but can also be used with a single formula;

  • in the current implementation, stepVIF() runs before MASS::stepAIC();

  • function arguments imported from MASS::stepAIC() and stepVIF() keep their original names and receive a prefix (aic or vif), so that the user can identify which function a given argument affects without having to check the documentation.
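
The first and third points can be sketched with a small wrapper. This is not the pedometrics source: build_series_sketch() is a hypothetical function written for illustration, it omits the VIF stage entirely, and it only shows how a single formula can be treated as a one-element list and how aic-prefixed arguments can be forwarded to MASS::stepAIC().

```r
# Hypothetical wrapper (an assumption for illustration, not buildModelSeries()).
build_series_sketch <- function(formula, data, aic = FALSE,
                                aic.direction = "both", aic.trace = FALSE,
                                aic.steps = 5000, ...) {
  # Accept a single formula by wrapping it in a list.
  if (inherits(formula, "formula")) formula <- list(formula)
  lapply(formula, function(f) {
    # Point the formula at this frame so that later refits can find `data`.
    environment(f) <- environment()
    fit <- lm(f, data = data)
    if (aic) {
      # Strip the "aic." prefix when forwarding to MASS::stepAIC().
      fit <- MASS::stepAIC(fit, direction = aic.direction, trace = aic.trace,
                           steps = aic.steps, ...)
    }
    fit
  })
}

# A single formula and a list of formulas both return a list of lm objects.
single <- build_series_sketch(mpg ~ wt + hp + qsec, mtcars,
                              aic = TRUE, aic.direction = "backward")
series <- build_series_sketch(list(mpg ~ wt + hp, mpg ~ wt + qsec + drat),
                              mtcars)
```

Keeping the original argument names behind a prefix means that anything documented for direction, trace, or steps in MASS::stepAIC() carries over unchanged.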

Value

A list containing the fitted linear models.

TODO

Add option to set the order in which MASS::stepAIC() and stepVIF() are run.

Dependencies

The MASS package, provider of support functions and datasets for Venables and Ripley's Modern Applied Statistics with S, is required for buildModelSeries() to work. The development version of the MASS package is available on https://www.stats.ox.ac.uk/pub/MASS4/ while its old versions are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/MASS/.

Author(s)

Alessandro Samuel-Rosa <alessandrosamuelrosa@gmail.com>

References

Harrell, F. E. (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. First edition. New York: Springer.

Venables, W. N. and Ripley, B. D. (2002) Modern applied statistics with S. Fourth edition. New York: Springer.

Samuel-Rosa, A., Heuvelink, G. B. M., de Mattos Vasques, G. and dos Anjos, L. H. C. (2015) Do more detailed environmental covariates deliver more accurate soil maps? Geoderma, 243–244, 214–227. doi: 10.1016/j.geoderma.2014.12.017.

See Also

stepVIF(), statsMS()

Examples

if (interactive()) {
  # based on the second example of MASS::stepAIC()
  library("MASS")
  # discretize the six numeric predictors (columns 2 to 7) at their quantiles
  cpus1 <- cpus
  for (v in names(cpus)[2:7]) {
    cpus1[[v]] <- cut(cpus[[v]], unique(stats::quantile(cpus[[v]])),
                      include.lowest = TRUE)
  }
  cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
  cpus.samp <- sample(1:209, 100)  # random subset of 100 of the 209 rows
  # list of candidate model formulas
  cpus.form <- list(formula(log10(perf) ~ syct + mmin + mmax + cach + chmin +
                    chmax + perf),
                    formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
                    formula(log10(perf) ~ mmax + cach + chmin + chmax + perf))
  data <- cpus0[cpus.samp, ]
  # variable selection with stepVIF() first, then MASS::stepAIC()
  cpus.ms <- buildModelSeries(cpus.form, data, vif = TRUE, aic = TRUE)
}

samuel-rosa/pedometrics documentation built on June 21, 2022, 11:32 p.m.