lmSubsets: All-Subsets Regression
In lmSubsets: Exact linear subset regression

All-subsets regression for linear models estimated by ordinary least squares (OLS).

lmSubsets(formula, ...)

## Default S3 method:
lmSubsets(formula, data, subset, weights, na.action,
  model = TRUE, x = FALSE, y = FALSE, contrasts = NULL, offset, ...)

lmSubsets_fit(x, y, weights = NULL, offset = NULL,
  include = NULL, exclude = NULL, nmin = NULL, nmax = NULL,
  tolerance = 0, pradius = NULL, nbest = 1, ..., .algo = "phbba")

`formula, data, subset, weights, na.action, model, contrasts, offset`	Standard formula interface.
`x, y`	The model matrix and response.
`include, exclude`	Force regressors in or out.
`nmin, nmax`	Minimum and maximum number of regressors.
`tolerance`	Vector of tolerances.
`pradius`	Preordering radius.
`nbest`	Number of best subsets.
`...`	Ignored.
`.algo`	Internal use.

The generic lmSubsets computes all-variable-subsets regression for ordinary linear models. It provides various methods to conveniently specify the regressor and response variables. The standard formula interface (see lm) can be used, or the information can be extracted from an already fitted lm object. The regressor matrix and response variable can also be passed in directly.
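
As a rough sketch of the first two interfaces described above, the following fits the same problem once from a formula and once from an existing lm object. The AirPollution data and the mortality response are borrowed from the Examples on this page; the call form lmSubsets(fit_lm, ...) is an assumption based on the description above, not a signature documented here.

```r
library("lmSubsets")
data("AirPollution", package = "lmSubsets")

## (1) standard formula interface, as in lm()
sub_formula <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 3)

## (2) start from an already fitted lm object
##     (assumed call form; the page only states that this is possible)
fit_lm <- lm(mortality ~ ., data = AirPollution)
sub_lm <- lmSubsets(fit_lm, nbest = 3)

## both calls should return objects of class "lmSubsets"
class(sub_formula)
class(sub_lm)
```

The matrix interface corresponds to lmSubsets_fit(x, y, ...) in the usage section; it is left out here because this page does not say how an intercept column in x is treated.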

The method computes the nbest best subset models for every subset size, where the "best" models are the models with the lowest residual sum of squares (RSS). The scope of the search can be limited to certain subset sizes by setting nmin and nmax. A tolerance vector (expanded if necessary) may be specified to speed up the algorithm, where tolerance[n] is the tolerance applied to subset models of size n.
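
To make the selection criterion concrete, here is a minimal brute-force sketch in base R that enumerates every subset of a small regressor set and keeps the nbest lowest-RSS models per size. It is only an illustration of what "best" means: the helper best_subsets and the use of the built-in mtcars data are assumptions made for this example, and the exhaustive enumeration is not the search strategy lmSubsets uses (see References).

```r
## Exhaustive illustration of "nbest best subsets per size" by RSS.
## Hypothetical helper, base R only; not part of the lmSubsets package.
best_subsets <- function(x, y, nbest = 1) {
  vars <- colnames(x)
  out <- lapply(seq_along(vars), function(k) {
    ## all subsets containing exactly k regressors
    combos <- combn(vars, k, simplify = FALSE)
    rss <- vapply(combos, function(v) {
      ## an intercept column is always included in this sketch
      fit <- lm.fit(cbind(1, x[, v, drop = FALSE]), y)
      sum(fit$residuals^2)
    }, numeric(1))
    ## keep the nbest models with the lowest RSS for this size
    keep <- head(order(rss), nbest)
    data.frame(size = k,
               rss  = rss[keep],
               vars = vapply(combos[keep], paste, character(1),
                             collapse = " + "))
  })
  do.call(rbind, out)
}

x <- as.matrix(mtcars[, c("cyl", "disp", "hp", "wt", "qsec")])
res <- best_subsets(x, mtcars$mpg, nbest = 2)
print(res)
```

Here size counts regressors without the intercept, which may differ from how lmSubsets counts subset size. The number of candidate models grows exponentially with the number of regressors, which is why restricting the search with nmin and nmax, or relaxing it with a tolerance, matters for larger problems.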

By way of include and exclude, variables may be forced into or out of the regression, respectively.

The function preorders the variables to reduce execution time if pradius > 0. Good execution times are usually attained for approximately pradius = n/3 (the default value), where n is the number of regressors after evaluating include and exclude.

A set of standard extractor functions for fitted model objects is available for objects of class "lmSubsets". See methods for more details.

An object of class "lmSubsets", i.e. a list with the following components:

`nobs`	Number of observations.
`nvar`	Number of variables.
`weights`	Weights vector.
`offset`	Offset component.
`intercept`	`TRUE` if model has intercept term; `FALSE` otherwise.
`include`	Included regressors.
`exclude`	Excluded regressors.
`nmin, nmax`	Minimum and maximum subset sizes.
`tolerance`	Tolerance vector.
`nbest`	Number of best subsets.
`df`	Degrees of freedom.
`rss`	Residual sum of squares.
`which`	Selected variables.
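
The components listed above can be read off the returned list directly. The sketch below restricts the search with nmin and nmax and then inspects a few components. It assumes that nmin and nmax are forwarded through ... in the same way as nbest in the Examples, and it uses str() for rss and which because their exact layout is not specified on this page.

```r
library("lmSubsets")
data("AirPollution", package = "lmSubsets")

## restrict the search to subset sizes 2 to 5, three best models per size
fit_sizes <- lmSubsets(mortality ~ ., data = AirPollution,
                       nmin = 2, nmax = 5, nbest = 3)

fit_sizes$nobs        # number of observations
fit_sizes$nvar        # number of variables
fit_sizes$intercept   # TRUE if the model has an intercept term
fit_sizes$nbest       # number of best subsets requested

## layout of the result arrays is not documented here, so inspect it
str(fit_sizes$rss)
str(fit_sizes$which)
```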

Hofmann M, Gatu C, Kontoghiorghes EJ (2007). Efficient Algorithms for Computing the Best Subset Regression Models for Large-Scale Problems. Computational Statistics & Data Analysis, 52, 16–29.

Gatu C, Kontoghiorghes EJ (2006). Branch-and-Bound Algorithms for Computing the Best Subset Regression Models. Journal of Computational and Graphical Statistics, 15, 139–156.

lmSelect, summary, methods.

## load data (with logs for relative potentials)
data("AirPollution", package = "lmSubsets")

#################
## basic usage ##
#################

## canonical example: fit all subsets
all.AirPoll <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 10)

## visualize RSS
plot(all.AirPoll)

## summarize
summary(all.AirPoll)

## forced inclusion/exclusion of variables
all_2.AirPoll <- lmSubsets(all.AirPoll, include = "noncauc",
                           exclude = "whitecollar")
summary(all_2.AirPoll)