lmSubsets: All-Subsets Regression


View source: R/generic.R

Description

All-subsets regression for linear models estimated by ordinary least squares (OLS).

Usage

lmSubsets(formula, ...)

## Default S3 method:
lmSubsets(formula, data, subset, weights, na.action,
  model = TRUE, x = FALSE, y = FALSE, contrasts = NULL, offset, ...)

lmSubsets_fit(x, y, weights = NULL, offset = NULL,
  include = NULL, exclude = NULL, nmin = NULL, nmax = NULL,
  tolerance = 0, pradius = NULL, nbest = 1, ..., .algo = "phbba")

Arguments

formula, data, subset, weights, na.action, model, contrasts, offset

Standard formula interface.

x, y

The model matrix and response.

include, exclude

Force regressors in or out.

nmin, nmax

Minimum and maximum number of regressors.

tolerance

Vector of tolerances.

pradius

Preordering radius.

nbest

Number of best subsets.

...

Ignored.

.algo

Internal use.

Details

The generic lmSubsets computes all-variable-subsets regression for ordinary linear models. It provides various methods to conveniently specify the regressor and response variables. The standard formula interface (see lm) can be used, or the information can be extracted from an already fitted lm object. The regressor matrix and response variable can also be passed in directly.

The method computes the nbest best subset models for every subset size, where the "best" models are the models with the lowest residual sum of squares (RSS). The scope of the search can be limited to certain subset sizes by setting nmin and nmax. A tolerance vector (expanded if necessary) may be specified to speed up the algorithm, where tolerance[n] is the tolerance applied to subset models of size n.
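To make the selection criterion concrete, the following is a minimal brute-force sketch in base R. It is not the algorithm used by lmSubsets (the references describe branch-and-bound methods); it only illustrates what "the nbest models with the lowest RSS for every subset size" means. The helper name best_subsets_bruteforce and the choice of the built-in mtcars data are assumptions made for this illustration, and weights, offsets and tolerances are not handled.

```r
## Illustrative only: exhaustive enumeration, feasible for a handful of
## regressors. `best_subsets_bruteforce` is a hypothetical helper, not part
## of the lmSubsets package. An intercept is always included.
best_subsets_bruteforce <- function(x, y, nbest = 1, nmin = 1, nmax = ncol(x)) {
  out <- list()
  for (size in nmin:nmax) {
    ## all column-index combinations of this size, one combination per column
    combos <- combn(ncol(x), size)
    ## residual sum of squares of the OLS fit for each combination
    rss <- apply(combos, 2, function(idx) {
      fit <- lm.fit(cbind(1, x[, idx, drop = FALSE]), y)
      sum(fit$residuals^2)
    })
    ## keep the nbest combinations with the lowest RSS
    keep <- order(rss)[seq_len(min(nbest, length(rss)))]
    out[[length(out) + 1]] <- data.frame(
      size = size,
      rank = seq_along(keep),
      rss  = rss[keep],
      vars = apply(combos[, keep, drop = FALSE], 2,
                   function(idx) paste(colnames(x)[idx], collapse = " + ")),
      stringsAsFactors = FALSE
    )
  }
  do.call(rbind, out)
}

## built-in data: regress mpg on four candidate regressors
x <- as.matrix(mtcars[, c("cyl", "disp", "hp", "wt")])
y <- mtcars$mpg
res <- best_subsets_bruteforce(x, y, nbest = 2)
print(res)
```

Restricting nmin and nmax in this sketch simply shortens the loop over subset sizes, which mirrors how the search scope is limited in the description above.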

By way of include and exclude, variables may be forced into or out of the regression, respectively.
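The effect of forced inclusion and exclusion on the search space can be sketched in base R as follows. The helper constrained_subsets is hypothetical and not part of lmSubsets; it also assumes that the subset size counts the forced-in variables, which this page does not state explicitly.

```r
## Hypothetical helper (not part of lmSubsets): enumerate the candidate
## subsets of a given size that respect `include` and `exclude`.
## Assumption: `size` counts the forced-in variables.
constrained_subsets <- function(vars, size, include = NULL, exclude = NULL) {
  stopifnot(all(include %in% vars), all(exclude %in% vars),
            length(intersect(include, exclude)) == 0)
  ## variables that are neither forced in nor forced out
  free <- setdiff(vars, c(include, exclude))
  ## free slots left once the forced-in variables are placed
  k <- size - length(include)
  if (k < 0 || k > length(free)) return(list())
  if (k == 0) return(list(include))
  ## index-based combn avoids the special case of a length-one vector
  lapply(combn(length(free), k, simplify = FALSE),
         function(idx) c(include, free[idx]))
}

vars <- c("cyl", "disp", "hp", "wt", "qsec")
subs <- constrained_subsets(vars, size = 3, include = "wt", exclude = "qsec")
print(subs)
```

Every candidate contains the forced-in variable and none contains the excluded one, so only the remaining free variables contribute to the combinatorial growth of the search.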

The function preorders the variables to reduce execution time if pradius > 0. Good execution times are usually attained for approximately pradius = n/3 (the default value), where n is the number of regressors after evaluating include and exclude.

A set of standard extractor functions for fitted model objects is available for objects of class "lmSubsets". See methods for more details.

Value

An object of class "lmSubsets", i.e. a list with the following components:

nobs

Number of observations.

nvar

Number of variables.

weights

Weights vector.

offset

Offset component.

intercept

TRUE if model has intercept term; FALSE otherwise.

include

Included regressors.

exclude

Excluded regressors.

nmin, nmax

Minimum and maximum subset sizes.

tolerance

Tolerance vector.

nbest

Number of best subsets.

df

Degrees of freedom.

rss

Residual sum of squares.

which

Selected variables.

References

Hofmann M, Gatu C, Kontoghiorghes EJ (2007). Efficient Algorithms for Computing the Best Subset Regression Models for Large-Scale Problems. Computational Statistics & Data Analysis, 52, 16–29.

Gatu C, Kontoghiorghes EJ (2006). Branch-and-Bound Algorithms for Computing the Best Subset Regression Models. Journal of Computational and Graphical Statistics, 15, 139–156.

See Also

lmSelect, summary, methods.

Examples

## load data (with logs for relative potentials)
data("AirPollution", package = "lmSubsets")

#################
## basic usage ##
#################

## canonical example: fit all subsets
all.AirPoll <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 10)

## visualize RSS
plot(all.AirPoll)

## summarize
summary(all.AirPoll)

## forced inclusion/exclusion of variables
all_2.AirPoll <- lmSubsets(all.AirPoll, include = "noncauc",
                                        exclude = "whitecollar")
summary(all_2.AirPoll)

lmSubsets documentation built on May 31, 2017, 3:55 a.m.