Description

All-subsets regression for ordinary linear models.
Usage

mcsSubset(object, ...)

## S3 method for class 'formula'
mcsSubset(formula, ..., lm = FALSE)

## S3 method for class 'lm'
mcsSubset(object, ..., penalty = 0)

## Default S3 method:
mcsSubset(object, y, include = NULL, exclude = NULL,
          size = NULL, penalty = 0, tolerance = 0, pradius = NULL,
          nbest = 1, ..., .algo = "hbba")
Arguments

formula, object: The model specification: a "formula", a fitted "lm" object, or the regressor matrix, depending on the method used.

y: The response variable.

include, exclude: Index vectors designating variables that are forced in or out of the model, respectively. The vectors may consist of (integer) indexes, (character) names, or (logical) bits selecting the desired columns. The integer indexes correspond to the position of the variables in the model matrix; the intercept, if any, has index 1.

size: Vector of subset sizes (not counting the intercept, if any). By default, the best subsets are computed for every subset size.

penalty: Penalty per parameter (see AIC).

tolerance: A tolerance value, or a vector of tolerances (one per subset size), used to speed up the search; see Details.

pradius: Preordering radius; see Details.

nbest: Number of best subsets to report.

...: Ignored.

lm: If TRUE, ...

.algo: Internal use.
Details

The function mcsSubset computes all-subsets regression for ordinary linear models. The function is generic and provides various methods to conveniently specify the regressor and response variables. The standard formula interface (see lm) can be used, or the information can be extracted from an already fitted lm object. The regressor matrix and response variable can also be passed in directly.

By default (i.e. penalty == 0), the method computes the nbest best subset models for every subset size, where the "best" models are the models with the lowest residual sum of squares (RSS). The scope of the search can be limited to certain subset sizes by setting size. A tolerance vector (expanded if necessary) may be specified to speed up the algorithm, where tolerance[n] is the tolerance applied to subset models of size n.
Alternatively (penalty > 0), the overall (i.e. over all sizes) nbest best subset models may be computed according to an information criterion of the AIC family. A single tolerance value may be specified to speed up the search.
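As a sketch of this penalty mode (assuming the mcsSubset package and its AirPollution example data, used in the Examples below, are available), a numeric penalty of 2 per parameter corresponds to the classical AIC:

```r
## hypothetical sketch: requires the mcsSubset package and its example data
library(mcsSubset)
data("AirPollution", package = "mcsSubset")

## overall 5 best models under an AIC-type criterion (penalty 2 per parameter)
xs <- mcsSubset(mortality ~ ., data = AirPollution, penalty = 2, nbest = 5)
summary(xs)
```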
By way of include and exclude, variables may be forced into or out of the regression, respectively.
The function preorders the variables to reduce execution time if pradius > 0. Good execution times are usually attained for approximately pradius = n/3 (the default), where n is the number of regressors after evaluating include and exclude.
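For illustration (a hedged sketch; the radius value 5 is arbitrary, and the AirPollution data set is the one used in the Examples below), the preordering radius can also be set explicitly:

```r
## assumes the mcsSubset package and its AirPollution data are available
library(mcsSubset)
data("AirPollution", package = "mcsSubset")

## override the default preordering radius (n/3) with an explicit value
xs <- mcsSubset(mortality ~ ., data = AirPollution, pradius = 5)
```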
A set of standard extractor functions for fitted model objects is available for objects of class "mcsSubset". See methods for more details.
Value

An object of class "mcsSubset", i.e. a list with the following components:

weights: Weights.

offset: Offset.

nobs: Number of observations.

nvar: Number of variables (not including the intercept, if any).

x.names: Names of all design variables.

y.name: Name of the response variable.

include: Indexes of variables forced in.

exclude: Indexes of variables forced out.

intercept: Logical indicating whether the model contains an intercept.

penalty: AIC penalty.

nbest: Number of best subsets.
When penalty == 0:

size: Subset sizes.

tolerance: Tolerance vector.

rss: A two-dimensional numeric array.

which: A three-dimensional logical array.

The entry rss[i, n] corresponds to the RSS of the i-th best subset model of size n. The entry which[j, i, n] has value TRUE if the i-th best subset model of size n contains the j-th variable.
When penalty != 0:

tolerance: Tolerance value.

rss: A one-dimensional numeric array of length nbest.

aic: A one-dimensional numeric array of length nbest.

which: A two-dimensional logical array.

The entries rss[i] and aic[i] correspond to the RSS and AIC of the i-th best subset model, respectively. The entry which[j, i] is TRUE if the i-th best subset model contains variable j.
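The returned arrays can be indexed directly. The following sketch (assuming a fit in the default penalty == 0 mode, with the package and its AirPollution example data available) extracts the RSS and the selected variables of the single best model of size 3:

```r
## assumes the mcsSubset package and its AirPollution data are available
library(mcsSubset)
data("AirPollution", package = "mcsSubset")

xs <- mcsSubset(mortality ~ ., data = AirPollution)

## RSS of the best (i = 1) subset model of size n = 3
xs$rss[1, 3]

## names of the variables contained in that model
xs$x.names[xs$which[, 1, 3]]
```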
References

Hofmann, M., Gatu, C. and Kontoghiorghes, E. J. (2007). Efficient Algorithms for Computing the Best Subset Regression Models for Large-Scale Problems. Computational Statistics & Data Analysis, 52, 16–29.

Gatu, C. and Kontoghiorghes, E. J. (2006). Branch-and-Bound Algorithms for Computing the Best Subset Regression Models. Journal of Computational and Graphical Statistics, 15, 139–156.
See Also

summary, methods.
Examples

## load data (with logs for relative potentials)
data("AirPollution", package = "mcsSubset")

#################
## basic usage ##
#################

## canonical example: fit best subsets
xs <- mcsSubset(mortality ~ ., data = AirPollution)

## visualize RSS
plot(xs)

## summarize
summary(xs)

## plot summary
plot(summary(xs))

## forced inclusion/exclusion of variables
xs <- mcsSubset(mortality ~ ., data = AirPollution,
                include = "noncauc", exclude = "whitecollar")

## or equivalently
xs <- mcsSubset(mortality ~ ., data = AirPollution,
                include = 10, exclude = 11)
summary(xs)

##########################
## find best BIC models ##
##########################

## find 10 best subset models
xs <- mcsSubset(mortality ~ ., data = AirPollution,
                penalty = "BIC", nbest = 10)

## summarize
summary(xs)

## visualize BIC and RSS
plot(summary(xs))