lmSelect_fit: Best-subset regression

Description

Low-level interface to best-variable-subset selection in ordinary linear regression.

Usage

lmSelect_fit(x, y, weights = NULL, offset = NULL, include = NULL,
             exclude = NULL, penalty = "BIC", tolerance = 0,
             nbest = 1, ..., pradius = NULL)

Arguments

x

double[,]—the model matrix

y

double[]—the model response

weights

double[]—the model weights

offset

double[]—the model offset

include

logical[], integer[], character[]—the regressors to force in

exclude

logical[], integer[], character[]—the regressors to force out

penalty

double, character, "function"—the penalty per model parameter

tolerance

double—the approximation tolerance

nbest

integer—the number of best subsets

...

ignored

pradius

integer—the preordering radius

Details

The best variable-subset model is determined, where the "best" model is the one with the lowest information criterion value. The information criterion belongs to the AIC family.

The regression data is specified with the x, y, weights, and offset parameters. See lm.fit() for further details.

To force regressors into or out of the regression, a list of regressors can be passed as an argument to the include or exclude parameters, respectively.
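For instance, regressors can be pinned in or out by column position in the model matrix (the indices below are purely illustrative, assuming a model matrix with at least three columns):

```r
## force the first regressor into every subset,
## and exclude the third regressor from the search
f <- lmSelect_fit(x, y, include = 1, exclude = 3)
```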

The information criterion is specified with the penalty parameter. Accepted values are "AIC", "BIC", or a "numeric" value representing the penalty-per-model-parameter. A custom selection criterion may be specified by passing an R function as an argument. The expected signature is function (size, rss), where size is the number of predictors (including the intercept, if any), and rss is the residual sum of squares. The function must be non-decreasing in both parameters.
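As a sketch of a custom criterion, the following mimics the classical AIC; the helper name `my_ic` is illustrative, and `n` is assumed to be the number of observations:

```r
## custom AIC-like criterion: n * log(rss / n) + 2 * size,
## where 'size' counts predictors (incl. intercept) and 'rss'
## is the residual sum of squares; non-decreasing in both
n <- nrow(x)
my_ic <- function (size, rss)  n * log(rss / n) + 2 * size

f <- lmSelect_fit(x, y, penalty = my_ic)
```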

An approximation tolerance can be specified with the tolerance parameter to speed up the search, at the possible expense of exact optimality.

The number of returned submodels is determined by the nbest parameter.
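For example, to retrieve more than one candidate subset (parameter values below are illustrative):

```r
## the five best subsets according to the BIC
f <- lmSelect_fit(x, y, penalty = "BIC", nbest = 5)
```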

The preordering radius is given with the pradius parameter.

Value

A list with the following components:

NOBS

integer—number of observations in model (before weights processing)

nobs

integer—number of observations in model (after weights processing)

nvar

integer—number of regressors in model

weights

double[]—model weights

intercept

logical—TRUE if the model contains an intercept term, FALSE otherwise

include

logical[]—regressors forced into the regression

exclude

logical[]—regressors forced out of the regression

size

integer[]—subset sizes

ic

information criterion

tolerance

double—approximation tolerance

nbest

integer—number of best subsets

submodel

"data.frame"—submodel information

subset

"data.frame"—selected subsets
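The components listed above can be inspected directly on the returned list; a minimal sketch:

```r
f <- lmSelect_fit(x, y)

f$size      ## subset sizes
f$submodel  ## per-submodel information ("data.frame")
f$subset    ## selected subsets ("data.frame")
```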

References

Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact variable-subset selection in linear regression for R. Journal of Statistical Software, 93, 1–21. doi: 10.18637/jss.v093.i03.

See Also

lmSelect—the high-level interface.

Examples

data("AirPollution", package = "lmSubsets")

x <- as.matrix(AirPollution[, names(AirPollution) != "mortality"])
y <-           AirPollution[, names(AirPollution) == "mortality"]

f <- lmSelect_fit(x, y)
f