In order to perform best subset selection we need to fit a separate regression model to each possible combination of predictors.
The best model is then chosen according to some criteria, typically AIC. Some care must be taken as $R^{2}$ will always increase as we add predictors. In addition a high $R^{2}$, or a low $RSS$ are synonymous with a low training error rate where we are typically interested in a low test error rate. A low training error rate does not necessarily imply a good predictive model.
In addition to computational issues best subset selection can suffer from statistical problems. The larger the search space the higher the probability of finding models that look good based on the training data, even if they have no predictive power.
Forward step--wise selection is a computationally efficient alternative to best subset following a procedure along the lines of algorithm:
Unlike best subset selection which requires $2^{p}$ models to be fit forward step--wise selection requires only $1 + p(p+1)/2$ models. On a model with 20 predictors this means fitting only 211 models, compared to the 1,048,576 in best subset. Be aware that whilst in practice it often does well, it doesn't guarantee to find the best model possible model as not all possible models are evaluated.
Backward step--wise selection, like the forward step--wise algorithm, provides an efficient alternative to best subset. However rather than starting with the null model and adding a single predictor at a time, we start with the full least squares model, containing all the predictors and remove a single predictor at each iteration.
Like forward step--wise it requires that $1+p(p+1)/2$ models are fit however there is an additional requirement that the number of samples $n$ is larger than the number of predictors $p$ so that the full model can be fit.
There are a few methods for stepwise selection for linear regression models such as lmStepAIC
although this sort of feature selection is typically discouraged. Methods that intrinsically perform feature selection such as lasso, elastic net or tree based methods are often preferred.
For models that don't perform a subset selection, caret
has a feature extraction algorithm that is based on cross validation estimates of test RMSE.
library("caret") data(BostonHousing, package = "mlbench") ## an rfeControl object is the analog to trainControl in the ## recursive feature elimination algorithm rc = rfeControl(method = "cv") rfe(medv~., data = BostonHousing, sizes = 4:6, rfeControl = rc, trControl = trainControl(method = "cv"), method = "lm")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.