```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```r
library(linreg)
library(caret)
library(mlbench)
```
The updated version of the `linreg` package provides a new function called `ridgereg()` that performs ridge regression. We are delighted to present `ridgereg()` in this vignette, and we will demonstrate its functionality with an example: building a predictive model for the `BostonHousing` data from the `mlbench` package. For the model training process we will use the `caret` package. Both packages are loaded in our workspace above.

Now let's get started!
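As a quick orientation, here is a minimal sketch of a direct `ridgereg()` call, matching the `formula`/`data`/`lambda` interface used later in this vignette (the predictors and the `lambda` value below are arbitrary and chosen only for illustration):

```r
# Minimal ridgereg() call; lambda = 1 is an arbitrary illustrative value
data("BostonHousing", package = "mlbench")
fit <- ridgereg(formula = medv ~ crim + rm + lstat,
                data    = BostonHousing,
                lambda  = 1)
```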
First, the `BostonHousing` data is divided into a training set and a test set using the `caret` package, with $80\%$ of the observations used for training and the remaining $20\%$ for testing. The response variable is `medv` (the median value of owner-occupied homes, in USD 1000's).
data("BostonHousing") colnames(BostonHousing) <- make.names(colnames(BostonHousing)) set.seed(123) trainIndex <- createDataPartition(BostonHousing$medv, p = .8, list = FALSE, times = 1) train_data <- BostonHousing[ trainIndex,] colnames(train_data) <- make.names(colnames(train_data)) test_data <- BostonHousing[-trainIndex,] print(length(test_data[,1]))
After dividing the data into a training and a test set, we fit two linear regression models to the `BostonHousing` training data using the `caret` package: one with all continuous variables as predictors, and one that keeps only the most useful predictors via forward selection. The `leaps` package is required to perform forward selection.
```r
library(leaps)

# Linear regression with all predictors (plus an rm:lstat interaction)
lin <- train(
  medv ~ crim + zn + age + indus + chas + nox + rm + dis + rad +
    tax + ptratio + b + lstat + rm:lstat,
  data = train_data,
  method = "lm"
)

# Linear regression with forward selection
lin_forward <- train(
  medv ~ crim + zn + age + indus + chas + nox + rm + dis + rad +
    tax + ptratio + b + lstat + rm:lstat,
  data = train_data,
  method = "leapForward",
  tuneGrid = data.frame(nvmax = 1:(ncol(train_data) - 1))
)
```
We then evaluate the performance of the two linear regression models on the training data. Note that they obtain similar values for the root mean squared error (RMSE), $R^2$, and the mean absolute error (MAE).
```r
lin_pred <- predict(lin, train_data)
postResample(pred = lin_pred, obs = train_data$medv)
```
```r
lin_forward_pred <- predict(lin_forward, train_data)
postResample(pred = lin_forward_pred, obs = train_data$medv)
```
Next, we fit a ridge regression model with the `ridgereg()` function for different values of $\lambda$. Repeated 10-fold cross-validation on the training set is used to find the best value for the hyperparameter $\lambda$.
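For intuition, ridge regression adds a penalty term $\lambda$ to the normal equations of ordinary least squares. Assuming `ridgereg()` follows the standard formulation (the exact scaling and normalization conventions depend on the `linreg` implementation), the coefficient estimates are

$$\hat{\beta}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y,$$

so $\lambda = 0$ reproduces ordinary least squares, while larger values of $\lambda$ shrink the coefficients toward zero.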
```r
# Wrap ridgereg() as a custom model so it can be tuned with train()
modelInfo <- list(
  label      = "Ridge Regression",
  library    = "linreg",
  type       = "Regression",
  parameters = data.frame(parameter = "lambda",
                          class     = "numeric",
                          label     = "Hyperparameter"),
  grid = function(x, y, len = NULL, search = "grid") {
    # Candidate lambda values to evaluate during cross-validation
    expand.grid(lambda = seq(0, 3, length = 10))
  },
  fit = function(x, y, wts, param, lev, last, classProbs, ...) {
    # ridgereg() requires a data frame containing predictors and response
    dat <- if (is.data.frame(x)) x else as.data.frame(x)
    dat$medv <- y
    ridgereg(
      formula = medv ~ crim + zn + age + indus + chas + nox + rm +
        dis + rad + tax + ptratio + b + lstat,
      data   = dat,
      lambda = param$lambda
    )
  },
  predict = function(modelFit, newdata, submodels = NULL) {
    newdata <- as.data.frame(newdata)
    predict(modelFit, newdata)
  },
  loop   = NULL,
  prob   = NULL,
  levels = NULL
)

set.seed(123)
rr <- train(
  x = train_data[, names(train_data) != "medv"],
  y = train_data$medv,
  method = modelInfo,
  trControl = trainControl(method = "repeatedcv", number = 10, repeats = 10)
)
rr
```
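Finally, with the tuned model in hand, we can check its performance on the held-out test set. This is a minimal sketch, assuming that `predict()` on the returned `train` object dispatches to our custom `predict` function as usual in `caret`:

```r
# Evaluate the cross-validated ridge model on the 20% hold-out set
rr_pred <- predict(rr, newdata = test_data)
postResample(pred = rr_pred, obs = test_data$medv)
```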