knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Homework 2: Ridge Regression

Observing the effect on out-of-sample mean square error as $\lambda$ varies

The ridge regression coefficient estimates depend on the value of the tuning parameter $\lambda$. Since we cannot directly assess how well a model fits for a given value of $\lambda$, we instead examine how the mean squared error (MSE) of models trained and tested on separate data sets varies with $\lambda$.
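For reference, ridge regression minimizes a penalized least-squares objective; the standard closed-form estimate (which ridge_reg() presumably computes, up to implementation details such as scaling) and the out-of-sample MSE being tracked are:

```latex
\hat{\beta}_\lambda
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = (X^\top X + \lambda I)^{-1} X^\top y,
\qquad
\mathrm{MSE}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - x_i^\top \hat{\beta}_\lambda \bigr)^2
```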

Data sets

The data sets used will be ridge_train for training the models and ridge_test for testing them. Each table has 200 rows, with one response variable and four covariates.

library(bis557)
data(ridge_train)
data(ridge_test)

Fitting models

I chose to fit a model for each integer value of $\lambda$ from 15 to 150, inclusive. This was accomplished by running the ridge_reg() function in parallel via the %dopar% operator from the foreach package, with a doParallel backend.

lambda <- seq(15, 150, by = 1)
n <- nrow(ridge_test)
form <- y ~ . - 1
y <- ridge_test[,1]
X <- ridge_test[,-1] 


library(foreach)
library(doParallel)
cl <- parallel::makeForkCluster(2)
doParallel::registerDoParallel(cl)

mse <- foreach(l = lambda, .combine = "c") %dopar% {
  # fit on the training set, then evaluate on the held-out test set
  beta_hat <- ridge_reg(form, l, ridge_train)$coef
  yhat <- as.matrix(X) %*% beta_hat
  sum((y - yhat)^2) / n
}

parallel::stopCluster(cl)
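The loop above fits one ridge model per $\lambda$ and scores each on the held-out test set. A minimal language-neutral sketch of the same computation, written in Python with NumPy and synthetic data (the actual ridge_train/ridge_test tables and ridge_reg() are not reproduced here; ridge_coef is a hypothetical closed-form solver):

```python
import numpy as np

def ridge_coef(X, y, lam):
    """Closed-form ridge estimate: (X'X + lambda * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic stand-ins for ridge_train / ridge_test (200 rows, 4 covariates).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
X_train = rng.normal(size=(200, 4))
y_train = X_train @ beta_true + rng.normal(size=200)
X_test = rng.normal(size=(200, 4))
y_test = X_test @ beta_true + rng.normal(size=200)

# Out-of-sample MSE for each integer lambda from 15 to 150, inclusive.
lambdas = np.arange(15, 151)
mse = np.array([np.mean((y_test - X_test @ ridge_coef(X_train, y_train, l)) ** 2)
                for l in lambdas])
lam_hat = lambdas[np.argmin(mse)]
```

The structure mirrors the R chunk: one coefficient fit per grid value, then the average squared prediction error on the test rows.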

The mean squared error of the fitted model varies with $\lambda$ as follows:

plot(lambda, mse, main = "Out-of-Sample MSE vs. Lambda", log = "x", xlab = "Value of lambda", ylab = "MSE")

Observations

In the plot above, we see that there exists some value $\lambda^*$ that appears to minimize the out-of-sample MSE of the fitted model. We can approximate this value with $\hat{\lambda}$:

lambda_hat <- lambda[which.min(mse)]
lambda_hat

If we zoom in on a finer grid around $\hat{\lambda}$, the trend holds and we obtain a more precise estimate of $\lambda^*$:

lambda <- seq(28, 35, by = 0.05)

cl <- parallel::makeForkCluster(2)
doParallel::registerDoParallel(cl)

mse <- foreach(l = lambda, .combine = "c") %dopar% {
  # same procedure as before, over the finer lambda grid
  beta_hat <- ridge_reg(form, l, ridge_train)$coef
  yhat <- as.matrix(X) %*% beta_hat
  sum((y - yhat)^2) / n
}

parallel::stopCluster(cl)


plot(lambda, mse, main = "Out-of-Sample MSE vs. Lambda (Fine Grid)", log = "x", xlab = "Value of lambda", ylab = "MSE")
lambda_hat <- lambda[which.min(mse)]
lambda_hat
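The two-pass strategy above (a coarse integer grid, then a finer grid around the coarse minimizer) can be sketched generically. Here refine_grid is a hypothetical helper, and a quadratic curve stands in for the actual MSE values, which would come from refitting the models:

```python
import numpy as np

def refine_grid(grid, scores, step, width=5.0):
    """Build a finer grid centered on the coarse-grid minimizer,
    clipped to the original grid's range (hypothetical helper)."""
    center = grid[np.argmin(scores)]
    lo = max(grid.min(), center - width)
    hi = min(grid.max(), center + width)
    return np.arange(lo, hi + step, step)

coarse = np.arange(15, 151, 1.0)
scores = (coarse - 31.3) ** 2          # stand-in for the coarse MSE curve
fine = refine_grid(coarse, scores, step=0.05)
lam_hat = fine[np.argmin((fine - 31.3) ** 2)]
```

Under this stand-in curve, the coarse pass lands on the nearest integer and the fine pass recovers the minimizer to within the finer step size.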


casxue/bis557 documentation built on May 7, 2019, 5 a.m.