```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(bis557)
```

1. CASL 2.11 exercises, problem 5: Consider the simple regression model with only a scalar $x$ and an intercept, $y = \beta_0 + \beta_1 x$.
First we have the design matrix
$$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$$
and the response vector
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$

For the least squares estimator of the simple regression model we have $\hat\beta = (X^{T}X)^{-1}X^TY$.
Hence we calculate

$$X^TX = \begin{pmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_n \end{pmatrix} \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} = \begin{pmatrix} n & \sum_{i=1}^{n}x_i \\ \sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2 \end{pmatrix}$$
Using the relationship between the adjugate and the inverse of a matrix, we have
$$(X^TX)^{-1} = \frac{\operatorname{adj}(X^TX)}{|X^TX|} = \frac{\begin{pmatrix} \sum_{i=1}^{n}x_i^2 & -\sum_{i=1}^{n}x_i \\ -\sum_{i=1}^{n}x_i & n \end{pmatrix}}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}$$

Now we calculate the last part,
$$X^TY = \begin{pmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n}y_i \\ \sum_{i=1}^{n}x_iy_i \end{pmatrix}$$

Hence, substituting back, we have
$$\hat\beta = (X^{T}X)^{-1}X^TY = \frac{\begin{pmatrix} \sum_{i=1}^{n}x_i^2 & -\sum_{i=1}^{n}x_i \\ -\sum_{i=1}^{n}x_i & n \end{pmatrix}}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2} \begin{pmatrix} \sum_{i=1}^{n}y_i \\ \sum_{i=1}^{n}x_iy_i \end{pmatrix}$$
$$= \frac{1}{n\sum_{i=1}^{n}x_i^2 - n^2\bar{x}^2} \begin{pmatrix} \sum_{i=1}^{n}x_i^2 & -\sum_{i=1}^{n}x_i \\ -\sum_{i=1}^{n}x_i & n \end{pmatrix} \begin{pmatrix} \sum_{i=1}^{n}y_i \\ \sum_{i=1}^{n}x_iy_i \end{pmatrix} = \frac{1}{n\sum_{i=1}^{n}x_i^2 - n^2\bar{x}^2} \begin{pmatrix} n\bar{y}\sum x_{i}^2 - n\bar{x}\sum x_{i}y_{i} \\ -n^2\bar{x}\bar{y} + n\sum x_iy_i \end{pmatrix}$$
Since $\hat{\beta} = (\hat\beta_0, \hat\beta_1)^T$,

we then have $\hat\beta_1 = \frac{-n^2\bar{x}\bar{y} + n\sum x_iy_i}{n\sum x_i^2 - n^2\bar{x}^2} = \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}$ and $\hat\beta_0 = \bar{y}-\hat{\beta}_1\bar{x}$.
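As a quick numerical check (my own addition, not part of the CASL solution), the closed-form expressions can be compared against `lm()` on simulated data:

```r
# Verify the closed-form simple regression estimates against lm()
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(beta0_hat, beta1_hat)
coef(lm(y ~ x))  # should agree up to numerical precision
```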

2. Implement a new function fitting the OLS model using gradient descent that calculates the penalty based on out-of-sample accuracy. Create test code. How does it compare to the OLS model?

To compare the accuracy of the two models, we compare the residuals of both models fitted to the same dataset.

```r
library(bis557)
library(rsample)
library(palmerpenguins)
library(foreach)
data(penguins)

# Grab out-of-sample residuals from an OLS model fit with lm(),
# given a formula and data, using v-fold cross-validation.
grab_resids <- function(form, data, v = 10) {
  # create cross-validation folds for out-of-sample accuracy
  folds <- vfold_cv(data, v = v)
  y_name <- as.character(form)[2]
  resids <- foreach(fold = folds$splits, .combine = c) %do% {
    fit <- lm(form, analysis(fold))
    assessment(fold)[[y_name]] - as.vector(predict(fit, assessment(fold)))
  }
  resids
}

form <- bill_length_mm ~ .
data <- penguins[, -8]  # drop the year column

grab_resids(form = form, data = data, v = 10)
gradient_descent_loss(form = form, data = data, alpha = 0.1, num_iters = 1000, v = 10)
```

The residuals from the gradient descent model with the out-of-sample penalty are close to those of the OLS model fitted by `lm()`, so our model works well.
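For reference, below is a minimal sketch of plain gradient descent on the OLS loss. It is only an illustration of the basic technique and makes no claim about how `gradient_descent_loss()` in `bis557` is implemented.

```r
# Sketch: gradient descent on the mean squared error loss (illustrative only,
# not the bis557::gradient_descent_loss() implementation).
ols_gd <- function(X, y, alpha = 0.05, num_iters = 5000) {
  beta <- rep(0, ncol(X))
  n <- nrow(X)
  for (i in seq_len(num_iters)) {
    grad <- -2 / n * t(X) %*% (y - X %*% beta)  # gradient of the MSE
    beta <- beta - alpha * as.vector(grad)
  }
  beta
}

# Example on simulated data; the result should be close to the OLS solution.
set.seed(42)
X <- cbind(1, rnorm(50))
y <- X %*% c(1, 2) + rnorm(50)
ols_gd(X, y)
solve(crossprod(X), crossprod(X, y))  # closed-form OLS for comparison
```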

3. Implement a ridge regression function taking into account collinear (or nearly collinear) regression variables.

data("penguins")
penguins$bill_length_colinear= penguins$bill_length_mm *2
#implement ridge regression function 
form= body_mass_g~.
data = penguins
library(bis557)
#Implement ridge regression
lm_ridge(form=form, penguins, lambda = 0.01)
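As a point of comparison, here is a sketch of how ridge coefficients can be computed stably through the SVD of the design matrix, which stays well defined even with perfectly collinear columns (this is a general technique and an assumption on my part, not necessarily how `lm_ridge()` is implemented):

```r
# Sketch: ridge regression via the SVD (illustrative; not necessarily the
# bis557::lm_ridge() implementation). Stable under collinear columns.
ridge_svd <- function(form, data, lambda) {
  mf <- model.frame(form, data)   # drops rows with missing values
  X <- model.matrix(form, mf)
  y <- model.response(mf)
  s <- svd(X)
  # beta = V diag(d / (d^2 + lambda)) U^T y; zero singular values contribute nothing
  beta <- s$v %*% (diag(s$d / (s$d^2 + lambda)) %*% (t(s$u) %*% y))
  setNames(as.vector(beta), colnames(X))
}

ridge_svd(body_mass_g ~ bill_length_mm + bill_length_colinear, penguins, lambda = 0.01)
```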

4. Implement your own method and tests for optimizing the ridge parameter $\lambda$.

data("iris")

form= Sepal.Length~.
lambdas = seq(0,2,by=0.01)
lambda_optimizer(form=form,data=iris,v=10,lambdas=lambdas)
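Below is a minimal sketch of how such an optimizer can be written, assuming the penalty is chosen by v-fold cross-validated RMSE over the grid; the helper `cv_lambda()` is my own illustration, not the package's `lambda_optimizer()`:

```r
library(rsample)

# Sketch: pick lambda by v-fold cross-validated RMSE (assumed approach;
# not necessarily how bis557::lambda_optimizer() works).
cv_lambda <- function(form, data, lambdas, v = 10) {
  folds <- vfold_cv(data, v = v)
  cv_rmse <- sapply(lambdas, function(lambda) {
    errs <- sapply(folds$splits, function(split) {
      train <- analysis(split)
      test <- assessment(split)
      X <- model.matrix(form, train)
      y <- model.response(model.frame(form, train))
      beta <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
      X_test <- model.matrix(form, test)
      y_test <- model.response(model.frame(form, test))
      sqrt(mean((y_test - X_test %*% beta)^2))
    })
    mean(errs)
  })
  lambdas[which.min(cv_rmse)]
}

cv_lambda(Sepal.Length ~ ., iris, lambdas = seq(0, 2, by = 0.01), v = 10)
```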

5. Consider the LASSO penalty

$$\frac{1}{2n} ||Y - X \beta||_2^2 + \lambda ||\beta||_1.$$ Show that if $|X_j^TY| \leq n \lambda$, then $\widehat \beta_j^{\text{LASSO}}$ must be zero.

For the LASSO we have $$\frac{1}{2n} ||Y - X \beta||_2^2 + \lambda ||\beta||_1.$$ Assume that the design matrix $X$ has orthonormal columns, so that no collinearity arises and $X^TX = I$.
Hence $$\frac{1}{2n} ||Y - X \beta||_2^2 + \lambda ||\beta||_1 = \frac{1}{2n}\left(Y^{T}Y + \beta^{T}X^{T}X\beta - 2Y^TX\beta\right) + \lambda||\beta||_1 = \frac{1}{2n}Y^TY + \frac{1}{2n}\sum_j\left(\beta_{j}^2 - 2\beta_jX_j^TY + 2n\lambda|\beta_j|\right),$$ so the objective separates over the coordinates $\beta_j$. For $\beta_j > 0$, setting the partial derivative with respect to $\beta_j$ to zero yields $$\frac{\partial}{\partial\beta_j}\left(\frac{1}{2n} ||Y - X\beta||_2^2 + \lambda||\beta||_1\right) = \frac{1}{2n}\left(2\beta_j - 2X_j^TY + 2\lambda n\right) = 0,$$ so $\hat\beta_j = X_j^TY - \lambda n$. For this to be positive we would need $X_j^TY > \lambda n$, which contradicts the condition $|X_j^TY| \leq n\lambda$ (except on the boundary $X_j^TY = n\lambda$, where $\hat\beta_j = 0$). Hence the minimizer cannot be positive.

Likewise, for $\beta_j < 0$, $$\frac{\partial}{\partial\beta_j}\left(\frac{1}{2n} ||Y - X\beta||_2^2 + \lambda||\beta||_1\right) = \frac{1}{2n}\left(2\beta_j - 2X_j^TY - 2\lambda n\right) = 0,$$
so $\hat\beta_j = X_j^TY + \lambda n$. For this to be negative we would need $X_j^TY < -\lambda n$, which again contradicts $|X_j^TY| \leq n\lambda$. Since the minimizer can be neither positive nor negative, we conclude that if $|X_j^TY| \leq n \lambda$, then $\widehat \beta_j^{\text{LASSO}} = 0$.
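As a quick numerical illustration (my own check, not part of the original solution), the coordinate-wise objective can be evaluated on a grid to confirm that it is minimized at zero whenever $|X_j^TY| \leq n\lambda$:

```r
# Numerical check: with orthonormal X (X^T X = I), the coordinate-wise LASSO
# objective is minimized at beta_j = 0 whenever |X_j^T Y| <= n * lambda.
set.seed(1)
n <- 50
X <- qr.Q(qr(matrix(rnorm(n * 2), n, 2)))   # orthonormal columns
Y <- rnorm(n)
j <- 1
lambda <- abs(sum(X[, j] * Y)) / n + 0.01   # ensures |X_j^T Y| <= n * lambda

obj <- function(bj) {
  beta <- c(0, 0)
  beta[j] <- bj
  sum((Y - X %*% beta)^2) / (2 * n) + lambda * sum(abs(beta))
}
grid <- seq(-1, 1, by = 0.001)
grid[which.min(sapply(grid, obj))]  # minimizer; should be (numerically) zero
```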


