```{r setup}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(bis557)
```
CASL 2.11 Exercises, Problem 5:
Consider the simple regression model with only a scalar $x$ and an intercept: $y = \beta_0 + \beta_1 x$.
First we have the design matrix
$$\begin{pmatrix}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_n
\end{pmatrix}$$
And the response vector $Y$ is as follows:
$$\begin{pmatrix}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{pmatrix}$$
For the least squares estimator of the simple regression model,
we have
$\hat\beta = (X^{T}X)^{-1}X^TY$
Hence we calculate
$$X^TX = \begin{pmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_n \end{pmatrix} \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} = \begin{pmatrix} n & \sum_{i=1}^{n}x_i \\ \sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2 \end{pmatrix}$$
According to the relationship between the adjugate and the inverse of a matrix, we have
$$(X^TX)^{-1} = \frac{\operatorname{adj}(X^TX)}{|X^TX|} = \frac{\begin{pmatrix} \sum_{i=1}^{n}x_i^2 & -\sum_{i=1}^{n}x_i \\ -\sum_{i=1}^{n}x_i & n \end{pmatrix}}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}$$
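As a quick numerical sanity check (this small example is not from the text), the 2x2 adjugate/inverse relationship used above can be verified directly in R:

```{r}
# Illustrative 2x2 matrix standing in for X^T X (values chosen arbitrarily).
A <- matrix(c(4, 2, 2, 3), nrow = 2)
# Adjugate of a 2x2 matrix: swap the diagonal entries, negate the off-diagonal.
adj_A <- matrix(c(A[2, 2], -A[2, 1], -A[1, 2], A[1, 1]), nrow = 2)
# solve(A) should match adj(A) / det(A).
all.equal(solve(A), adj_A / det(A))
```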
Now we calculate the last part,
$$X^TY = \begin{pmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n}y_i \\ \sum_{i=1}^{n}x_iy_i \end{pmatrix}$$
Hence, substituting back, we have
$$\hat\beta = (X^{T}X)^{-1}X^TY = \frac{\begin{pmatrix} \sum_{i=1}^{n}x_i^2 & -\sum_{i=1}^{n}x_i \\ -\sum_{i=1}^{n}x_i & n \end{pmatrix}}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2} \begin{pmatrix} \sum_{i=1}^{n}y_i \\ \sum_{i=1}^{n}x_iy_i \end{pmatrix}$$
$$= \frac{1}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2} \begin{pmatrix} \sum_{i=1}^{n}x_i^2 & -\sum_{i=1}^{n}x_i \\ -\sum_{i=1}^{n}x_i & n \end{pmatrix} \begin{pmatrix} \sum_{i=1}^{n}y_i \\ \sum_{i=1}^{n}x_iy_i \end{pmatrix} = \frac{1}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2} \begin{pmatrix} n\bar{y}\sum x_{i}^2 - n\bar{x}\sum x_{i}y_{i} \\ -n^2\bar{x}\bar{y} + n\sum x_iy_i \end{pmatrix}$$
Since $\hat{\beta} = (\hat\beta_0, \hat\beta_1)^T$,
we then have $\hat\beta_0 = \bar{y} - \hat{\beta}_1\bar{x}$ and $\hat\beta_1 = \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum (x_i-\bar{x})^2}$.
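As a small check on simulated data (this example is not from the text), the closed-form expressions for $\hat\beta_0$ and $\hat\beta_1$ should agree with `lm()`:

```{r}
set.seed(42)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
# Closed-form simple regression estimates derived above.
beta1_hat <- (sum(x * y) - length(x) * mean(x) * mean(y)) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
c(beta0_hat, beta1_hat)
# These should match the coefficients returned by lm().
coef(lm(y ~ x))
```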
To compare the two models' accuracy, we compare the residuals of both models fit to the same dataset.
```{r}
library(bis557)
library(rsample)
library(palmerpenguins)
library(foreach)
data(penguins)

# Grab out-of-sample residuals from an OLS (lm) fit, given a formula and data,
# for comparison with the gradient descent model.
grab_resids <- function(form, data, num_iters, v) {
  # Create cross-validation folds for out-of-sample accuracy.
  folds <- vfold_cv(data, v = v)
  y_name <- as.character(form)[2]
  resids <- foreach(fold = folds$splits, .combine = c) %do% {
    fit <- lm(form, analysis(fold))
    as.vector(as.matrix(assessment(fold)[, y_name], ncol = 1)) -
      as.vector(predict(fit, assessment(fold)))
  }
  resids
}

form <- bill_length_mm ~ .
data <- penguins[, -8]
grab_resids(form = form, data = data, num_iters = 1000, v = 10)
# Compare with the gradient descent model from bis557.
gradient_descent_loss(form = form, data = data, alpha = 0.1, num_iters = 1000, v = 10)
```

The residuals from the model adjusted for out-of-sample accuracy are close to those of the OLS gradient descent model, so our model works well.
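One convenient single-number summary (a sketch; it assumes `grab_resids()` returns the vector of held-out residuals, as defined above) is the cross-validated root mean squared error of the OLS baseline:

```{r}
ols_resids <- grab_resids(form = form, data = data, num_iters = 1000, v = 10)
# Root mean squared out-of-sample error for the OLS baseline.
sqrt(mean(ols_resids^2, na.rm = TRUE))
```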
data("penguins") penguins$bill_length_colinear= penguins$bill_length_mm *2 #implement ridge regression function form= body_mass_g~. data = penguins library(bis557) #Implement ridge regression lm_ridge(form=form, penguins, lambda = 0.01)
data("iris") form= Sepal.Length~. lambdas = seq(0,2,by=0.01) lambda_optimizer(form=form,data=iris,v=10,lambdas=lambdas)
Now consider the LASSO objective
$$
\frac{1}{2n} \|Y - X \beta\|_2^2 + \lambda \|\beta\|_1.
$$
Show that if $|X_j^TY| \leq n \lambda$, then $\widehat \beta_j^{\text{LASSO}}$ must be zero.
For the LASSO we have
$$
\frac{1}{2n} ||Y - X \beta||_2^2 + \lambda ||\beta||_1.
$$
Assume that the design matrix $X$ is orthogonal, so that no collinearity arises; that is,
$X^TX = I$ (so each column satisfies $\|X_j\|_2^2 = 1$).
Hence $$\frac{1}{2n} \|Y - X \beta\|_2^2 + \lambda \|\beta\|_1
= \frac{1}{2n}\left(Y^{T} Y + \beta^{T} X^{T} X \beta - 2\beta^T X^TY\right) + \lambda\|\beta\|_1
= \frac{1}{2n}Y^TY + \frac{1}{2n}\sum_j\left(\beta_{j}^2 - 2\beta_jX_j^TY + 2n\lambda|\beta_j|\right)$$
Setting the partial derivative with respect to $\beta_j$ to 0 yields the following.
When $\beta_j > 0$,
$$\frac{\partial}{\partial\beta_j}\left(\frac{1}{2n} \|Y - X \beta\|_2^2 + \lambda \|\beta\|_1\right) =
\frac{1}{2n}\left(2\beta_j - 2X_j^TY + 2\lambda n\right) = 0,
$$
so $\hat\beta_j^{\text{LASSO}} = X_j^TY - \lambda n > 0$, which requires $X_j^TY > \lambda n$.
Since we have the constraint $|X_j^TY| \leq n \lambda$,
the only way to satisfy both conditions is the boundary case $X_j^TY = n\lambda$.
Hence $\hat\beta_j^{\text{LASSO}} = 0$ when $\beta_j > 0$ is assumed.
Likewise,
When $\beta_j < 0$,
$$\frac{\partial}{\partial\beta_j}\left(\frac{1}{2n} \|Y - X \beta\|_2^2 + \lambda \|\beta\|_1\right) =
\frac{1}{2n}\left(2\beta_j - 2X_j^TY - 2\lambda n\right) = 0,
$$
so $\hat\beta_j^{\text{LASSO}} = X_j^TY + \lambda n < 0$, which requires $X_j^TY < -\lambda n$.
With the constraint $|X_j^TY| \leq n \lambda$, the boundary case again forces $\hat\beta_j^{\text{LASSO}} = 0$.
Hence, if $|X_j^TY| \leq n \lambda$, then $\widehat \beta_j^{\text{LASSO}}$ must be zero.
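A short numerical illustration of this result (a sketch, not from the text): with an orthogonal design satisfying $X^TX = I$, the LASSO solution is the soft-threshold $\operatorname{sign}(X_j^TY)\max(|X_j^TY| - n\lambda, 0)$, which is exactly zero whenever $|X_j^TY| \leq n\lambda$:

```{r}
set.seed(1)
n <- 100
p <- 4
# Columns of Q from a QR decomposition are orthonormal, so t(X) %*% X = I.
X <- qr.Q(qr(matrix(rnorm(n * p), n, p)))
beta <- c(3, 0.001, -2, 0)
Y <- X %*% beta + rnorm(n, sd = 0.1)
lambda <- 0.01                      # so n * lambda = 1
z <- drop(crossprod(X, Y))          # X_j^T Y for each coordinate
# Soft-threshold: coordinates with |X_j^T Y| <= n * lambda are set to zero.
beta_lasso <- sign(z) * pmax(abs(z) - n * lambda, 0)
round(rbind(XtY = z, beta_lasso = beta_lasso), 3)
```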