```{r setup}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(bis557)
```
CASL 2.11 Exercises, Problem 5:
Consider the simple regression model with only a scalar $x$ and an intercept: $y = \beta_0 + \beta_1 x$.
First we have the design matrix
$$\begin{pmatrix}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_n
\end{pmatrix}$$
And the response vector $Y$ is as follows:
$$\begin{pmatrix}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{pmatrix}$$
For the least squares estimator of the simple regression model,
we have
$\hat\beta = (X^{T}X)^{-1}X^TY$
Hence we calculate
$$X^TX = \begin{pmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_n \end{pmatrix} \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} = \begin{pmatrix} n & \sum_{i=1}^{n}x_i \\ \sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2 \end{pmatrix}$$
According to the relationship between the adjugate and the inverse of a matrix, we have
$$(X^TX)^{-1} = \frac{\operatorname{adj}(X^TX)}{|X^TX|} = \frac{\begin{pmatrix} \sum_{i=1}^{n}x_i^2 & -\sum_{i=1}^{n}x_i \\ -\sum_{i=1}^{n}x_i & n \end{pmatrix}}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}$$
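As a quick numerical sanity check (this small example is not from the text), the 2x2 adjugate/inverse relationship used above can be verified directly in R:

```{r}
# Illustrative 2x2 matrix standing in for X^T X (values chosen arbitrarily).
A <- matrix(c(4, 2, 2, 3), nrow = 2)
# Adjugate of a 2x2 matrix: swap the diagonal entries, negate the off-diagonal.
adj_A <- matrix(c(A[2, 2], -A[2, 1], -A[1, 2], A[1, 1]), nrow = 2)
# solve(A) should match adj(A) / det(A).
all.equal(solve(A), adj_A / det(A))
```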
Now we calculate the last part,
$$X^TY = \begin{pmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n}y_i \\ \sum_{i=1}^{n}x_iy_i \end{pmatrix}$$
Hence, substituting back, we have
$$\hat\beta = (X^{T}X)^{-1}X^TY = \frac{\begin{pmatrix} \sum_{i=1}^{n}x_i^2 & -\sum_{i=1}^{n}x_i \\ -\sum_{i=1}^{n}x_i & n \end{pmatrix}}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2} \begin{pmatrix} \sum_{i=1}^{n}y_i \\ \sum_{i=1}^{n}x_iy_i \end{pmatrix}$$
$$= \frac{1}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2} \begin{pmatrix} \sum_{i=1}^{n}x_i^2 & -\sum_{i=1}^{n}x_i \\ -\sum_{i=1}^{n}x_i & n \end{pmatrix} \begin{pmatrix} \sum_{i=1}^{n}y_i \\ \sum_{i=1}^{n}x_iy_i \end{pmatrix} = \frac{1}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2} \begin{pmatrix} n\bar{y}\sum x_{i}^2 - n\bar{x}\sum x_{i}y_{i} \\ -n^2\bar{x}\bar{y} + n\sum x_iy_i \end{pmatrix}$$
Since $\hat{\beta} = (\hat\beta_0, \hat\beta_1)^T$,
we then have $\hat\beta_0 = \bar{y} - \hat{\beta}_1\bar{x}$ and $\hat\beta_1 = \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum (x_i-\bar{x})^2}$.
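As a small check on simulated data (this example is not from the text), the closed-form expressions for $\hat\beta_0$ and $\hat\beta_1$ should agree with `lm()`:

```{r}
set.seed(42)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
# Closed-form simple regression estimates derived above.
beta1_hat <- (sum(x * y) - length(x) * mean(x) * mean(y)) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
c(beta0_hat, beta1_hat)
# These should match the coefficients returned by lm().
coef(lm(y ~ x))
```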
To compare the two models' accuracy, we compare the residuals of both models fit to the same dataset.
```{r}
library(bis557)
library(rsample)
library(palmerpenguins)
library(foreach)
data(penguins)

# Grab out-of-sample residuals from an OLS (lm) fit, given a formula and data,
# for comparison with the gradient descent model.
grab_resids <- function(form, data, num_iters, v) {
  # Create cross-validation folds for out-of-sample accuracy.
  folds <- vfold_cv(data, v = v)
  y_name <- as.character(form)[2]
  resids <- foreach(fold = folds$splits, .combine = c) %do% {
    fit <- lm(form, analysis(fold))
    as.vector(as.matrix(assessment(fold)[, y_name], ncol = 1)) -
      as.vector(predict(fit, assessment(fold)))
  }
  resids
}

form <- bill_length_mm ~ .
data <- penguins[, -8]
grab_resids(form = form, data = data, num_iters = 1000, v = 10)
# Compare with the gradient descent model from bis557.
gradient_descent_loss(form = form, data = data, alpha = 0.1, num_iters = 1000, v = 10)
```

The residuals from the model adjusted for out-of-sample accuracy are close to those of the OLS gradient descent model, so our model works well.
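One convenient single-number summary (a sketch; it assumes `grab_resids()` returns the vector of held-out residuals, as defined above) is the cross-validated root mean squared error of the OLS baseline:

```{r}
ols_resids <- grab_resids(form = form, data = data, num_iters = 1000, v = 10)
# Root mean squared out-of-sample error for the OLS baseline.
sqrt(mean(ols_resids^2, na.rm = TRUE))
```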
data("penguins") penguins$bill_length_colinear= penguins$bill_length_mm *2 #implement ridge regression function form= body_mass_g~. data = penguins library(bis557) #Implement ridge regression lm_ridge(form=form, penguins, lambda = 0.01)
data("iris") form= Sepal.Length~. lambdas = seq(0,2,by=0.01) lambda_optimizer(form=form,data=iris,v=10,lambdas=lambdas)
Now consider the LASSO objective
$$
\frac{1}{2n} \|Y - X \beta\|_2^2 + \lambda \|\beta\|_1.
$$
Show that if $|X_j^TY| \leq n \lambda$, then $\widehat \beta_j^{\text{LASSO}}$ must be zero.
For the LASSO we have
$$
\frac{1}{2n} ||Y - X \beta||_2^2 + \lambda ||\beta||_1.
$$
Assume that the design matrix $X$ is orthogonal, so that no collinearity arises; that is,
$X^TX = I$ (so each column satisfies $\|X_j\|_2^2 = 1$).
Hence $$\frac{1}{2n} \|Y - X \beta\|_2^2 + \lambda \|\beta\|_1
= \frac{1}{2n}\left(Y^{T} Y + \beta^{T} X^{T} X \beta - 2\beta^T X^TY\right) + \lambda\|\beta\|_1
= \frac{1}{2n}Y^TY + \frac{1}{2n}\sum_j\left(\beta_{j}^2 - 2\beta_jX_j^TY + 2n\lambda|\beta_j|\right)$$
Setting the partial derivative with respect to $\beta_j$ to 0 yields the following.
When $\beta_j > 0$,
$$\frac{\partial}{\partial\beta_j}\left(\frac{1}{2n} \|Y - X \beta\|_2^2 + \lambda \|\beta\|_1\right) =
\frac{1}{2n}\left(2\beta_j - 2X_j^TY + 2\lambda n\right) = 0,
$$
so $\hat\beta_j^{\text{LASSO}} = X_j^TY - \lambda n > 0$, which requires $X_j^TY > \lambda n$.
Since we have the constraint $|X_j^TY| \leq n \lambda$,
the only way to satisfy both conditions is the boundary case $X_j^TY = n\lambda$.
Hence $\hat\beta_j^{\text{LASSO}} = 0$ when $\beta_j > 0$ is assumed.
Likewise,
When $\beta_j < 0$,
$$\frac{\partial}{\partial\beta_j}\left(\frac{1}{2n} \|Y - X \beta\|_2^2 + \lambda \|\beta\|_1\right) =
\frac{1}{2n}\left(2\beta_j - 2X_j^TY - 2\lambda n\right) = 0,
$$
so $\hat\beta_j^{\text{LASSO}} = X_j^TY + \lambda n < 0$, which requires $X_j^TY < -\lambda n$.
With the constraint $|X_j^TY| \leq n \lambda$, the boundary case again forces $\hat\beta_j^{\text{LASSO}} = 0$.
Hence, if $|X_j^TY| \leq n \lambda$, then $\widehat \beta_j^{\text{LASSO}}$ must be zero.
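A short numerical illustration of this result (a sketch, not from the text): with an orthogonal design satisfying $X^TX = I$, the LASSO solution is the soft-threshold $\operatorname{sign}(X_j^TY)\max(|X_j^TY| - n\lambda, 0)$, which is exactly zero whenever $|X_j^TY| \leq n\lambda$:

```{r}
set.seed(1)
n <- 100
p <- 4
# Columns of Q from a QR decomposition are orthonormal, so t(X) %*% X = I.
X <- qr.Q(qr(matrix(rnorm(n * p), n, p)))
beta <- c(3, 0.001, -2, 0)
Y <- X %*% beta + rnorm(n, sd = 0.1)
lambda <- 0.01                      # so n * lambda = 1
z <- drop(crossprod(X, Y))          # X_j^T Y for each coordinate
# Soft-threshold: coordinates with |X_j^T Y| <= n * lambda are set to zero.
beta_lasso <- sign(z) * pmax(abs(z) - n * lambda, 0)
round(rbind(XtY = z, beta_lasso = beta_lasso), 3)
```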