```{r}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

```{r}
library(bis557)
```
This is the second homework for the Yale course BIS557 (due October 7th).
For OLS, we know that $\hat \beta = (X^TX)^{-1}X^TY$.
Rewrite $y=\beta_0 + \beta_1x$ as $Y=X\beta$, where
$$
Y =
\left[ \begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{matrix} \right]
\hspace{1cm}
X =
\left[ \begin{matrix} 1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{n} \end{matrix} \right]
\hspace{1cm}
\beta =
\left[ \begin{matrix} \beta_{0} \\ \beta_{1} \end{matrix} \right]
$$

Then we can calculate $\hat \beta$:
$$
X^TX =
\left[ \begin{matrix} 1 & \cdots & 1 \\ x_1 & \cdots & x_n \end{matrix} \right]
\left[ \begin{matrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{matrix} \right]
=
\left[ \begin{matrix} n & \sum{x_i} \\ \sum{x_i} & \sum{x_i^2} \end{matrix} \right]
$$

$$
(X^TX)^{-1} =
\frac{1}{n\sum{x_i^2}-(\sum{x_i})^2}
\left[ \begin{matrix} \sum{x_i^2} & -\sum{x_i} \\ -\sum{x_i} & n \end{matrix} \right]
$$

Let $\alpha$ represent $\frac{1}{n\sum{x_i^2}-(\sum{x_i})^2}$. Then
$$
\begin{aligned}
(X^TX)^{-1}X^T &= \alpha
\left[ \begin{matrix} \sum{x_i^2} & -\sum{x_i} \\ -\sum{x_i} & n \end{matrix} \right]
\left[ \begin{matrix} 1 & \cdots & 1 \\ x_1 & \cdots & x_n \end{matrix} \right] \\
&= \alpha
\left[ \begin{matrix} \sum{x_i^2}-(\sum{x_i})x_1 & \cdots & \sum{x_i^2}-(\sum{x_i})x_n \\ -\sum{x_i}+nx_1 & \cdots & -\sum{x_i}+nx_n \end{matrix} \right]
\end{aligned}
$$
$$
\begin{aligned}
\hat \beta &= (X^TX)^{-1}X^TY = \alpha
\left[ \begin{matrix} \sum{x_i^2}-(\sum{x_i})x_1 & \cdots & \sum{x_i^2}-(\sum{x_i})x_n \\ -\sum{x_i}+nx_1 & \cdots & -\sum{x_i}+nx_n \end{matrix} \right]
\left[ \begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{matrix} \right] \\
&= \frac{1}{n\sum{x_i^2}-(\sum{x_i})^2}
\left[ \begin{matrix} \sum{x_i^2}\sum{y_i}-\sum{x_i y_i}\sum{x_i} \\ -\sum{x_i} \sum{y_i} + n \sum{x_i y_i} \end{matrix} \right]
\end{aligned}
$$
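As a quick numerical check, the closed-form coefficients above can be evaluated directly in base R and compared against `lm()` on a simple regression (the variable choices below are only illustrative):

```{r}
data("iris")
x <- iris$Petal.Width
y <- iris$Sepal.Length
n <- length(x)
# alpha = 1 / (n * sum(x^2) - sum(x)^2), as in the derivation above
alpha <- 1 / (n * sum(x^2) - sum(x)^2)
b0 <- alpha * (sum(x^2) * sum(y) - sum(x * y) * sum(x))
b1 <- alpha * (n * sum(x * y) - sum(x) * sum(y))
c(b0, b1)
coef(lm(Sepal.Length ~ Petal.Width, iris))  # should agree
```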
data("iris") gd <- gradient_descent_new(Sepal.Length ~ ., iris)$coefficients lm <- lm(Sepal.Length ~ ., iris)$coefficients compare <- as.data.frame(cbind(gd, lm)) colnames(compare) <- c("gradient descent", "OLS") compare
data("iris") # create a colinear term iris$colinear <- 2*iris$Petal.Width # run the ridge regression function - it works without error ridge_regression(Sepal.Length ~ ., iris, lambda = 0.1)
data("iris") # run the ridge regression function with cross validation cv_ridge(form = Sepal.Length ~ ., d=iris, lambda = seq(0, 0.05 ,0.001))
The best lambda chosen by cross-validation is 0.01.
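For intuition, a k-fold cross-validation over the same $\lambda$ grid can be sketched in a few lines of base R. This is only an illustration built on the closed form above; `cv_ridge()`'s internals, loss, and fold assignments may differ, so the selected value can vary:

```{r}
# illustrative k-fold CV loop for ridge; NOT the cv_ridge() internals
ridge_beta <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}
cv_error <- function(lambda, X, y, folds, k) {
  # mean held-out MSE across the k folds
  mean(sapply(seq_len(k), function(i) {
    b <- ridge_beta(X[folds != i, , drop = FALSE], y[folds != i], lambda)
    mean((y[folds == i] - X[folds == i, , drop = FALSE] %*% b)^2)
  }))
}
X <- model.matrix(Sepal.Length ~ ., iris)
y <- iris$Sepal.Length
k <- 10
set.seed(557)
folds <- sample(rep(seq_len(k), length.out = nrow(X)))
lambdas <- seq(0, 0.05, 0.001)
errs <- sapply(lambdas, cv_error, X = X, y = y, folds = folds, k = k)
lambdas[which.min(errs)]
```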
Finally, consider the lasso objective

$$
f(\beta)=\frac{1}{2 n} \sum_{i=1}^{n}\left(y_{i}-\sum_{j=1}^{p} x_{i j} \beta_{j}\right)^{2}+\lambda \sum_{j=1}^{p}\left|\beta_{j}\right|
$$

Write $\tilde{y}_i^{(l)} = \sum_{j \neq l} x_{ij}\tilde{\beta}_j$ for the fitted value that excludes predictor $l$. For $\tilde{\beta}_l \neq 0$, the partial derivative with respect to $\beta_l$ is

$$
\frac{\partial f(\beta)}{\partial \beta_l} = -\frac{1}{n} \sum_{i=1}^{n} x_{il}\left(y_i - \tilde{y}_i^{(l)}\right) + \frac{1}{n}\sum_{i=1}^{n} x_{il}^2\, \tilde{\beta}_l + \lambda\, \operatorname{sign}(\tilde{\beta}_l)
$$

Setting this to zero with standardized predictors ($\frac{1}{n}\sum_{i=1}^{n} x_{il}^2 = 1$) and assuming $\tilde{\beta}_l > 0$ gives

$$
\tilde{\beta}_l = \frac{1}{n} \sum_{i=1}^{n} x_{il}\left(y_i -\tilde{y}_i^{(l)}\right) - \lambda
$$

Rewrite this in matrix notation, with $X_l$ the $l$-th column of $X$ and the other coefficients held at zero (so $\tilde{y}^{(l)} = 0$):

$$
\tilde{\beta}_l = \frac{1}{n} X_l^T Y - \lambda
$$

If $|X_l^T Y| \le n\lambda$, then $\frac{1}{n} X_l^T Y - \lambda \le 0$, which contradicts the assumption $\tilde{\beta}_l > 0$; the case $\tilde{\beta}_l < 0$ fails symmetrically. In the lasso, a coefficient shrinks to exactly zero when the absolute value of the corresponding least-squares quantity falls below this threshold, so when $|X_l^T Y| \le n\lambda$, $\tilde{\beta}_l$ must be set to $0$.
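This is the soft-thresholding rule $S(z, \lambda) = \operatorname{sign}(z)\max(|z| - \lambda, 0)$, which is easy to check numerically. The helper below is illustrative (not a `bis557` function), with the predictor standardized so that $\sum_i x_{il}^2 = n$:

```{r}
# soft threshold S(z, lambda) = sign(z) * max(|z| - lambda, 0);
# illustrative helper, not part of the bis557 package
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

set.seed(557)
n <- 100
x <- as.vector(scale(rnorm(n))) * sqrt(n / (n - 1))  # sum(x^2) == n
y <- rnorm(n)
z <- sum(x * y) / n            # (1/n) * X_l' Y
soft_threshold(z, abs(z))      # lambda = |X_l' Y| / n  -> coefficient is exactly 0
soft_threshold(z, abs(z) / 2)  # smaller lambda -> nonzero coefficient
```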