knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Homework 3

Problem 1: CASL page 117, question 7.

Kernels can also be used as density estimators. Specifically, we have $$f_h (x) = \frac{1}{n} \sum_i K_h (x - x_i).$$ In this setting we see again why it is important to have the integral of the kernel equal to 1. Write a function kern_density that accepts a training vector x, bandwidth h, and test set x_new, returning the kernel density estimate from the Epanechnikov kernel. Visually test how this performs for some hand constructed datasets and bandwidths.

The functions are as follows:

## Average Epanechnikov kernel weight of a single test point x against the
## training vector x_train, using bandwidth h.
epanech_kern <- function (x, x_train, h) {
    X <- (x - x_train) / h
    return(mean(0.75 * (1 - X^2) * (abs(X) < 1) / h))
}

## Kernel density estimate at each point of x_new, given training data x
## and bandwidth h.
kern_density <- function (x, h, x_new) {
    return(sapply(x_new, epanech_kern, x, h))
}
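
As a quick sanity check (not part of the exercise), the estimate should integrate to approximately 1, since the Epanechnikov kernel itself integrates to 1. Below is a rough Riemann-sum check on a small hand-constructed dataset; the simulated data and grid are illustrative choices of my own:

## Rough check that the density estimate integrates to about 1
## (Riemann sum over a fine grid; data and grid are illustrative choices).
set.seed(557)
x_test <- rnorm(100, mean = 5, sd = 2)
grid <- seq(-5, 15, by = 0.01)
sum(kern_density(x_test, 1, grid)) * 0.01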

To test these functions, I will use the waiting time variable from the faithful data set:

data(faithful)
hist(faithful$waiting, 20)

I will test three different bandwidths --- the rule-of-thumb bandwidth ($h^* = 1.06 \hat{\sigma} n^{- 1/ 5} \approx 4.7$), one that should oversmooth ($h = 7$), and another that should undersmooth ($h = 2$):

X <- faithful$waiting
x <- seq(40, 100, by = 1)
hstar <- 1.06 * sd(X) * length(X)^(-0.2)
hstar

fhat.over <- kern_density(X, 7, x)
fhat.hstar <- kern_density(X, hstar, x)
fhat.under <- kern_density(X, 2, x)

hist(X, 20, freq = FALSE)
lines(x, fhat.over, col = "red")
lines(x, fhat.hstar, col = "green")
lines(x, fhat.under, col = "blue")

The green curve corresponds to the rule-of-thumb bandwidth $h^*$, while the red curve corresponds to $h = 7$ and the blue to $h = 2$. Visually, the behavior of the curves matches the predicted over-/undersmoothing. The blue line is very jagged and fluctuates with minor changes in the density. The green line is more resistant to minor changes than the blue line, but still fluctuates more than the red line does (particularly at the peaks around $X = 80$ and $X = 52$).
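
As an additional check (not part of the original exercise), the estimate at $h^*$ can be compared with R's built-in density(), which also supports the Epanechnikov kernel. Note that density() treats bw as the standard deviation of the kernel, so a kernel with support $[-h, h]$ corresponds to bw $= h / \sqrt{5}$:

## Compare with base R's density(); its bw argument is the kernel's
## standard deviation, which for the Epanechnikov kernel is h / sqrt(5).
d <- density(X, bw = hstar / sqrt(5), kernel = "epanechnikov",
             from = 40, to = 100)
plot(d, main = "density() vs. kern_density()")
lines(x, fhat.hstar, col = "green", lty = 2)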

Problem 2: CASL page 200, question 3

Show that if $f$ and $g$ are both convex functions, then their sum must also be convex.

For twice-differentiable functions, $f (x)$ and $g (x)$ are convex if and only if $f^{\prime \prime} (x) \ge 0$ and $g^{\prime \prime} (x) \ge 0$. Their sum $h (x) = f (x) + g (x)$ has second derivative $$h^{\prime \prime} (x) = \frac{d^2}{dx^2} (f (x) + g (x)) = f^{\prime \prime} (x) + g^{\prime \prime} (x) \ge 0$$ since differentiation is linear. Thus, if $f$ and $g$ are convex functions then their sum is also a convex function.
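
The same conclusion also follows directly from the definition of convexity, without assuming that $f$ and $g$ are twice differentiable: for any $x, y$ and $t \in [0, 1]$, $$h (t x + (1 - t) y) = f (t x + (1 - t) y) + g (t x + (1 - t) y) \le t f (x) + (1 - t) f (y) + t g (x) + (1 - t) g (y) = t h (x) + (1 - t) h (y),$$ so $h$ satisfies the defining inequality of convexity.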

Problem 3: CASL page 200, question 4

Illustrate that the absolute value function is convex. Using the result from the previous exercise, show that the $\ell_1$-norm is also convex.

The first derivative of $f (x) = \lvert x \rvert$ is $f^{\prime} (x) = \mathrm{sgn} (x)$ (i.e. the sign of $x$) and, for $x \neq 0$, its second derivative is $f^{\prime \prime} (x) = 0 \ge 0$, making it convex. Thus, the $\ell_1$ norm $\lVert y - \hat{f} \rVert_1 = \sum_i \lvert y_i - \hat{f} (x_i) \rvert$ is a sum of convex (absolute value) terms, and by the previous exercise $$\frac{d^2}{dx^2} \lVert y - \hat{f} \rVert_1 = \sum_i \frac{d^2}{dx^2} \lvert y_i - \hat{f} (x_i) \rvert = \sum_i 0 = 0,$$ so the $\ell_1$ norm is convex.
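
To illustrate the convexity of the absolute value numerically (a supplementary check, not part of the written answer), we can verify the defining inequality $\lvert t x + (1 - t) y \rvert \le t \lvert x \rvert + (1 - t) \lvert y \rvert$ at randomly drawn points:

## Check the convexity inequality for |x| at random points; the result
## should be TRUE for every draw.
set.seed(1)
x0 <- runif(1000, -10, 10)
y0 <- runif(1000, -10, 10)
t0 <- runif(1000)
all(abs(t0 * x0 + (1 - t0) * y0) <= t0 * abs(x0) + (1 - t0) * abs(y0))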

Problem 4: CASL page 200, question 5

Prove that the elastic net objective function is convex using the results from the previous two exercises.

The elastic net objective function is $$f (b; \lambda, \alpha) = \frac{1}{2n} \lVert y - X b \rVert_2^2 + \frac{\lambda}{2} (1 - \alpha) \lVert b \rVert_2^2 + \lambda \alpha \lVert b \rVert_1.$$ The squared $\ell_2$ norm is convex because it is a sum of squared terms, each with nonnegative second derivative; the $\ell_1$ norm is convex as shown in the previous problem; and multiplying by the nonnegative constants $\frac{1}{2n}$, $\frac{\lambda}{2} (1 - \alpha)$, and $\lambda \alpha$ (since $\lambda > 0$ and $0 \le \alpha \le 1$) preserves convexity. Thus, all three terms of $f$ are convex functions. As shown in Problem 2, the sum of convex functions is convex, so the elastic net objective function is convex.
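
The same can be checked numerically (my own addition; enet_obj below is a helper written just for this check, not a CASL function): the objective at a convex combination of two coefficient vectors should not exceed the corresponding combination of the objective values.

## Numerical convexity check for the elastic net objective; enet_obj is an
## illustrative helper, not part of the assignment code.
enet_obj <- function (b, X, y, lambda, alpha) {
    sum((y - X %*% b)^2) / (2 * nrow(X)) +
        lambda * (1 - alpha) * sum(b^2) / 2 + lambda * alpha * sum(abs(b))
}
set.seed(2)
Xm <- matrix(rnorm(100 * 5), ncol = 5)
yv <- rnorm(100)
b1 <- rnorm(5); b2 <- rnorm(5); t0 <- runif(1)
enet_obj(t0 * b1 + (1 - t0) * b2, Xm, yv, 0.5, 0.5) <=
    t0 * enet_obj(b1, Xm, yv, 0.5, 0.5) + (1 - t0) * enet_obj(b2, Xm, yv, 0.5, 0.5)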

Problem 5: CASL page 200, question 6

Find the KKT conditions for glmnet when $0 < \alpha \le 1$ and implement a lasso_reg_with_screening function that takes an $\alpha$ parameter.

According to (7.31), $$\left. \frac{\partial f}{\partial b_l} \right\vert_{b = \tilde{b}} = - \frac{1}{n} \sum_{i = 1}^n x_{il} (y_i - \tilde{y}_i^{(l)}) + \frac{1}{n} \sum_{i = 1}^n x_{il}^2 \tilde{b}_l + \lambda (1 - \alpha) \tilde{b}_l + \lambda \alpha.$$ Setting this derivative equal to zero at the optimum, denoted $\hat{b}$ (and taking $\hat{b}_l > 0$ so that the subgradient of the $\ell_1$ term is $1$), yields the KKT conditions $$\frac{1}{n} \sum_{i = 1}^n x_{il} \left( y_i - \sum_{j = 1}^p x_{ij} \hat{b}_j \right) = \lambda \left( (1 - \alpha) \hat{b}_l + \alpha \right)$$ for each $l = 1, 2, \ldots, p$.

lasso_reg_with_screening <- function (x, y, alpha) {
    library(glmnet)
    ## cv.glmnet fits the elastic net path by cross-validation; glmnet
    ## applies its strong-rule (KKT-based) screening internally.
    fit.lasso <- cv.glmnet(x, y, family = "gaussian", alpha = alpha)
    fit.lasso
}
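
The KKT conditions above are also the basis of screening rules: any predictor whose (scaled) correlation with the residual is strictly below $\lambda \alpha$ must have a zero coefficient and can be dropped. The sketch below is my own illustration under the simplifying assumptions of no intercept and pre-standardized predictors; it is not glmnet's internal implementation:

## Sketch of a KKT check for screening (illustrative; assumes standardized
## predictors and no intercept).
check_kkt <- function (x, y, b, lambda, alpha) {
    n <- nrow(x)
    g <- crossprod(x, y - x %*% b) / n   # (1/n) t(x) %*% (y - x b)
    ## Stationarity: g_l - lambda (1 - alpha) b_l = lambda alpha s_l with
    ## |s_l| <= 1, so every coordinate must satisfy the inequality below;
    ## coordinates with strict inequality can only have b_l = 0.
    abs(g - lambda * (1 - alpha) * b) <= lambda * alpha + 1e-8
}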

