knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(CompStat)
Iterative methods for solving $Ax=b$ place requirements on $A$. The spectral norm of the iteration matrix derived from $A$ determines whether convergence can be achieved, and this value is printed. If the spectral norm is larger than 1, $\texttt{solve_ols()}$ does not run the iteration and returns $\texttt{NULL}$ instead. The smaller the printed spectral norm, the faster the convergence. Examples from hw1 are given below:
n = 100
A.list = list()
for (k in 1:3){
  A.list[[k]] = matrix(0, nrow = n, ncol = n)
  diag(A.list[[k]]) = rep(k, n)
  for (i in 1:n){
    for (j in 1:n){
      if (abs(i-j) == 1) A.list[[k]][i,j] = -1
    }
  }
}
v = rep(c(1,0), n/2)
est_jacobi_seq = list()
est_GaussSeidel = list()
for (i in 1:3){
  est_GaussSeidel[[i]] = solve_ols(A.list[[i]], b = A.list[[i]] %*% v, x0 = rep(0,n),
                                   method = 'Gauss-Seidel')
  est_jacobi_seq[[i]] = solve_ols(A.list[[i]], b = A.list[[i]] %*% v, x0 = rep(0,n),
                                  method = 'Jacobi')
}
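The convergence criterion can be checked by hand. The sketch below computes the spectral norm of the Jacobi iteration matrix $I - D^{-1}A$ for each matrix; it assumes the value printed by $\texttt{solve_ols()}$ is of this form. For $\alpha = 1$ the norm exceeds 1, so $\texttt{solve_ols()}$ returns $\texttt{NULL}$ and only $\alpha = 2, 3$ appear in the plots below.

# Sketch: spectral norm of the Jacobi iteration matrix B = I - D^{-1} A
# (assumed to correspond to the value printed by solve_ols(); values below 1 converge).
for (k in 1:3){
  D = diag(diag(A.list[[k]]))
  B = solve(D) %*% (D - A.list[[k]])   # equals -D^{-1}(L + U)
  cat("alpha =", k, ": spectral norm =", norm(B, type = "2"), "\n")
}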
The return value of $\texttt{solve_ols()}$ records the solution and the elapsed time at every iteration, so the relative errors over the first $10000$ iterations can easily be plotted against both the iteration number and the runtime.
Jacobi (parallel) is not recommended. It is much slower than Jacobi (sequential) for several reasons. First, Jacobi (sequential) can write the update rule in matrix form, which speeds up the computation. Second, the computation for each coordinate in each iteration is small, so setting up the parallel backend can take longer than the work it distributes. Jacobi (parallel) only pays off when the computation on each worker is large relative to the communication between workers, which is not the case here.
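For reference, a minimal sketch of the vectorized update that sequential Jacobi can use (the package internals may differ): each sweep is $x^{(t+1)} = D^{-1}(b - (L+U)x^{(t)})$, computed with a single matrix-vector product rather than a loop over coordinates.

# Sketch of one vectorized Jacobi sweep (illustrative; not necessarily how
# solve_ols() is implemented internally).
jacobi_sweep = function(A, b, x){
  d = diag(A)                     # diagonal of A
  (b - (A %*% x - d * x)) / d     # x_new = D^{-1} (b - (L + U) x)
}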
For illustration, the relative errors over the first $10000$ iterations are plotted against the iteration number below, for both Gauss-Seidel and Jacobi (sequential).
for (i in 2:3){
  # jacobi sequential
  error = sapply(est_jacobi_seq[[i]]$x.all,
                 function(x){sqrt(sum((x-v)^2))/sqrt(sum(v^2))})
  l = min(length(error), 10000)
  plot(y = error[1:l], x = 1:l, ylab = "relative error", xlab = "iteration",
       main = paste("Jacobi: alpha =", i))
  # Gauss-Seidel
  error = sapply(est_GaussSeidel[[i]]$x.all,
                 function(x){sqrt(sum((x-v)^2))/sqrt(sum(v^2))})
  l = min(length(error), 10000)
  plot(y = error[1:l], x = 1:l, ylab = "relative error", xlab = "iteration",
       main = paste("Gauss-Seidel: alpha =", i))
}
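Since the per-iteration runtime is also recorded, the same errors can be plotted against elapsed time. The field name $\texttt{time.all}$ in the sketch below is an assumption about the $\texttt{solve_ols()}$ return value (only $\texttt{x.all}$ is used above); substitute the actual name.

# Sketch of the runtime version of the plot; `time.all` is an assumed field name.
i = 2
error = sapply(est_GaussSeidel[[i]]$x.all,
               function(x){sqrt(sum((x-v)^2))/sqrt(sum(v^2))})
l = min(length(error), 10000)
plot(y = error[1:l], x = est_GaussSeidel[[i]]$time.all[1:l],
     ylab = "relative error", xlab = "runtime",
     main = paste("Gauss-Seidel: alpha =", i))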
The subsample size $r$ influences the performance of the leveraging methods, for both uniform sampling and leverage-score-based sampling. When $r$ is small, leverage-score-based sampling usually outperforms uniform sampling. Examples are given below for $r=10$ and $r=100$; the number of draws is $500$.
x = rt(n = 500, df = 6)
y = -x + rnorm(500)
lm.ols = lm(y ~ x)
r_seq = c(10, 100)
est_leverage = list()
for (i in 1:2) est_leverage[[i]] = algo_average(x, y, r = r_seq[i])
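For intuition, a single leveraging draw can be written out directly; $\texttt{algo_average()}$ presumably repeats such draws ($500$ here) and collects the coefficients. The helper below is a sketch under that assumption, using the same leverage scores as the plotting code further down.

# Sketch of one leverage-based subsampling draw (illustrative only; the
# package's algo_average() may differ in details).
one_blev_draw = function(x, y, r){
  hii = x^2 / sum(x^2)                    # leverage scores of the design
  idx = sample(seq_along(x), r, prob = hii)
  w = 1 / hii[idx]                        # reweight to undo the sampling bias
  coef(lm(y[idx] ~ x[idx], weights = w))  # intercept and slope from one draw
}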
Box plots of $|\tilde{\beta}_{unif}-\hat{\beta}|$ and $|\tilde{\beta}_{blev}-\hat{\beta}|$ are plotted below, where $\hat{\beta}$ is the full-sample least squares solution.
for (i in 1:2){
  dist = apply(est_leverage[[i]]$beta_unif, 1,
               function(x){abs(x[2] - lm.ols$coefficients[2])})
  boxplot(dist, main = paste("beta_unif", ' r = ', r_seq[i]), ylab = 'difference')
  dist = apply(est_leverage[[i]]$beta_blev, 1,
               function(x){abs(x[2] - lm.ols$coefficients[2])})
  boxplot(dist, main = paste("beta_blev", ' r = ', r_seq[i]), ylab = 'difference')
}
Scatterplots for a single draw, marking the subsampled points and the estimated regression line, are also shown below.
for (i in 1:2){
  # uniform subsampling
  plot(x, y, main = paste("unif sampling: r =", r_seq[i]))
  abline(lm(y ~ x))
  sample = sample(1:500, r_seq[i])
  points(x = x[sample], y = y[sample], col = 'green', pch = 4)
  abline(lm(y[sample] ~ x[sample]), col = 'green', lty = 2)
  # leverage-score-based subsampling
  plot(x, y, main = paste("weighted sampling: r =", r_seq[i]))
  abline(lm(y ~ x))
  hii = diag(matrix(x, nrow = 500) %*% matrix(x, nrow = 1) / sum(x^2))
  hii = hii / sum(hii)
  sample = sample(1:500, r_seq[i], prob = hii)
  w = 1 / hii[sample]
  w = w / sum(w)
  points(x = x[sample], y = y[sample], col = 'red', pch = 4)
  abline(lm(y[sample] ~ x[sample], weights = w), col = 'red', lty = 2)
}
The solution paths fitted by coordinate descent for the elastic net are plotted below for $n=50$ and $\alpha=0, 0.5, 1$, respectively. The left side is my implementation, while the right side is the implementation from \texttt{glmnet}.
library(mvtnorm)
library(glmnet)

beta_true = rep(0, 20)
beta_true[1] = 2
beta_true[3] = -2
beta_true[5] = 1
beta_true[7] = -1
corr = diag(20)
corr[1,2] = 0.8
corr[5,6] = 0.8
corr = corr + t(corr) - diag(20)

N = 50
alpha_seq = c(0, 0.5, 1)
X_obs = rmvnorm(50, sigma = corr)
y_obs = X_obs %*% beta_true + rnorm(50)
for (j in 1:3){
  est = elnet_coord(X = X_obs, y = y_obs, alpha = alpha_seq[j])
  plotpath(fit = est, x = X_obs)
  est1 = glmnet(x = X_obs, y = y_obs, alpha = alpha_seq[j])
  plot(est1, xvar = "lambda")
}
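For reference, the core of each coordinate-descent pass is a soft-thresholding update. The sketch below writes it out for the standard elastic-net objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\big(\alpha\|\beta\|_1 + \frac{1-\alpha}{2}\|\beta\|_2^2\big)$; the scaling used inside $\texttt{elnet_coord()}$ and \texttt{glmnet} (which also standardizes the predictors) may differ.

# Sketch of one elastic-net coordinate update (illustrative; not the exact
# implementation used by elnet_coord or glmnet).
soft_threshold = function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

update_coord = function(X, y, beta, j, lambda, alpha){
  n = nrow(X)
  r = y - X[, -j, drop = FALSE] %*% beta[-j]   # partial residual excluding x_j
  z = sum(X[, j] * r) / n
  beta[j] = soft_threshold(z, lambda * alpha) /
            (sum(X[, j]^2) / n + lambda * (1 - alpha))
  beta
}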