knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(CompStat)

Iterative Methods for Solving Linear Systems of Equations

Iterative methods for solving $Ax=b$ place requirements on $A$. Convergence is determined by the spectral norm of a matrix derived from $A$, and $\texttt{solve\_ols()}$ prints this norm. If the spectral norm is larger than $1$, $\texttt{solve\_ols()}$ does not iterate and returns $\texttt{NULL}$ instead. The smaller the printed spectral norm, the faster the convergence. The examples from hw1 are given below:

n = 100
A.list = list()
# Tridiagonal test matrices: diagonal entries equal to k (k = 1, 2, 3),
# immediate off-diagonal neighbours equal to -1.
for (k in 1:3){
  A.list[[k]] = matrix(0, nrow = n, ncol = n)
  diag(A.list[[k]]) = rep(k, n)
  for (i in 1:n){
    for (j in 1:n){
      if (abs(i-j) == 1) A.list[[k]][i,j] = -1
    }
  }
}

# True solution v alternates 1 and 0; b = A %*% v, so the exact answer is known.
v = rep(c(1,0), n/2)
est_jacobi_seq = list()
est_GaussSeidel = list()
for(i in 1:3){
  est_GaussSeidel[[i]] = solve_ols(A.list[[i]], b = A.list[[i]]%*%v, x0 = rep(0,n), method = 'Gauss-Seidel')
  est_jacobi_seq[[i]] = solve_ols(A.list[[i]], b = A.list[[i]]%*%v, x0 = rep(0,n), method = 'Jacobi')
}
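
Since convergence is tied to a spectral norm, it can be checked directly. Below is a minimal sketch of my own (not necessarily the exact quantity that $\texttt{solve\_ols()}$ prints) that computes the spectral radius of the Jacobi iteration matrix $-D^{-1}(A-D)$ for each test matrix; only the $k=2$ and $k=3$ matrices fall below $1$, which is why only $i = 2, 3$ appear in the plots further down.

# Spectral radius of the Jacobi iteration matrix -D^{-1}(A - D) for each A.list[[k]].
# Values below 1 indicate convergence; the smaller the value, the faster it converges.
for (k in 1:3){
  D = diag(diag(A.list[[k]]))
  B = -solve(D) %*% (A.list[[k]] - D)
  rho = max(Mod(eigen(B, only.values = TRUE)$values))
  cat("k =", k, ": spectral radius =", round(rho, 4), "\n")
}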

The value returned by $\texttt{solve\_ols()}$ records the solution and the elapsed time at every iteration, so the relative errors over the first $10000$ iterations can easily be plotted against both the number of iterations and the runtime.

Parallel Jacobi is not recommended: it is much slower than sequential Jacobi, for several reasons. First, sequential Jacobi can express the update rule in matrix form, which speeds up the computation. Second, the work per coordinate in each iteration is small, so setting up the parallel environment can take longer than the computation itself. Parallel Jacobi only pays off when the per-worker computation is large enough to outweigh the communication between workers, which is not the case here.
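
As a rough, stand-alone illustration of this overhead (this toy benchmark is my own and is not part of the package): distributing a tiny per-element task over workers is dominated by cluster setup and data transfer.

library(parallel)
x_toy = rnorm(1e4)
# Plain sapply on a trivial per-element task:
system.time(sapply(x_toy, function(z) z / 2))
# The same task split over two workers: creating the cluster and shipping the
# data typically costs far more than the computation it saves.
system.time({
  cl = makeCluster(2)
  res = parSapply(cl, x_toy, function(z) z / 2)
  stopCluster(cl)
})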

For illustration, the relative errors over the first $10000$ iterations are plotted against the number of iterations below, for both Gauss-Seidel and sequential Jacobi.

for (i in 2:3){
  # Jacobi (sequential): relative error ||x_k - v|| / ||v|| at each iteration
  error = sapply(est_jacobi_seq[[i]]$x.all, function(x){sqrt(sum((x-v)^2))/sqrt(sum(v^2))})
  l = min(length(error), 10000)
  plot(y = error[1:l], x = 1:l, ylab = "relative error", xlab = "iteration",
       main = paste("Jacobi: alpha =", i))

  # Gauss-Seidel: same relative error
  error = sapply(est_GaussSeidel[[i]]$x.all, function(x){sqrt(sum((x-v)^2))/sqrt(sum(v^2))})
  l = min(length(error), 10000)
  plot(y = error[1:l], x = 1:l, ylab = "relative error", xlab = "iteration",
       main = paste("Gauss-Seidel: alpha =", i))
}
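
A plot against runtime follows the same pattern. The component name $\texttt{time.all}$ below is hypothetical (only $\texttt{x.all}$ appears in the code above), so substitute whichever component of the $\texttt{solve\_ols()}$ return actually stores the per-iteration times.

# Hypothetical sketch: 'time.all' stands in for the component of the solve_ols()
# return that records cumulative time at each iteration.
err = sapply(est_GaussSeidel[[3]]$x.all, function(x){sqrt(sum((x-v)^2))/sqrt(sum(v^2))})
l = min(length(err), 10000)
plot(y = err[1:l], x = est_GaussSeidel[[3]]$time.all[1:l], ylab = "relative error",
     xlab = "runtime (seconds)", main = "Gauss-Seidel: alpha = 3")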

Leveraging

The subsample size $r$ influences the performance of the leveraging methods, for both uniform sampling and leverage-score-based sampling. When $r$ is small, leverage-score-based sampling usually outperforms uniform sampling. Examples are given below for $r=10$ and $r=100$, with $500$ draws in each case.

# Simulated data: heavy-tailed predictor (t with 6 df), linear response with slope -1
x = rt(n=500, df=6)
y = -x + rnorm(500)
lm.ols = lm(y ~ x)   # full-data least squares fit
r_seq = c(10,100)
est_leverage = list()
for (i in 1:2) est_leverage[[i]] = algo_average(x, y, r = r_seq[i])

Box plots of $|\tilde{\beta}_{unif}-\hat{\beta}|$ and $|\tilde{\beta}_{blev}-\hat{\beta}|$ are shown below, where $\hat{\beta}$ is the full-data least squares solution.

for (i in 1:2){
  # absolute difference between each subsampled slope estimate and the full OLS slope
  dist = apply(est_leverage[[i]]$beta_unif, 1, function(x){abs(x[2]-lm.ols$coefficients[2])})
  boxplot(dist, main = paste("beta_unif",  ' r = ', r_seq[i]), ylab = 'difference')

  dist = apply(est_leverage[[i]]$beta_blev, 1, function(x){abs(x[2]-lm.ols$coefficients[2])})
  boxplot(dist, main = paste("beta_blev",  ' r = ', r_seq[i]), ylab = 'difference')
}

Scatterplots for a single draw, marking the subsampled points and the corresponding estimated line, are also shown below.

for (i in 1:2){
  # Uniform subsampling: every observation is equally likely to be drawn
  plot(x, y, main = paste("unif sampling: r =", r_seq[i]))
  abline(lm(y ~ x))
  sample = sample(1:500, r_seq[i])
  points(x = x[sample], y = y[sample], col = 'green', pch = 4)
  abline(lm(y[sample] ~ x[sample]), col = 'green', lty = 2)

  # Leverage-based subsampling: probabilities proportional to h_ii = x_i^2 / sum(x^2)
  plot(x, y, main = paste("weighted sampling: r =", r_seq[i]))
  abline(lm(y ~ x))
  hii = x^2 / sum(x^2)        # leverage scores for the no-intercept design
  hii = hii / sum(hii)
  sample = sample(1:500, r_seq[i], prob = hii)
  w = 1 / hii[sample]         # inverse-leverage weights, normalized
  w = w / sum(w)
  points(x = x[sample], y = y[sample], col = 'red', pch = 4)
  abline(lm(y[sample] ~ x[sample], weights = w), col = 'red', lty = 2)
}

Coordinate Descent for Elastic Net

The solution paths fitted by coordinate descent for the elastic net are plotted below for $n=50$ and $\alpha=0,0.5,1$, respectively. The left plot of each pair shows my implementation, while the right one shows the fit from \texttt{glmnet}.
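
For reference, the standard coordinate-wise update for the elastic net objective $\frac{1}{2N}\|y-X\beta\|_2^2 + \lambda\big(\alpha\|\beta\|_1 + \frac{1-\alpha}{2}\|\beta\|_2^2\big)$ (as in Friedman et al.; I assume \texttt{elnet\_coord()} implements an update along these lines) is

$$\hat{\beta}_j \leftarrow \frac{S\!\left(\frac{1}{N}\sum_{i=1}^{N} x_{ij}\big(y_i - \tilde{y}_i^{(j)}\big),\ \lambda\alpha\right)}{\frac{1}{N}\sum_{i=1}^{N} x_{ij}^2 + \lambda(1-\alpha)},$$

where $\tilde{y}_i^{(j)} = \sum_{k\neq j} x_{ik}\hat{\beta}_k$ is the partial fit that excludes predictor $j$ and $S(z,\gamma) = \operatorname{sign}(z)\,(|z|-\gamma)_+$ is the soft-thresholding operator.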

library(mvtnorm)
library(glmnet)
# Sparse true coefficient vector (only components 1, 3, 5, 7 are nonzero)
beta_true = rep(0, 20)
beta_true[1] = 2
beta_true[3] = -2
beta_true[5] = 1
beta_true[7] = -1
# Correlation matrix with two correlated pairs of predictors, (1,2) and (5,6)
corr = diag(20)
corr[1,2] = 0.8
corr[5,6] = 0.8
corr = corr + t(corr) - diag(20)
N = 50
alpha_seq = c(0, 0.5, 1)   # ridge, elastic net, lasso

X_obs = rmvnorm(N, sigma = corr)
y_obs = X_obs %*% beta_true + rnorm(N)
for (j in 1:3){
    est = elnet_coord(X = X_obs, y = y_obs, alpha = alpha_seq[j])
    plotpath(fit = est, x = X_obs)                 # my implementation
    est1 = glmnet(x = X_obs, y = y_obs, alpha = alpha_seq[j])
    plot(est1, xvar = "lambda")                    # glmnet reference
}

