```r
knitr::opts_chunk$set(echo = TRUE)
library(doSNOW)
library(dplyr)
library(purrr)
library(readr)
library(readxl)
library(ggplot2)
library(tidyr)
```
`gradescCM` is a toy R package developed by Federico Cortese and Alessandro Mascaro and available on GitHub. It includes functions to perform linear regression and basic LOOCV implementations to evaluate the linear fit.
To install and load the package, the following code is sufficient:
```r
# install.packages('devtools')
library(devtools)
install_github('FedericoCortese/R4DScm')
library(gradescCM)
```
This is an example of usage of the functions `linear_gd_optim` and `linear_gd_optim2`. The first function uses the analytical expression for the gradient of the loss function, whereas the second relies on the function `grad` from the package `numDeriv`.
We expect the first function to be faster than the second one.
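To make the difference concrete, here is a minimal, self-contained sketch of the two ways of obtaining the gradient, assuming a least-squares loss with an intercept column prepended to the design matrix; this is only an illustration, not the package's internal code.

```r
# Sketch only: analytical vs. numerical gradient of the (assumed) least-squares
# loss L(beta) = ||y - X beta||^2 / (2n).
library(numDeriv)

set.seed(1)
X_toy <- cbind(1, rnorm(10), rnorm(10))     # toy design with intercept column
y_toy <- drop(X_toy %*% c(1, 0.5, 0.2) + rnorm(10))

loss <- function(beta, X, y) sum((y - X %*% beta)^2) / (2 * length(y))

# Closed-form gradient: -X'(y - X beta) / n, a single matrix product
grad_analytical <- function(beta, X, y) drop(-crossprod(X, y - X %*% beta) / length(y))

b <- c(0, 0, 0)
grad_analytical(b, X_toy, y_toy)              # exact gradient
numDeriv::grad(loss, b, X = X_toy, y = y_toy) # finite-difference approximation
```

The analytical gradient costs one matrix product per evaluation, while `numDeriv::grad` evaluates the loss several times per coordinate, which is why the first version should be faster.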
Consider the following simulated dataset:
```r
library(gradescCM)
set.seed(8675309)
n  = 1000
x1 = rnorm(n)
x2 = rnorm(n)
y  = 1 + .5*x1 + .2*x2 + rnorm(n)
X  = cbind(x1, x2)
```
We set the initial value of the parameter vector beta, together with the tolerance and the maximum number of iterations:
```r
b_pre = c(0, 0, 0)
tol   = 1e-6
maxit = 10^5
```
For the first function we have
```r
linear_gd_optim(b_pre, X, y, tol = tol, maxit = maxit)
```
For the second function
```r
linear_gd_optim2(b_pre, X, y, tol = tol, maxit = maxit)
```
Let us compare the execution times:
```r
show_bm = function(bm) {
  print(bm)
  autoplot(bm)
}

bench::mark(
  check = F,
  first  = linear_gd_optim(b_pre, X, y),
  second = linear_gd_optim2(b_pre, X, y)
) %>%
  show_bm()
```
The first version is clearly faster than the second one. Consider now the steepest descent method:
```r
basic_sd(b_pre, X, y, tol = tol, maxit = maxit)
```
Let us compare gradient descent and steepest descent:
```r
bench::mark(
  check = F,
  GD = linear_gd_optim(b_pre, X, y),
  SD = basic_sd(b_pre, X, y)
) %>%
  show_bm()
```
As can be observed, the steepest descent algorithm is much faster than gradient descent. The main reason is that the step size of steepest descent, which is chosen using the Hessian, makes the algorithm perform fewer iterations.
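To illustrate why, here is a minimal sketch of steepest descent with an exact line search for the least-squares loss. The step-size formula below is the standard one for quadratic objectives; it is not taken from the package's `basic_sd` implementation.

```r
# Sketch only: steepest descent with an exact line search for least squares.
# At each iteration, alpha = (g'g) / (g'Hg), with H = X'X / n, minimizes the
# loss exactly along the direction -g, so few iterations are needed.
sd_sketch <- function(b, X, y, tol = 1e-6, maxit = 1e5) {
  Xi <- cbind(1, X)                         # prepend the intercept column
  n  <- length(y)
  H  <- crossprod(Xi) / n                   # Hessian of the quadratic loss
  for (it in seq_len(maxit)) {
    g <- -crossprod(Xi, y - Xi %*% b) / n   # gradient at the current b
    if (sqrt(sum(g^2)) < tol) break
    alpha <- drop(crossprod(g) / (t(g) %*% H %*% g))  # exact step size
    b <- b - alpha * g
  }
  list(coefficients = drop(b), iterations = it)
}

sd_sketch(c(0, 0, 0), X, y)
```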
These are two examples of usage of the functions `cm_LOOCV` and `cm_pllLOOCV`. Both functions compute the mean squared error of prediction of a linear regression (whose parameters are estimated through a steepest descent method) via LOOCV, a well-known resampling method. The first function performs a sequential LOOCV, while the second allows for some parallelization that may speed up computation.
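As a reminder of what is being estimated, here is a minimal sketch of a sequential LOOCV loop; `lm()` is used here merely as a stand-in for the package's steepest-descent fit.

```r
# Sketch only (not cm_LOOCV): for each observation i, refit the model without
# i, predict y_i, and average the squared prediction errors.
loocv_sketch <- function(X, y) {
  n   <- length(y)
  err <- numeric(n)
  for (i in seq_len(n)) {
    fit    <- lm(y[-i] ~ X[-i, , drop = FALSE])   # fit on all but observation i
    pred   <- drop(c(1, X[i, ]) %*% coef(fit))    # predict the held-out point
    err[i] <- (y[i] - pred)^2
  }
  mean(err)   # LOOCV estimate of the prediction MSE
}
```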
As a first example, let's consider the usual dataset:
```r
library(gradescCM)
set.seed(8675309)
n  = 1000
x1 = rnorm(n)
x2 = rnorm(n)
y  = 1 + .5*x1 + .2*x2 + rnorm(n)
X  = cbind(x1, x2)
```
We set the initial values for the parameters' vector beta, together with the tolerance and maximum number of iterations:
```r
b_pre = c(0, 0, 0)
tol   = 1e-3
maxit = 10^3
```
For the first function we have
```r
cm_LOOCV(y = y, X = X, b_pre = b_pre, tol = tol, maxit = maxit)
```
For the second function
```r
cm_pllLOOCV(y = y, X = X, b_pre = b_pre, tol = tol, maxit = maxit)
```
We can compare their efficiency as usual:
```r
bench::mark(
  check = F,
  sequential = cm_LOOCV(y, X, b_pre = b_pre, tol = tol, maxit = maxit),
  parallel   = cm_pllLOOCV(y, X, b_pre = b_pre, tol = tol, maxit = maxit)
) %>%
  show_bm()
```
As a second example, let's consider the following slightly modified dataset, in which the number of observations is scaled up to $2 \times 10^4$:
```r
set.seed(8675309)
n  = 2*10^4
x1 = rnorm(n)
x2 = rnorm(n)
y  = 1 + .5*x1 + .2*x2 + rnorm(n)
X  = cbind(x1, x2)
options(warn = -1)  # suppress warnings in the remaining chunks
```
Again, we set the initial values for the parameters' vector beta, together with the tolerance and maximum number of iterations:
```r
b_pre = c(0, 0, 0)
tol   = 1e-3
maxit = 10^3
```
We have
```r
cm_LOOCV(y = y, X = X, b_pre = b_pre, tol = tol, maxit = maxit)
cm_pllLOOCV(y = y, X = X, b_pre = b_pre, tol = tol, maxit = maxit)
```
And again we compare the efficiency of the two algorithms:
```r
bench::mark(
  check = F,
  sequential = cm_LOOCV(y, X, b_pre = b_pre, tol = tol, maxit = maxit),
  parallel   = cm_pllLOOCV(y, X, b_pre = b_pre, tol = tol, maxit = maxit)
) %>%
  show_bm()
```
As can be observed, in the first example the sequential version of LOOCV is more efficient than the parallelized one, and vice versa in the second example. Hence, parallelizing the algorithm is not always computationally convenient: for small datasets the cost of setting up the parallelization may exceed the gain in efficiency. Clearly, the more computationally intensive the problem at hand, the greater the advantages of parallelization.
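For reference, here is a minimal sketch of how such a parallel LOOCV loop can be written with `foreach` and the `doSNOW` backend loaded in the setup chunk; it is not the package's `cm_pllLOOCV` code, and the cluster setup and teardown shown here is exactly the fixed overhead discussed above.

```r
# Sketch only (not cm_pllLOOCV): parallel LOOCV with foreach + doSNOW.
# Creating and stopping the cluster is a fixed cost paid regardless of n,
# which is why parallelization only pays off for larger problems.
library(foreach)
library(doSNOW)

cl <- makeCluster(2)            # small worker pool
registerDoSNOW(cl)
errs <- foreach(i = seq_along(y), .combine = c) %dopar% {
  fit <- lm(y[-i] ~ X[-i, , drop = FALSE])        # fit without observation i
  drop((y[i] - c(1, X[i, ]) %*% coef(fit))^2)     # squared prediction error
}
stopCluster(cl)
mean(errs)                      # LOOCV estimate of the prediction MSE
```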