knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)

1. Introduction

The goal of the myOpt package is to provide users with an efficient way to estimate the parameters of a linear model through optimization techniques.
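
As a point of reference, the quantity being minimized can be thought of as the usual least squares criterion. The sketch below states it in code; it is only an illustration of the idea (the helper name "loss" is hypothetical), not necessarily the exact objective used internally by the package.

# least squares criterion: mean squared residual of the linear model,
# to be minimized over the coefficient vector beta (illustrative sketch)
loss <- function(beta, X, Y) {
  mean((Y - X %*% beta)^2)
}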

2. Gradient descent method

In the following lines, the results and the computation times obtained with this package's gradient descent function ("linear_gd_optim") are compared with those of the standard function ("lm").

library(myOpt)

## basic example code

set.seed(8675309)

# data simulation for example purposes

n <- 1000

x1 <- rnorm(n)
x2 <- rnorm(n)
X <- cbind(rep(1, n), x1, x2) # predictor matrix

Y <- 5.6 + 2.3*x1 + 8.7*x2 + rnorm(n) # response vector

# random initial values for the parameters of the linear model

par <- rnorm(dim(X)[2])

# the function returns a vector containing the values of the estimated parameters

opt_par <- linear_gd_optim(par, X, Y)

# standard lm function for estimating the parameters

lm_par <- lm(Y ~ x1 + x2)$coefficients

opt_par

lm_par

As can be seen, the results are nearly identical, meaning that the accuracy of the implemented method is, in terms of results, comparable to the standard one. Now let's benchmark the vectorized version of the function "linear_gd_optim" against the one using only for loops.

library(ggplot2)
library(dplyr)
library(tidyr)
library(bench)

## Useful functions

# Plot a benchmark
show_bm <- function(bm) {
  print(print_bench(bm))
  autoplot(bm)
}

# printable bench (for RMarkdown)
print_bench <- function(bm) {
  bm %>% 
    mutate(expression = as.character(expression))
}


## Benchmarks 

bench::mark(
  vec_method = linear_gd_optim(par, X, Y),
  for_method = linear_gd_optim_old(par, X, Y),
  filter_gc = FALSE
) %>%
  show_bm()

From the benchmarks it can be seen that the vectorized version of the function performs much faster than the loop-based one; this allows our function to compete with the standard "lm" not only in terms of result accuracy but also in terms of computation time.
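
To make the difference concrete, the following minimal sketch contrasts a loop-based and a vectorized computation of the gradient of the mean squared error for a linear model. It only illustrates the idea (the names "grad_loop" and "grad_vec" are hypothetical), and is not the actual body of "linear_gd_optim" or "linear_gd_optim_old".

# loop version: accumulate the gradient observation by observation
grad_loop <- function(par, X, Y) {
  g <- numeric(length(par))
  for (i in seq_len(nrow(X))) {
    g <- g - 2 * X[i, ] * (Y[i] - sum(X[i, ] * par))
  }
  g / nrow(X)
}

# vectorized version: a single matrix product replaces the whole loop
grad_vec <- function(par, X, Y) {
  drop(-2 * t(X) %*% (Y - X %*% par) / nrow(X))
}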

3. Steepest descent method

The parameters of the linear model can also be calculated with the steepest descent method, using the function "linear_sd_optim" implemented in this package.

The example of the previous section is repeated applying the steepest descent method, comparing results and computation times with the gradient descent method and the standard function "lm".

library(myOpt)

## basic example code

set.seed(8675309)

# data simulation for example purposes

n <- 1000

x1 <- rnorm(n)
x2 <- rnorm(n)
X <- cbind(rep(1, n), x1, x2) # predictor matrix

Y <- 5.6 + 2.3*x1 + 8.7*x2 + rnorm(n) # response vector

# random initial values for the parameters of the linear model

par <- rnorm(dim(X)[2])

# the function returns a vector containing the values of the estimated parameters

opt_par <- linear_sd_optim(par, X, Y)

# standard lm function for estimating the parameters

lm_par <- lm(Y ~ x1 + x2)$coefficients

opt_par

lm_par

It can be noted that the results are quite similar, meaning that the accuracy of the implemented steepest descent is also comparable, in terms of results, to the standard function "lm". We proceed by benchmarking the functions "linear_gd_optim" (based on gradient descent) and "linear_sd_optim" (based on steepest descent).

library(ggplot2)
library(dplyr)
library(tidyr)
library(bench)

## Useful functions

# Plot a benchmark
show_bm <- function(bm) {
  print(print_bench(bm)) 
  autoplot(bm)
}

# printable bench (for RMarkdown)
print_bench <- function(bm) {
  bm %>% 
    mutate(expression = as.character(expression))
}


## Benchmarks 

bench::mark(
  gd_method = round(linear_gd_optim(par, X, Y),1),
  sd_method = round(linear_sd_optim(par, X, Y),1),
  filter_gc = FALSE
) %>%
  show_bm()

From the benchmarks it can again be observed that the gradient descent method is faster than the steepest descent one, while achieving comparable accuracy. This is essentially due to the additional operations that steepest descent has to perform at each iteration of the optimization procedure in order to calculate the step, compared to the gradient descent method.
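
To see where the extra work comes from, here is a minimal sketch of a single steepest descent iteration with an exact line search for linear least squares. It is illustrative only (the name "sd_step" is hypothetical), not the actual body of "linear_sd_optim".

# one steepest descent step: the exact step length along the negative
# gradient requires an additional matrix product compared to plain
# gradient descent (illustrative sketch)
sd_step <- function(par, X, Y) {
  n <- nrow(X)
  g <- -2 / n * t(X) %*% (Y - X %*% par)           # gradient, as in gradient descent
  Ag <- 2 / n * t(X) %*% (X %*% g)                 # extra product needed for the step
  alpha <- drop(crossprod(g) / crossprod(g, Ag))   # exact step length along -g
  drop(par - alpha * g)
}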

4. Gradient and steepest descent methods using more predictors

This section shows that the gradient and steepest descent methods implemented in this package also work with more than two predictors.

A new example, including three predictors, is reported in this section, applying the gradient and steepest descent methods and comparing results and computation times with the standard function "lm".

library(myOpt)

## basic example code

set.seed(8675309)

# data simulation for example purposes

n <- 1000

x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
X <- cbind(rep(1, n), x1, x2, x3) # predictor matrix

Y <- 5.6 + 2.3*x1 + 8.7*x2 + 7.2*x3 + rnorm(n) # response vector

# random initial values for the parameters of the linear model

par <- rnorm(dim(X)[2])

# the function returns a vector containing the values of the estimated parameters

opt_par_gd <- linear_gd_optim(par, X, Y)
opt_par_sd <- linear_sd_optim(par, X, Y)

# standard lm function for estimating the parameters

lm_par <- lm(Y ~ x1 + x2 + x3)$coefficients

opt_par_gd

opt_par_sd

lm_par

It can be noted that the results are quite similar, meaning that the accuracy of the implemented gradient and steepest descent methods is also comparable, in terms of results, to the standard function "lm" when three predictors are used. We proceed by benchmarking the functions "linear_gd_optim" (based on gradient descent) and "linear_sd_optim" (based on steepest descent) with three predictors.

library(ggplot2)
library(dplyr)
library(tidyr)
library(bench)

## Useful functions

# Plot a benchmark
show_bm <- function(bm) {
  print(print_bench(bm)) 
  autoplot(bm)
}

# printable bench (for RMarkdown)
print_bench <- function(bm) {
  bm %>% 
    mutate(expression = as.character(expression))
}


## Benchmarks 

bench::mark(
  gd_method = round(linear_gd_optim(par, X, Y),1),
  sd_method = round(linear_sd_optim(par, X, Y),1),
  filter_gc = FALSE
) %>%
  show_bm()

The benchmarks confirm that the gradient descent method is faster than the steepest descent one, while achieving comparable accuracy, also when three predictors are used.

5. Prediction

After the parameters have been estimated, a prediction function can be applied for both methods: it predicts the outcome given the vector of estimated model parameters and a matrix of predictor data.

# prediction with the parameters estimated through the lm function
prediction_lm <- predict(lm(Y ~ x1 + x2 + x3))

# prediction with the parameters estimated through the gradient descent method
prediction_gd <- my_linear_predict(opt_par_gd, X)

# prediction with the parameters estimated through the steepest descent method
prediction_sd <- my_linear_predict(opt_par_sd, X)
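
For reference, a linear prediction of this kind amounts to multiplying the predictor matrix by the estimated coefficients. The sketch below only illustrates that idea (the name "predict_sketch" is hypothetical); it is not necessarily the exact implementation of "my_linear_predict".

# fitted values: each prediction is the corresponding row of X multiplied
# by the estimated coefficient vector (illustrative sketch)
predict_sketch <- function(par, X) {
  drop(X %*% par)
}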

Once the predictions have been computed, this package also allows the error associated with the prediction to be computed.

# prediction error associated with the lm function estimated parameters
my_pred_error(Y, prediction_lm)

# prediction error associated with the gradient descent estimated parameters
my_pred_error(Y, prediction_gd)

# prediction error associated with the steepest descent estimated parameters
my_pred_error(Y, prediction_sd)
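
The exact error metric is defined by the package; a common choice for this kind of prediction error is the mean squared error, sketched below purely for illustration (the name "mse_sketch" is hypothetical, and the metric actually returned by "my_pred_error" may differ).

# mean squared error between observed and predicted outcomes (illustrative sketch)
mse_sketch <- function(Y, Y_hat) {
  mean((Y - Y_hat)^2)
}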

The three methods give similar results, both in terms of predictions and in terms of prediction error.

6. Cross validation

In order to estimate how accurately the predictive model will perform in practice, cross validation techniques can be used. In particular, k-fold cross validation is implemented in the myOpt package to assess the predictions of the gradient and steepest descent methods. The function "my_k_fold_cv" can be used for this purpose with both methods, depending on the value of the argument "method" (to be set to "gd" for gradient descent and to "sd" for steepest descent).

library(doSNOW)

K <- 5
set.seed(8675309)
my_k_fold_cv(par, X, Y, K)
set.seed(8675309)
my_k_fold_cv(par, X, Y, K, method = "sd")

We can observe that the two methods have very similar prediction errors, which are, as expected, a bit higher than the in-sample errors calculated in the previous section.
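
For intuition, the following minimal sketch shows how a K-fold cross validation of this kind can be organized: the data are split into K folds, the model is fitted on K-1 of them, and the prediction error is evaluated on the held-out fold. It is an illustration of the idea only (the name "cv_sketch" is hypothetical), not the actual body of "my_k_fold_cv".

# illustrative K-fold cross validation for a generic fitting routine
cv_sketch <- function(par, X, Y, K, fit = linear_gd_optim) {
  folds <- sample(rep_len(seq_len(K), length(Y)))   # random fold assignment
  errs <- sapply(seq_len(K), function(k) {
    test <- folds == k
    beta <- fit(par, X[!test, , drop = FALSE], Y[!test])   # fit on K-1 folds
    mean((Y[test] - X[test, , drop = FALSE] %*% beta)^2)   # held-out error
  })
  mean(errs)   # average out-of-fold prediction error
}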

The function "my_k_fold_cv" can also perform the calculation in parallel. It is sufficient to set the argument "parallel" equal to TRUE to switch from sequential to parallel computation, distributing the calculation across all the available cores. We can analyze the effect of parallel computing on the calculation times.

## Useful functions

# Print a benchmark
show_bm <- function(bm) {
  print(print_bench(bm)) 
}

# printable bench (for RMarkdown)
print_bench <- function(bm) {
  bm %>% 
    mutate(expression = as.character(expression))
}

set.seed(8675309)
bench::mark(
  sequential_gd = round(my_k_fold_cv(par, X, Y, K),1),
  parallel_gd = round(my_k_fold_cv(par, X, Y, K, parallel = TRUE),1),
  filter_gc = FALSE
) %>%
  show_bm()

set.seed(8675309)
bench::mark(
  sequential_sd = round(my_k_fold_cv(par, X, Y, K, method = "sd"),1),
  parallel_sd = round(my_k_fold_cv(par, X, Y, K, method = "sd", parallel = TRUE),1),
  filter_gc = FALSE
) %>%
  show_bm()

We observe that, in the current example, parallel computing is convenient for the steepest descent method, whereas it increases the computation time for the gradient descent one. In the latter case, since distributing the calculation also has a time cost, this overhead can more than offset the time saved by splitting the procedure across several cores.
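
To make the parallel mechanism more concrete, here is a sketch of how the folds could be distributed over a cluster of workers with foreach and doSNOW. It is an illustration only, not the actual internals of "my_k_fold_cv".

library(foreach)
library(doSNOW)

cl <- makeCluster(parallel::detectCores())   # one worker per available core
registerDoSNOW(cl)

folds <- sample(rep_len(seq_len(K), length(Y)))
cv_errs <- foreach(k = seq_len(K), .combine = c, .packages = "myOpt") %dopar% {
  test <- folds == k
  beta <- linear_gd_optim(par, X[!test, , drop = FALSE], Y[!test])
  mean((Y[test] - X[test, , drop = FALSE] %*% beta)^2)   # held-out error for fold k
}
stopCluster(cl)
mean(cv_errs)   # cross-validated prediction error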

In order to fully appreciate the benefit of parallel computing, we can set up a more computationally intensive example, increasing the sample size from 1000 to 10000.

set.seed(8675309)
n <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
Y <- 1 + 0.5*x1 + 0.2*x2 + rnorm(n)
X <- cbind(rep(1,n),x1,x2)
par <- rnorm(dim(X)[2])
K <- 5

set.seed(8675309)
bench::mark(
  sequential_gd = round(my_k_fold_cv(par, X, Y, K),1),
  parallel_gd = round(my_k_fold_cv(par, X, Y, K, parallel = TRUE),1),
  filter_gc = FALSE,
) %>%
  show_bm()

We note that, once the sample size is increased, parallel computing saves time also when the gradient descent method is used.

7. Conclusions

This document shows how to apply the functions implemented in this package, based on the gradient descent and steepest descent methods, to estimate linear regression parameters.

Several examples are reported, including comparisons and benchmarks between the two implemented methods and against the standard function "lm". All the reported results agree that the implemented methods are comparable, in terms of accuracy, to the standard function.

In terms of computation time, the gradient descent method is faster than the steepest descent one because of the simpler step calculation at each iteration. Even though it is a bit slower than the standard "lm" function, the gradient descent method reaches computation times that are comparable to it.

Some examples of k-fold cross validation are also reported, showing how to speed up the calculations using parallel computing.


