```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```
The goal of the myOpt package is to provide users with an efficient way to estimate the parameters of a linear model through optimization techniques.

In the following lines, the results and the computation times of this package's gradient descent function ("linear_gd_optim") are compared with those of the standard "lm" function (from the core "stats" package).
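For context, gradient descent minimizes the least-squares loss by repeatedly moving the parameter vector against its gradient. The learning rate $\gamma$ and the exact scaling of the loss below are illustrative; the convention actually used inside "linear_gd_optim" (for instance a $1/2$ or $1/n$ factor, and the stopping rule) may differ.

$$
L(\beta) = \frac{1}{n}\,\lVert Y - X\beta \rVert^2,
\qquad
\beta^{(t+1)} = \beta^{(t)} - \gamma\,\nabla L\!\left(\beta^{(t)}\right)
             = \beta^{(t)} + \frac{2\gamma}{n}\,X^\top\!\left(Y - X\beta^{(t)}\right).
$$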
```r
library(myOpt)
## basic example code

set.seed(8675309)

# data simulation for example purposes
n = 1000
x1 = rnorm(n)
x2 = rnorm(n)
X <- cbind(rep(1,n), x1, x2)           # predictor matrix
Y = 5.6 + 2.3*x1 + 8.7*x2 + rnorm(n)   # response vector

# random initial values for the parameters of the linear model
par <- rnorm(dim(X)[2])

# the function returns a vector containing the values of the estimated parameters
opt_par <- linear_gd_optim(par, X, Y)

# standard lm function for estimating the parameters
lm_par <- lm(Y ~ x1 + x2)$coefficients

opt_par
lm_par
```
As can be seen, the estimates are essentially the same, meaning that the accuracy of the implemented method is comparable to that of the standard one.

Now let's benchmark the vectorized version of the gradient descent method used above (function "linear_gd_optim") against the version based on for loops (function "linear_gd_optim_old").
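The core difference between the two implementations is how the gradient of the loss is computed at each iteration. The sketch below is only illustrative (the helper names "grad_vec" and "grad_loop" are hypothetical, not functions of this package): the vectorized form evaluates one matrix product per iteration, while the loop-based form accumulates the same quantity row by row at R level.

```r
# vectorized gradient of the least-squares loss: a single matrix product
grad_vec <- function(beta, X, Y) {
  -2 * t(X) %*% (Y - X %*% beta) / length(Y)
}

# loop-based gradient: accumulates the same quantity one observation at a time
grad_loop <- function(beta, X, Y) {
  g <- numeric(length(beta))
  for (i in seq_len(nrow(X))) {
    g <- g - 2 * X[i, ] * (Y[i] - sum(X[i, ] * beta)) / length(Y)
  }
  g
}
```

The vectorized version delegates the heavy work to compiled linear-algebra routines, whereas the loop pays R-level interpretation overhead for every row; this is what the benchmark below quantifies.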
```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(bench)

## Useful functions

# Plot a benchmark
show_bm <- function(bm) {
  print(print_bench(bm))
  autoplot(bm)
}

# printable bench (for RMarkdown)
print_bench <- function(bm) {
  bm %>% mutate(expression = as.character(expression))
}

## Benchmarks
bench::mark(
  lm_method  = round(lm(Y ~ x1 + x2)$coefficients, 1),
  vec_method = round(linear_gd_optim(par, X, Y), 1),
  for_method = round(linear_gd_optim_old(par, X, Y), 1),
  filter_gc = FALSE,
  min_time = 10
) %>% show_bm()
```
The benchmarks show that the vectorized version of the function is much faster than the loop-based one; this allows our function to compete with the standard "lm" not only in terms of accuracy but also in terms of computation time. "lm" is still the fastest (its internals are written in C), but "linear_gd_optim" remains, for this example, within a fraction of a second. For reference, these calculations were executed on an Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz processor (4 logical processors) with 24.0 GB of RAM, Windows 7 64-bit, and R version 4.0.2.
The parameters of the linear model can also be estimated with the steepest descent method, using the function "linear_sd_optim" implemented in this package.

The example of the previous section is repeated applying the steepest descent method, comparing its results with those of the gradient descent method and of the standard function "lm".
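As a reminder, steepest descent differs from plain gradient descent only in how the step length is chosen: at each iteration it performs an exact line search along the negative gradient instead of using a fixed learning rate. For the least-squares loss this step has a closed form (the notation below is illustrative; the package's internal convention may differ slightly):

$$
g = \nabla L\!\left(\beta^{(t)}\right),
\qquad
\alpha^{(t)} = \frac{g^\top g}{g^\top \left(\tfrac{2}{n} X^\top X\right) g},
\qquad
\beta^{(t+1)} = \beta^{(t)} - \alpha^{(t)}\,g.
$$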
```r
library(myOpt)
## basic example code

set.seed(8675309)

# data simulation for example purposes
n = 1000
x1 = rnorm(n)
x2 = rnorm(n)
X <- cbind(rep(1,n), x1, x2)           # predictor matrix
Y = 5.6 + 2.3*x1 + 8.7*x2 + rnorm(n)   # response vector

# random initial values for the parameters of the linear model
par <- rnorm(dim(X)[2])

# the function returns a vector containing the values of the estimated parameters
opt_par <- linear_sd_optim(par, X, Y)

# standard lm function for estimating the parameters
lm_par <- lm(Y ~ x1 + x2)$coefficients

opt_par
lm_par
```
It can be noted that the results are quite similar, meaning that the accuracy of the implemented steepest descent method is also comparable to that of the standard function "lm".

We proceed by benchmarking the functions "linear_gd_optim" (based on gradient descent) and "linear_sd_optim" (based on steepest descent) against "lm" (the standard one).
```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(bench)

## Useful functions

# Plot a benchmark
show_bm <- function(bm) {
  print(print_bench(bm))
  autoplot(bm)
}

# printable bench (for RMarkdown)
print_bench <- function(bm) {
  bm %>% mutate(expression = as.character(expression))
}

## Benchmarks
bench::mark(
  lm_method = round(lm(Y ~ x1 + x2)$coefficients, 1),
  gd_method = round(linear_gd_optim(par, X, Y), 1),
  sd_method = round(linear_sd_optim(par, X, Y), 1),
  filter_gc = FALSE,
  min_time = 10
) %>% show_bm()
```
The benchmarks show that the gradient descent method is faster than the steepest descent one, with comparable accuracy. This is essentially due to the additional operations that the steepest descent method has to perform at each iteration to compute the step size, compared to the gradient descent method.
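To make the extra per-iteration cost concrete, the following sketch (a hypothetical helper, not the package's actual code) shows the exact line-search step length that steepest descent computes on top of the gradient itself:

```r
# exact line-search step length for the least-squares loss, given the gradient g
sd_step_length <- function(g, X) {
  Xg <- X %*% g                              # extra matrix-vector product at every iteration
  sum(g * g) / (2 * sum(Xg * Xg) / nrow(X))  # g'g divided by g' (2/n X'X) g
}
```

This additional matrix-vector product and quadratic form at every iteration account for most of the timing gap observed above.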
This section shows that the gradient descent and the steepest descent methods implemented in this package also work with more than two predictors.

A new example, including three predictors, is reported below, applying the gradient descent and the steepest descent methods and comparing their results with the standard function "lm".
```r
library(myOpt)
## basic example code

set.seed(8675309)

# data simulation for example purposes
n = 1000
x1 = rnorm(n)
x2 = rnorm(n)
x3 = rnorm(n)
X <- cbind(rep(1,n), x1, x2, x3)                # predictor matrix
Y = 5.6 + 2.3*x1 + 8.7*x2 + 7.2*x3 + rnorm(n)   # response vector

# random initial values for the parameters of the linear model
par <- rnorm(dim(X)[2])

# the function returns a vector containing the values of the estimated parameters
opt_par_gd <- linear_gd_optim(par, X, Y)
opt_par_sd <- linear_sd_optim(par, X, Y)

# standard lm function for estimating the parameters
lm_par <- lm(Y ~ x1 + x2 + x3)$coefficients

opt_par_gd
opt_par_sd
lm_par
```
It can be noted that the results are quite similar, meaning that the accuracy of the implemented gradient descent and steepest descent methods remains comparable to that of the standard function "lm" with three predictors as well.

We proceed by benchmarking the functions "linear_gd_optim" (based on gradient descent) and "linear_sd_optim" (based on steepest descent) against "lm" (the standard one), using three predictors.
```r
library(ggplot2)
library(dplyr)
library(tidyr)

## Useful functions

# Plot a benchmark
show_bm <- function(bm) {
  print(print_bench(bm))
  autoplot(bm)
}

# printable bench (for RMarkdown)
print_bench <- function(bm) {
  bm %>% mutate(expression = as.character(expression))
}

## Benchmarks
bench::mark(
  lm_method = round(lm(Y ~ x1 + x2 + x3)$coefficients, 1),
  gd_method = round(linear_gd_optim(par, X, Y), 1),
  sd_method = round(linear_sd_optim(par, X, Y), 1),
  filter_gc = FALSE,
  min_time = 10
) %>% show_bm()
```
The benchmarks confirm that, with three predictors as well, the gradient descent method is faster than the steepest descent one, with comparable accuracy, and that "lm" remains the fastest.

After estimating the parameters, a prediction function can be applied for both methods: it predicts the outcome given the vector of estimated model parameters and a matrix of predictors.
```r
# prediction with the parameters estimated through the lm function
prediction_lm <- predict(lm(Y ~ x1 + x2 + x3))

# prediction with the parameters estimated through the gradient descent method
prediction_gd <- my_linear_predict(opt_par_gd, X)

# prediction with the parameters estimated through the steepest descent method
prediction_sd <- my_linear_predict(opt_par_sd, X)
```
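For a linear model, such a prediction reduces to the matrix product between the predictor matrix and the estimated coefficients; a minimal sketch of that computation is shown below ("my_linear_predict" may handle its inputs differently, so this is only illustrative).

```r
# predicted responses as X %*% beta_hat, returned as a plain numeric vector
linear_predict_sketch <- function(beta_hat, X) {
  drop(X %*% beta_hat)
}
```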
Once the predictions are computed, this package also allows computing the error associated with them.
```r
# prediction error associated with the parameters estimated by the lm function
my_pred_error(Y, prediction_lm)

# prediction error associated with the gradient descent estimates
my_pred_error(Y, prediction_gd)

# prediction error associated with the steepest descent estimates
my_pred_error(Y, prediction_sd)
```
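The exact metric computed by "my_pred_error" is defined by the package; assuming a mean-squared-error-style measure, the computation would amount to something like the following sketch.

```r
# mean squared prediction error (the metric is an assumption made for illustration)
mse_sketch <- function(Y, Y_hat) {
  mean((Y - Y_hat)^2)
}
```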
The three methods give similar results both in terms of predictions and in terms of prediction error.

In order to estimate how accurately the predictive model will perform in practice, cross-validation techniques can be used. In particular, k-fold cross-validation is implemented in the myOpt package to assess the predictions of the gradient descent and steepest descent methods. The function "my_k_fold_cv" can be used for this purpose with both methods, depending on the value of the argument "method" (set to "gd" for gradient descent and to "sd" for steepest descent).
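The general scheme behind k-fold cross-validation is standard: the data are split into K folds, each fold is held out once as a validation set while the model is fitted on the remaining observations, and the out-of-fold prediction errors are averaged. A simplified sketch of this logic is given below; the "fit", "predict_fun" and "error_fun" arguments are placeholders for the package's internals, not part of its API.

```r
# generic k-fold cross-validation skeleton (illustrative, not the package's code)
k_fold_cv_sketch <- function(par, X, Y, K, fit, predict_fun, error_fun) {
  folds <- sample(rep(seq_len(K), length.out = length(Y)))     # random fold assignment
  errs <- numeric(K)
  for (k in seq_len(K)) {
    test <- folds == k
    beta_hat <- fit(par, X[!test, , drop = FALSE], Y[!test])   # fit on K - 1 folds
    y_hat <- predict_fun(beta_hat, X[test, , drop = FALSE])    # predict the held-out fold
    errs[k] <- error_fun(Y[test], y_hat)                       # out-of-fold error
  }
  mean(errs)
}
```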
```r
library(doSNOW)

K <- 5

set.seed(8675309)
my_k_fold_cv(par, X, Y, K)

set.seed(8675309)
my_k_fold_cv(par, X, Y, K, method = "sd")
```
We can observe that the two methods have very similar prediction errors, which are, as expected, a bit higher than the in-sample errors calculated in the previous section.
The function "my_k_fold_cv" can also perform a parallel calculation. It is sufficient to set the argument "parallel" equal to TRUE to move from the sequential to the parallel computing, distributing the calculation on all the available cores (which are 4 in this example). The impact of the parallel computing on the calculation times is analyzed below.
```r
## Useful functions

# Plot a benchmark
show_bm <- function(bm) {
  print(print_bench(bm))
  autoplot(bm)
}

# printable bench (for RMarkdown)
print_bench <- function(bm) {
  bm %>% mutate(expression = as.character(expression))
}

set.seed(8675309)
bench::mark(
  sequential_gd = round(my_k_fold_cv(par, X, Y, K), 1),
  parallel_gd   = round(my_k_fold_cv(par, X, Y, K, parallel = TRUE), 1),
  filter_gc = FALSE,
  min_time = 10
) %>% show_bm()

set.seed(8675309)
bench::mark(
  sequential_sd = round(my_k_fold_cv(par, X, Y, K, method = "sd"), 1),
  parallel_sd   = round(my_k_fold_cv(par, X, Y, K, method = "sd", parallel = TRUE), 1),
  filter_gc = FALSE,
  min_time = 10
) %>% show_bm()
```
It can be observed that, in the current example, parallel computing is convenient for the steepest descent method, whereas it increases the computation time for the gradient descent one. In the latter case, distributing the calculation has its own overhead, which can more than offset the time saved by splitting the procedure across multiple cores.

In order to fully appreciate the benefits of parallel computing, we can set up a more computationally intensive example, increasing the sample size from 1000 to 10000 and the number of folds from 5 to 10.
```r
set.seed(8675309)

# a larger simulated data set and more folds
n <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
Y <- 1 + 0.5*x1 + 0.2*x2 + rnorm(n)
X <- cbind(rep(1,n), x1, x2)
par <- rnorm(dim(X)[2])
K <- 10

set.seed(8675309)
bench::mark(
  sequential_gd = round(my_k_fold_cv(par, X, Y, K), 1),
  parallel_gd   = round(my_k_fold_cv(par, X, Y, K, parallel = TRUE), 1),
  filter_gc = FALSE,
  min_time = 100
) %>% show_bm()

set.seed(8675309)
bench::mark(
  sequential_sd = round(my_k_fold_cv(par, X, Y, K, method = "sd"), 1),
  parallel_sd   = round(my_k_fold_cv(par, X, Y, K, method = "sd", parallel = TRUE), 1),
  filter_gc = FALSE,
  min_time = 100
) %>% show_bm()
```
In this more computationally intensive example, parallel computing saves time for the gradient descent method as well.
This document shows how to apply the functions implemented in this package to estimate linear regression parameters, based on the gradient descent and the steepest descent methods.

Several examples are reported, including comparisons and benchmarks between the two implemented methods and against the standard function "lm". All the results agree that the implemented methods are comparable to the standard function in terms of accuracy.

In terms of computation time, the gradient descent method is faster than the steepest descent one, because of its simpler step calculation at each iteration. Even though it is still slower than the standard function, the gradient descent method reaches comparable computation times, staying (at least in the examples considered) within fractions of a second.

The predictions of the gradient descent and the steepest descent methods are also evaluated, showing that their in-sample prediction error is comparable to that of the standard function.

Finally, some examples of k-fold cross-validation are reported, showing how to speed up the calculations using parallel computing. These examples show that the implemented gradient descent and steepest descent methods also have comparable out-of-sample prediction errors.