knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

With the gradient package it is possible to fit linear models using steepest descent or gradient descent, with the same flexibility as R's default \code{lm} function. First of all, we load the library.

library(gradient)

In this vignette we will use a simulated dataset, but it is possible to use real data too. Moreover, we initialize some parameters that will be used in the following functions.

set.seed(1)  # make the simulated data reproducible
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)

# simulate the response from a known linear model
y <- 1 + 0.5*x1 + 0.2*x2 + rnorm(n)
x <- cbind(x1, x2)
stepsize <- 1e-5           # step size
tolerance <- 1e-10         # convergence tolerance
maxit <- 1000              # maximum number of iterations
b <- c(0.1, 0.1, 0.1)      # starting values for the coefficients
data <- data.frame(y = y, x1 = x1, x2 = x2)

Steepest descent

The first linear model will be fitted using steepest descent.

sd <- lm_gradient(b=b, formula=y~x1+x2, data=data, maxit, tolerance, fun="sd")
sd

In this particular case, the warning is useful to check whether convergence is reached before the maximum number of iterations.
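
To give an intuition for what a single iteration does, here is a minimal sketch of a steepest descent step for the squared-error loss, assuming the \code{"sd"} variant chooses the step length by exact line search (which would also explain why it needs so few iterations). The function \code{sd_step} and its arguments are illustrative, not the package's internals.

# one steepest descent step for f(b) = sum((y - X %*% b)^2)
# X is the design matrix, including the intercept column
sd_step <- function(b, X, y) {
  r <- y - X %*% b                            # residuals
  g <- -2 * t(X) %*% r                        # gradient of the loss
  alpha <- sum(g^2) / (2 * sum((X %*% g)^2))  # exact line search step length
  drop(b - alpha * g)                         # updated coefficients
}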

Gradient descent

Gradient descent can be performed easily by changing the \code{fun} parameter.

gd <- lm_gradient(b=b, formula=y~x1+x2, data=data, maxit, tolerance, fun="gd")
gd

With both algorithms we obtain the same results.
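
This is expected: the squared-error loss is convex, so both algorithms converge to the same minimizer and differ only in how the step length is chosen. A plain gradient descent step uses a fixed step size (like the \code{stepsize} value defined earlier) instead of a line search. As above, this is an illustrative sketch, not the package's code:

# one fixed-step gradient descent step (illustrative sketch)
gd_step <- function(b, X, y, stepsize) {
  g <- -2 * t(X) %*% (y - X %*% b)  # same gradient as in the steepest descent sketch
  drop(b - stepsize * g)            # fixed step in the negative gradient direction
}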

Inspect the fit

The gradient package provides several functions to inspect the fit of the model, similar to those available for the base R function \code{lm}.

print(sd)
coef(sd)
summary(sd)
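
As a sanity check, the estimated coefficients can be compared with those of base R's \code{lm}, which fits the same model by ordinary least squares:

coef(lm(y ~ x1 + x2, data = data))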

Moreover, it is possible to easily plot the convergence of the algorithms.

plot(sd)
plot(gd)

Both algorithms reach convergence, but steepest descent is faster (it takes only 30 iterations!). The matrix \code{A} inside the object returned by \code{lm_gradient} contains all the steps of the algorithm for further investigation.

head(sd$A)
head(gd$A)
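
Assuming each row of \code{A} holds the coefficient vector at one iteration (as the \code{head} output above suggests), the trajectories can also be plotted by hand, for example with \code{matplot}:

# coefficient values across iterations (assumes iterations are the rows of A)
matplot(sd$A, type = "l", lty = 1, xlab = "iteration", ylab = "coefficient value")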

Cross validation

The gradient package has two built-in methods for cross validation: k-fold and leave-one-out. As previously described, both methods are available for both steepest descent and gradient descent.
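
As a reminder of the general idea behind k-fold cross validation (a generic sketch, not the package's implementation): the data are partitioned into k folds, and the model is refitted k times, each time holding out one fold to measure prediction error.

# generic k-fold splitting logic (illustrative only)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(data)))
for (i in 1:k) {
  train <- data[folds != i, ]  # rows used to fit the model
  test  <- data[folds == i, ]  # rows used to evaluate prediction error
}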

K-fold

gd_cv <- lm_gradient_cv(5, b=b, formula=y~x1+x2, data=data, maxit, tolerance, fun="gd", parallel = FALSE)

In this example we performed k-fold cross validation with k=5. It is possible to validate the results by comparing the RMSE, MAE, and MedianAE errors.
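
For reference, these metrics are usually defined as follows (a sketch of the standard definitions; the package's exact implementation may differ):

# standard definitions of the reported error metrics (e = out-of-fold prediction errors)
rmse      <- function(e) sqrt(mean(e^2))   # root mean squared error
mae       <- function(e) mean(abs(e))      # mean absolute error
median_ae <- function(e) median(abs(e))    # median absolute error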

print(gd_cv)

Leave-one-out

Similarly, it is possible to perform leave-one-out cross validation, the special case of k-fold where k equals the number of observations, with the following function:

gd_looc <- lm_gradient_looc(b=b, formula=y~x1+x2, data=data, maxit, tolerance, fun="gd", parallel = FALSE)
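
As with the k-fold results, the returned object can be printed to inspect the errors (assuming its print method mirrors the k-fold one):

print(gd_looc)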

Parallelization

Both cross-validation methods can be run in parallel, taking advantage of multiple cores for faster computation, by setting the parameter \code{parallel} to TRUE.
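
For example, the k-fold call above can be repeated with \code{parallel = TRUE} (the object name is just illustrative):

gd_cv_parallel <- lm_gradient_cv(5, b=b, formula=y~x1+x2, data=data, maxit, tolerance, fun="gd", parallel = TRUE)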


