knitr::opts_chunk$set(echo = TRUE)

Introduction

Whatever system you use, you first need to install Julia. Go to https://julialang.org/downloads, choose the correct download for your platform, and install it as appropriate. Note that autodiffr mainly works with Julia v0.6.x and currently only works partly with Julia v0.7 (as Julia v0.7 is in beta). Once you have a working Julia installation, autodiffr can be installed by:

devtools::install_github("Non-Contradiction/autodiffr")

It's also recommended to install the latest version of JuliaCall by:

devtools::install_github("Non-Contradiction/JuliaCall")

After the installation is finished, just do the following steps. Note that the first run will take some time, but subsequent runs should be faster.

library(autodiffr)
ad_setup()

Testing autodiffr

We can use the numDeriv package to check the results from autodiffr.

require(numDeriv)

Direct wrapper functions of ForwardDiff and ReverseDiff

Basic usage

In this section, we use a function similar to the Rosenbrock function as an initial example:

## Later on, these functions may be exported, perhaps in an R environment containing all testing functions,
## so that users can easily access them for examples, testing, and benchmarking.
## Currently these functions are unexported in autodiffr.
rosen1 <- autodiffr:::rosenbrock_1
# There are also other functions that can replace rosen1 in the following examples, like
# rosen2 <- autodiffr:::rosenbrock_2
# rosen4 <- autodiffr:::rosenbrock_4
# ackley <- autodiffr:::ackley

rosen1
# rosen2
# rosen4
# ackley

target <- rosen1

These functions receive a vector input and generate a scalar output, so forward_grad and reverse_grad can be used to calculate their gradients, and forward_hessian and reverse_hessian can be used to calculate their hessians. It should also be possible to calculate the jacobian of the gradient function, which should equal the hessian.

The basic usage is as follows:

X <- runif(5)
X

forward_grad(target, X)

reverse_grad(target, X)

forward_jacobian(function(x) forward_grad(target, x), X)

reverse_jacobian(function(x) reverse_grad(target, x), X)

forward_hessian(target, X)

reverse_hessian(target, X)
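
We can also verify the claim above that the jacobian of the gradient equals the hessian (a quick sketch using only the functions already introduced):

## The jacobian of the gradient should equal the hessian:
all.equal(forward_jacobian(function(x) forward_grad(target, x), X),
          forward_hessian(target, X))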

We can compare the results with numDeriv:

numDeriv::grad(target, X)

numDeriv::jacobian(function(x) numDeriv::grad(target, x), X)

numDeriv::hessian(target, X)

We can see that the results are similar. Since autodiffr computes derivatives by automatic differentiation rather than by finite differences, its results should be more accurate.
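
As a quick numeric sketch of that accuracy claim: the two AD modes should agree to machine precision, while the finite-difference gradient from numDeriv typically shows a larger (though still small) discrepancy.

## Difference between the two AD modes vs. difference from finite differences:
max(abs(forward_grad(target, X) - reverse_grad(target, X)))
max(abs(forward_grad(target, X) - numDeriv::grad(target, X)))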

Creating a function for the gradient

grosen <- autodiffr::makeGradFunc(target)
grosen
grosen(X)
## Use the gradient function in optimization:
library(optimx)
x0 <- c(-1.2, 1)  ## the standard Rosenbrock starting point
target(x0)
grosen(x0)
srosen0 <- optimr(x0, target, grosen, method = "Rvmmin", control = list(trace = 1))
srosen0
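
The same gradient function also works with base R's optim (a small sketch; the Rosenbrock function has its minimum at c(1, 1)):

sbase <- optim(x0, target, grosen, method = "BFGS")
sbase$par  ## should be close to c(1, 1)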

Advanced usage

Use of config objects

If the gradient or hessian of the same target function needs to be calculated repeatedly on inputs of the same shape (such as input vectors of the same length, or input matrices with the same numbers of rows and columns), then users can calculate config objects beforehand to improve performance:

cfg <- forward_grad_config(target, X)
forward_grad(target, X, cfg = cfg)

cfg <- reverse_grad_config(X)
reverse_grad(target, X, cfg = cfg)

cfg <- forward_hessian_config(target, X)
forward_hessian(target, X, cfg = cfg)

cfg <- reverse_hessian_config(X)
reverse_hessian(target, X, cfg = cfg)

Users can also choose a chunk size when creating config objects for forward mode automatic differentiation; choosing an appropriate chunk size can improve performance further.

cfg1 <- forward_grad_config(target, X, chunk_size = 1)
forward_grad(target, X, cfg = cfg1)

cfg1 <- forward_hessian_config(target, X, chunk_size = 1)
forward_hessian(target, X, cfg = cfg1)
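
To see the effect, here is a rough timing sketch (timings vary by machine; since X has length 5, the chunk size can be at most 5 here):

cfg_c1 <- forward_grad_config(target, X, chunk_size = 1)
cfg_c5 <- forward_grad_config(target, X, chunk_size = 5)
system.time(for (i in 1:1000) forward_grad(target, X, cfg = cfg_c1))
system.time(for (i in 1:1000) forward_grad(target, X, cfg = cfg_c5))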

Use of tape objects in reverse mode automatic differentiation

In reverse mode automatic differentiation, if the target function is defined without any branching (such as if, ifelse, or while), then it is possible to use a tape to improve performance further:

tp <- reverse_grad_tape(target, X)
reverse_grad(tp, X)

tp <- reverse_hessian_tape(target, X)
reverse_hessian(tp, X)
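
As a rough timing sketch, replaying a prerecorded tape avoids re-tracing the target function on every call:

tp <- reverse_grad_tape(target, X)
system.time(for (i in 1:1000) reverse_grad(target, X))
system.time(for (i in 1:1000) reverse_grad(tp, X))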

User interface functions provided by autodiffr

autodiffr provides a set of user interface functions that cover both forward and reverse mode automatic differentiation.

Basic usage

If users want to calculate the gradient, hessian, or jacobian directly:

autodiffr::ad_grad(target, X, mode = "forward")

autodiffr::ad_grad(target, X, mode = "reverse")

autodiffr::ad_hessian(target, X, mode = "forward")

autodiffr::ad_hessian(target, X, mode = "reverse")

It's easy to create a gradient, hessian, or jacobian function:

autodiffr::makeGradFunc(target, mode = "forward")(X)

autodiffr::makeGradFunc(target, mode = "reverse")(X)

autodiffr::makeHessianFunc(target, mode = "forward")(X)

autodiffr::makeHessianFunc(target, mode = "reverse")(X)
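
Jacobians of vector-valued functions are handled analogously; a small sketch, assuming ad_jacobian and makeJacobianFunc follow the same pattern as their gradient counterparts:

fvec <- function(x) c(sum(x^2), prod(x))  ## a simple vector-valued example
autodiffr::ad_jacobian(fvec, X, mode = "forward")
autodiffr::makeJacobianFunc(fvec, mode = "reverse")(X)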

Advanced usage

If users want a gradient, hessian, or jacobian function specialised for inputs of a fixed shape (such as a fixed length of the input vector, or fixed numbers of rows and columns of the input matrix), then it is possible to get an optimized function by providing the x argument:

g1 <- autodiffr::makeGradFunc(target, mode = "forward", x = runif(length(X)))
g1(X)

g2 <- autodiffr::makeGradFunc(target, mode = "reverse", x = runif(length(X)))
g2(X)

h1 <- autodiffr::makeHessianFunc(target, mode = "forward", x = runif(length(X)))
h1(X)

h2 <- autodiffr::makeHessianFunc(target, mode = "reverse", x = runif(length(X)))
h2(X)

Beyond the x argument, users can also provide chunk_size for forward mode automatic differentiation; see ?ForwardDiff for instructions on how to choose the chunk size. Users can also set use_tape for reverse mode automatic differentiation, which may greatly improve the performance of the gradient, hessian, or jacobian functions.

g1 <- autodiffr::makeGradFunc(target, mode = "forward", x = runif(length(X)), chunk_size = 1)
g1(X)

g2 <- autodiffr::makeGradFunc(target, mode = "reverse", x = runif(length(X)), use_tape = TRUE)
g2(X)

h1 <- autodiffr::makeHessianFunc(target, mode = "forward", x = runif(length(X)), chunk_size = 1)
h1(X)

h2 <- autodiffr::makeHessianFunc(target, mode = "reverse", x = runif(length(X)), use_tape = TRUE)
h2(X)

But please note that if the target function is defined with branching (such as if, ifelse, or while), then use_tape may give incorrect results, as in the following example.

f <- function(x) {
    if (x[1] > 1.5) {
        sum(x^2)
    } else {
        sum(x^3)
    }
}

gF <- autodiffr::makeGradFunc(f, mode = "forward", x = rep(2, 3), chunk_size = 1)

gR <- autodiffr::makeGradFunc(f, mode = "reverse", x = rep(2, 3), use_tape = TRUE)

gF(rep(1, 3)) ## correct result

gR(rep(1, 3)) ## wrong result

The tape was recorded at x = rep(2, 3), where x[1] > 1.5 is TRUE, so it captured the sum(x^2) branch. Replaying the tape at rep(1, 3) follows that recorded branch instead of re-evaluating the condition, which is why the result is wrong.
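
If the tape is rebuilt at an input that takes the intended branch, the result is correct again (a sketch):

gR2 <- autodiffr::makeGradFunc(f, mode = "reverse", x = rep(1, 3), use_tape = TRUE)
gR2(rep(1, 3)) ## correct, since the tape was recorded on this branch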

