knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This vignette introduces the basic functionality of the RAutoDiff package. The package provides first- and second-order forward-mode automatic differentiation of simple algorithms written in R. The package is not very fast, and is therefore mainly intended for prototyping algorithms that will later be implemented in a compiled language. However, it should also be applicable to smaller problems in its own right.

Notice: so far, only a small subset of the available functions in R have been overloaded to work with the package. An overview of the available functions will be maintained in a separate vignette.

The very basics

The typical application of the package consists of four steps: load the package, declare the independent variables with independent() or independent2(), evaluate the function or algorithm on these variables, and finally extract the value and derivatives:

library(RAutoDiff)

Below is a very simple example where we differentiate the function $f(x) = x^2/2$ which of course has derivatives $f'(x) = x$ and $f''(x) = 1$, and evaluate these for $x=3$ ($f(3) = 9/2$, $f'(3)=3$, $f''(3)=1$):

x <- 3.0 # value where we wish to evaluate function
x.a <- independent2(x) # set independent value (allowing for first and second derivatives)
f.a <- 0.5*x.a^2 # evaluate function (note using output of independent*() as input)
print(paste0("value of function: ",value(f.a))) # extract value
print(paste0("gradient of function: ",gradient(f.a))) # extract gradient
print(paste0("hessian of function: ",hessian(f.a)))

That is, the output is indeed as predicted above.
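
For intuition, forward-mode AD works by propagating a derivative alongside each value through every operation. Below is a minimal conceptual sketch in plain R reproducing the first derivative above; the helper fwd_mul and the (val, dot) pairs are purely illustrative and not how the package is implemented:

fwd_mul <- function(a, b){ # product rule on (value, derivative) pairs
  list(val = a$val*b$val, dot = a$dot*b$val + a$val*b$dot)
}
x.pair <- list(val = 3, dot = 1) # seed with dx/dx = 1
f.pair <- fwd_mul(list(val = 0.5, dot = 0), fwd_mul(x.pair, x.pair)) # 0.5*x*x
print(c(f.pair$val, f.pair$dot)) # 4.5 and 3, matching f(3) and f'(3) above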

Multivariate example

Now consider the derivative of the quadratic form $f(x) = \frac{1}{2} x^T P x$ where $P$ is some symmetric positive definite matrix. Again, this problem allows simple expressions for first and second order derivatives, namely $\nabla f(x) = P x$ and $\nabla^2 f(x) = P$.

Let's replicate these results using AD:

P <- matrix(c(1,2,2,5),nrow=2) # 2x2 matrix
x <- c(1,2) # where to evaluate
x.a <- independent2(x)
f.a <- 0.5*x.a%*%P%*%x.a
print(paste0("value : ",value(f.a), " should be ",0.5*x%*%P%*%x))
print("gradient:")
print(gradient(f.a))
print("should be:")
print(P%*%x)
print("hessian:")
print(hessian(f.a))
print("should be:")
print(P)

Notice that the basic AD types are vectors.

The package can also be used for functions/algorithms with multivariate output. To extract the Jacobian, use the function jacobian(). To obtain the Hessian of a particular output component, apply the hessian() function to that component:

x.a <- independent2(c(-1,0,1))
vec <- c(0.5*sum(x.a^2),sum(x.a),prod(x.a))
print(jacobian(vec)) 
print(hessian(vec[1])) # notice hessian of first element in vec!
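
For comparison, the exact Jacobian at $x=(-1,0,1)$ has rows $x$, $(1,1,1)$ and the partial derivatives of prod(x), i.e. $(x_2 x_3, x_1 x_3, x_1 x_2)=(0,-1,0)$, while the Hessian of the first component ($\frac{1}{2}\sum_i x_i^2$) is the identity matrix:

print(rbind(c(-1,0,1), c(1,1,1), c(0,-1,0))) # exact Jacobian at x = c(-1,0,1)
print(diag(3)) # exact hessian of the first component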

Functions

The AD types can also be used within functions, e.g.

fun <- function(x){ # define some function
  ret <- 0.5*sum(x^2)
  return(ret)
}
x <- c(-1,0,1,2) # where to evaluate
f <- fun(x) # evaluate with regular (numeric) argument
print(f)
f.d <- fun(independent(x)) # first order AD
print(f.d)
f.dd <- fun(independent2(x)) # first and second order AD
print(f.dd)
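
The derivative information can of course be extracted from the returned AD objects as before, e.g.:

print(gradient(f.d)) # should equal x, i.e. c(-1,0,1,2)
print(hessian(f.dd)) # should equal the 4x4 identity matrix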

Matrices and linear algebra

So far, the package has some basic matrix/linear algebra capabilities; AD matrices are created using overloaded versions of matrix() or diag():

vals <- independent(c(1,-1,2,2)) # independent variables
m <- matrix(vals,2,2) # creates an AD_matrix
print(m)
f <- solve(m,c(3,-4)) # overload of solve function
print(f) # m^{-1} %*% c(3,-4)
print(m%*%f) # should be constant c(3,-4)
print((m%*%f)@vals) # display the AD vector used for storing values of the matrix

The solve function is based on a basic implementation of the Crout LU decomposition (without pivoting), and should only be used for small, well-scaled problems.
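
For reference, the Crout LU algorithm (without pivoting) can be sketched in plain numeric R as follows; the helper crout_lu is purely illustrative and not the package's internal code:

crout_lu <- function(A){ # A = L %*% U with L lower triangular, U unit upper triangular
  n <- nrow(A)
  L <- matrix(0, n, n)
  U <- diag(n) # Crout convention: U has unit diagonal
  for(j in 1:n){
    for(i in j:n){ # column j of L
      L[i,j] <- A[i,j] - sum(L[i, seq_len(j-1)]*U[seq_len(j-1), j])
    }
    if(j < n) for(i in (j+1):n){ # row j of U, right of the diagonal
      U[j,i] <- (A[j,i] - sum(L[j, seq_len(j-1)]*U[seq_len(j-1), i]))/L[j,j]
    }
  }
  list(L = L, U = U)
}
A0 <- matrix(c(1,-1,2,2,2,1,0,1,3), 3, 3) # small test matrix with nonzero leading minors
lu <- crout_lu(A0)
print(lu$L %*% lu$U - A0) # should be (numerically) zero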

Other functions for AD matrices so far implemented: length(), nrow(), ncol(), diag(), rowSums(), colSums(), sum(), t(), backsolve(), forwardsolve(), det(), determinant().

Cholesky factorization

The package also implements a Cholesky factorization for symmetric positive definite matrices, which is much more stable than the LU decomposition. The Cholesky factorization comes with a set of solve routines named solve.chol():

A <- diag(independent(c(3,4,5))) + matrix(1,3,3) #SPD matrix  
print(A)
U <- chol(A)
print(U)
print(chol(value(A))) # same computation in numeric types
print(A%*%solve.chol(A,diag(A))) # should return diag A

The Cholesky routines are all based on a basic lower-triangular Cholesky factorization, available (including an overload for numeric types) as cholL(). Use cholL() directly for best performance; e.g., the log-determinant of the matrix A above:

log.det <- 2.0*sum(log(diag(cholL(A))))
print(log.det) # log-determinant and its Jacobian
print(determinant(value(A),logarithm = TRUE)[1]) # same in numeric types
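
For reference, the standard lower-triangular Cholesky algorithm that cholL() corresponds to can be sketched in plain numeric R; the helper chol_lower is purely illustrative and not the package's internal code:

chol_lower <- function(A){ # returns lower-triangular L with A = L %*% t(L)
  n <- nrow(A)
  L <- matrix(0, n, n)
  for(j in 1:n){
    L[j,j] <- sqrt(A[j,j] - sum(L[j, seq_len(j-1)]^2))
    if(j < n) for(i in (j+1):n){
      L[i,j] <- (A[i,j] - sum(L[i, seq_len(j-1)]*L[j, seq_len(j-1)]))/L[j,j]
    }
  }
  L
}
print(chol_lower(value(A)) - t(chol(value(A)))) # should be (numerically) zero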

Simple worked example

One of the main reasons we are interested in derivatives is of course their application in numerical optimization. Below is an example of optimizing the negative log-likelihood of a simple $N(\mu,\exp(2\lambda))$ model, $\theta=(\mu,\lambda)$, based on iid observations:

set.seed(1)
y <- rnorm(10) # simulate some iid N(0,1) data
n.log.like <- function(theta){ #theta = c(mu,lambda)
  mu <- theta[1]
  sigma <- exp(theta[2])
  return(-sum(dnorm(y,mu,sigma,log=TRUE))) # negative log likelihood
}
gr <- function(theta){ # gradient aka negative score
  return(gradient(n.log.like(independent(theta))))
}
he <- function(theta){ # hessian of negative log likelihood (observed information)
  return(hessian(n.log.like(independent2(theta))))
}
theta0 <- c(0,0) # initial point for optimization
opt.out <- optim(par=theta0,fn=n.log.like,gr=gr,method = "BFGS")
print(opt.out) #i.e. success!
print(gr(opt.out$par)) # gradient at the numerically obtained MLE (essentially zero)
true.mle <- c(mean(y),0.5*log(mean((y-mean(y))^2))) # model allows explicit MLE for checking
print(true.mle) # very close to optim output
print(solve(he(opt.out$par))) # inverse observed information matrix
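
The square roots of the diagonal of the inverse observed information give approximate (large-sample) standard errors of the estimates, e.g.:

print(sqrt(diag(solve(he(opt.out$par))))) # approximate standard errors of (mu,lambda)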

Limitations of the package

As noted above, the package is not particularly fast, only a small subset of R's functions have so far been overloaded, and the dense linear algebra routines (in particular solve()) are intended for small, well-scaled problems.