```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
This vignette introduces the basic functionality of the RAutoDiff package. The package provides first- and second-order forward-mode automatic differentiation of simple algorithms written in R. The package is not very fast, and is therefore mainly intended for prototyping algorithms that will later be implemented in a compiled language. However, it should also be applicable to smaller problems.
Notice: so far, only a small subset of the functions available in R has been overloaded to work with the package. An overview of the available functions will be maintained in a separate vignette.
The typical application of the package consists of the following four steps:

1. Load the package with `library(RAutoDiff)`.
2. Call `independent()` (for first order derivatives only) or `independent2()` (first and second order derivatives) to indicate which variables we are differentiating with respect to.
3. Evaluate the function/algorithm using the output of `independent*()` as input.
4. Extract the results using `value()`, `gradient()` and `hessian()` (or `jacobian()` for multivariate output).

Below is a very simple example where we differentiate the function $f(x) = x^2/2$, which of course has derivatives $f'(x) = x$ and $f''(x) = 1$, and evaluate these for $x=3$ ($f(3) = 9/2$, $f'(3)=3$, $f''(3)=1$):
```r
x <- 3.0                # value where we wish to evaluate function
x.a <- independent2(x)  # set independent value (allowing for first and second derivatives)
f.a <- 0.5*x.a^2        # evaluate function (note using output of independent*() as input)
print(paste0("value of function: ", value(f.a)))        # extract value
print(paste0("gradient of function: ", gradient(f.a)))  # extract gradient
print(paste0("hessian of function: ", hessian(f.a)))    # extract hessian
```
I.e., the output is indeed as predicted above.
Now consider the derivative of the quadratic form $f(x) = \frac{1}{2} x^T P x$ where $P$ is some symmetric positive definite matrix. Again, this problem allows simple expressions for first and second order derivatives, namely $\nabla f(x) = P x$ and $\nabla^2 f(x) = P$.
Let's replicate this result using AD:
```r
P <- matrix(c(1,2,2,5), nrow = 2)  # 2x2 matrix
x <- c(1,2)                        # where to evaluate
x.a <- independent2(x)
f.a <- 0.5*x.a %*% P %*% x.a
print(paste0("value : ", value(f.a), " should be ", 0.5*x %*% P %*% x))
print("gradient:")
print(gradient(f.a))
print("should be:")
print(P %*% x)
print("hessian:")
print(hessian(f.a))
print("should be:")
print(P)
```
Notice that the basic AD types are vectors.
The package can also be used for functions/algorithms with multivariate output. To extract the Jacobian, use the function `jacobian()`. To obtain a Hessian of each output component, apply the `hessian()` function to that component:
```r
x.a <- independent2(c(-1,0,1))
vec <- c(0.5*sum(x.a^2), sum(x.a), prod(x.a))
print(jacobian(vec))
print(hessian(vec[1]))  # notice: hessian of first element in vec!
```
The AD types can also be used within functions, e.g.
```r
fun <- function(x){  # define some function
  ret <- 0.5*sum(x^2)
  return(ret)
}
x <- c(-1,0,1,2)              # where to evaluate
f <- fun(x)                   # evaluate with regular (numeric) argument
print(f)
f.d <- fun(independent(x))    # first order AD
print(f.d)
f.dd <- fun(independent2(x))  # first and second order AD
print(f.dd)
```
So far, the package has some basic matrix/linear algebra capabilities, where matrices are created using overloaded versions of `matrix()` or `diag()`:
```r
vals <- independent(c(1,-1,2,2))  # independent variables
m <- matrix(vals, 2, 2)           # creates an AD_matrix
print(m)
f <- solve(m, c(3,-4))            # overload of solve function
print(f)                          # m^{-1} %*% c(3,-4)
print(m %*% f)                    # should be constant c(3,-4)
print((m %*% f)@vals)             # display the AD vector used for storing values of the matrix
```
The `solve()` function is based on a basic implementation of the Crout LU decomposition (without pivoting), and should only be used for small, well-scaled problems.
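For reference, below is a minimal numeric sketch of a Crout-style LU decomposition without pivoting. The function `croutLU()` is purely illustrative (it is not part of the RAutoDiff API), and the package's internal implementation may differ in detail.

```r
# Illustrative Crout LU decomposition without pivoting: A = L %*% U with
# L lower triangular and U unit upper triangular. Not part of RAutoDiff.
croutLU <- function(A){
  n <- nrow(A)
  L <- matrix(0, n, n)
  U <- diag(n)  # unit diagonal
  for(j in 1:n){
    for(i in j:n){   # column j of L
      L[i, j] <- A[i, j] - sum(L[i, seq_len(j - 1)] * U[seq_len(j - 1), j])
    }
    if(j < n){
      for(i in (j + 1):n){  # row j of U (divides by L[j, j]: no pivoting)
        U[j, i] <- (A[j, i] - sum(L[j, seq_len(j - 1)] * U[seq_len(j - 1), i])) / L[j, j]
      }
    }
  }
  list(L = L, U = U)
}

A0 <- matrix(c(4, 3, 6, 3), 2, 2)
lu <- croutLU(A0)
print(lu$L %*% lu$U)  # should reproduce A0
```

Since there is no pivoting, the factorization breaks down whenever a zero (or very small) pivot `L[j, j]` is encountered, which is why such a routine is only recommended for small, well-scaled problems.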
Other functions for AD matrices implemented so far: `length()`, `nrow()`, `ncol()`, `diag()`, `rowSums()`, `colSums()`, `sum()`, `t()`, `backsolve()`, `forwardsolve()`, `det()`, `determinant()`.
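As a quick illustration, a few of these overloads can be applied to the AD matrix `m` created above (the exact printed representation of the results depends on the package):

```r
print(c(nrow(m), ncol(m)))  # dimensions of the AD matrix
print(rowSums(m))           # row sums, as an AD vector
print(t(m))                 # transpose
print(det(m))               # determinant, carrying derivative information
```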
The package also implements a Cholesky factorization for symmetric positive definite matrices, which is much more stable than the LU decomposition. The Cholesky factorization comes with a set of solve routines named `solve.chol()`:
```r
A <- diag(independent(c(3,4,5))) + matrix(1,3,3)  # SPD matrix
print(A)
U <- chol(A)
print(U)
print(chol(value(A)))                # same computation in numeric types
print(A %*% solve.chol(A, diag(A)))  # should return diag(A)
```
The Cholesky routines are all based on the basic lower triangular Cholesky factorization, available (including an overload with numeric types) as `cholL()`. Use this one for best performance; e.g. the log-determinant of the matrix `A` above:
```r
log.det <- 2.0*sum(log(diag(cholL(A))))
print(log.det)                                     # log-determinant and its Jacobian
print(determinant(value(A), logarithm = TRUE)[1])  # same in numeric types
```
One of the reasons we are interested in derivatives is of course their application in numerical optimization. Below is an example of optimizing the negative log-likelihood of a simple $N(\mu,\exp(2\lambda))$ model, $\theta=(\mu,\lambda)$, with iid observations:
```r
set.seed(1)
y <- rnorm(10)  # simulate some iid N(0,1) data
n.log.like <- function(theta){  # theta = c(mu,lambda)
  mu <- theta[1]
  sigma <- exp(theta[2])
  return(-sum(dnorm(y, mu, sigma, log = TRUE)))  # negative log likelihood
}
gr <- function(theta){  # gradient, aka negative score
  return(gradient(n.log.like(independent(theta))))
}
he <- function(theta){  # hessian of the negative log likelihood
  return(hessian(n.log.like(independent2(theta))))
}
theta0 <- c(0,0)  # initial point for optimization
opt.out <- optim(par = theta0, fn = n.log.like, gr = gr, method = "BFGS")
print(opt.out)                 # i.e. success!
print(gr(opt.out$par))         # gradient at (numerical) maximizer (essentially zero)
true.mle <- c(mean(y), 0.5*log(mean((y - mean(y))^2)))  # model allows explicit MLE for checking
print(true.mle)                # very close to optim output
print(solve(he(opt.out$par)))  # inverse observed information matrix
```
Finally, some known limitations:

* Many functions have not yet been overloaded; e.g. `qnorm(independent(0))` fails until this function has been overloaded in the package.
* There is a problem with the `c()` function for cases where the first argument is not an AD type. E.g. `c(independent(1),2)` works, but `c(2,independent(1))` fails. Any help on this most welcome!
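A minimal illustration of the `c()` limitation, repeating the example above (the workaround is simply to place the AD argument first):

```r
x.a <- independent(1)
print(c(x.a, 2))  # works: AD argument comes first
# c(2, x.a)       # fails: numeric argument comes first
```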