```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(nabla)
```
nabla differentiates any R function exactly — no symbolic algebra, no
finite-difference grid, no loss of precision. Pass a function and a point
to D(), and you get the exact derivative:
```r
f <- function(x) x[1]^2 + sin(x[1])
D(f, pi / 4)  # exact first derivative at pi/4
```
Functions passed to D() use x[1], x[2], ... indexing — the same
convention as R's optim(). For multi-parameter functions, gradient(),
hessian(), and jacobian() provide the standard objects of calculus:
```r
g <- function(x) x[1]^2 * exp(x[2])
gradient(g, c(2, 0))  # c(4, 4)
hessian(g, c(2, 0))   # 2x2 Hessian matrix
```
The D operator is the core of nabla. Applied once, it gives the first
derivative. Applied twice (via order = 2), it gives the second derivative:
```r
f <- function(x) x[1]^2 + sin(x[1])

# First derivative: f'(x) = 2x + cos(x)
D(f, pi / 4)

# Verify against the analytical formula
2 * (pi / 4) + cos(pi / 4)

# Second derivative: f''(x) = 2 - sin(x)
D(f, pi / 4, order = 2)
```
gradient() is equivalent to D(f, x) for scalar-valued functions:
```r
gradient(f, pi / 4)
```
The function and its AD-computed derivative over $[0, 2\pi]$, with the evaluation point $x = \pi/4$ marked:
```r
f <- function(x) x[1]^2 + sin(x[1])
xs <- seq(0, 2 * pi, length.out = 200)

# Compute f(x) and f'(x) at each grid point using D()
fx <- sapply(xs, function(xi) f(xi))
fpx <- sapply(xs, function(xi) D(f, xi))

# Mark the evaluation point x = pi/4
x_mark <- pi / 4

oldpar <- par(mar = c(4, 4, 2, 1))
plot(xs, fx, type = "l", col = "steelblue", lwd = 2,
     xlab = "x", ylab = "y",
     main = expression(f(x) == x^2 + sin(x) ~ "and its derivative"),
     ylim = range(c(fx, fpx)))
lines(xs, fpx, col = "firebrick", lwd = 2, lty = 2)
points(x_mark, f(x_mark), pch = 19, col = "steelblue", cex = 1.3)
points(x_mark, D(f, x_mark), pch = 19, col = "firebrick", cex = 1.3)
abline(v = x_mark, col = "grey60", lty = 3)
legend("topleft", legend = c("f(x)", "f'(x) via AD"),
       col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2, bty = "n")
par(oldpar)
```
D() handles control flow, loops, and higher-order functions transparently.
No special annotation is needed — just write ordinary R:
```r
# if/else branching
safe_log <- function(x) {
  if (x[1] > 0) log(x[1]) else -Inf
}
D(safe_log, 2)   # 1/2 = 0.5
D(safe_log, -1)  # 0 (constant branch)

# for loop
poly <- function(x) {
  result <- 0
  for (i in 1:5) result <- result + x[1]^i
  result
}
D(poly, 2)  # 1 + 2*2 + 3*4 + 4*8 + 5*16 = 129

# Reduce
f_reduce <- function(x) Reduce("+", lapply(1:4, function(i) x[1]^i))
D(f_reduce, 2)
```
More complex compositions also propagate correctly:
```r
g <- function(x) exp(-x[1]^2 / 2) / sqrt(2 * pi)  # standard normal PDF

# D(g, 1) should equal -x * dnorm(x) at x = 1
D(g, 1)
-1 * dnorm(1)
D(g, 1) - (-1 * dnorm(1))  # ~0
```
For functions of several variables, gradient() returns the gradient vector,
hessian() returns the Hessian matrix, and jacobian() handles
vector-valued functions:
```r
# Gradient of a 2-parameter function
f2 <- function(x) x[1]^2 * x[2]
gradient(f2, c(3, 4))  # c(2*3*4, 3^2) = c(24, 9)

# Hessian
hessian(f2, c(3, 4))

# Jacobian of a vector-valued function
f_vec <- function(x) list(x[1] * x[2], x[1]^2 + x[2])
jacobian(f_vec, c(3, 4))
```
The D operator composes for higher-order derivatives. D(f, x, order = 2)
returns the Hessian; D(f, x, order = 3) returns a third-order tensor:
```r
f2 <- function(x) x[1]^2 * x[2]
D(f2, c(3, 4), order = 2)  # Hessian
D(f2, c(3, 4), order = 3)  # 2x2x2 third-order tensor
```
Three methods for computing the derivative of $f(x) = x^3 \sin(x)$ at $x = 2$:
```r
f <- function(x) x[1]^3 * sin(x[1])
f_num <- function(x) x^3 * sin(x)  # plain numeric version
x0 <- 2

# 1. Analytical: f'(x) = 3x^2 sin(x) + x^3 cos(x)
analytical <- 3 * x0^2 * sin(x0) + x0^3 * cos(x0)

# 2. Finite differences
h <- 1e-8
finite_diff <- (f_num(x0 + h) - f_num(x0 - h)) / (2 * h)

# 3. Automatic differentiation
ad_result <- D(f, x0)

# Compare
data.frame(
  method = c("Analytical", "Finite Diff", "AD (nabla)"),
  derivative = c(analytical, finite_diff, ad_result),
  error_vs_analytical = c(0, finite_diff - analytical, ad_result - analytical)
)
```
The AD result matches the analytical derivative to machine precision, with exactly zero error, because dual-number arithmetic propagates exact derivative rules through every operation. Finite differences, by contrast, balance two competing error sources: truncation error of order $O(h^2)$ for central differences, and floating-point cancellation error of order $\epsilon_{\mathrm{mach}}/h$. With $h = 10^{-8}$ the truncation term is negligible and cancellation dominates, which is why the error above is roughly $10^{-7}$. Even at the optimal step size for double-precision central differences, $h \approx \epsilon_{\mathrm{mach}}^{1/3} \approx 6 \times 10^{-6}$, the best achievable error is only on the order of $10^{-11}$: no step size reaches machine precision. A bar chart makes the gap vivid:
```r
errors <- abs(c(finite_diff - analytical, ad_result - analytical))

# Use log10 scale; clamp AD error to .Machine$double.eps if exactly zero
errors[errors == 0] <- .Machine$double.eps

oldpar <- par(mar = c(4, 5, 2, 1))
bp <- barplot(log10(errors),
              names.arg = c("Finite Diff", "AD (nabla)"),
              col = c("coral", "steelblue"),
              ylab = expression(log[10] ~ "|error|"),
              main = "Absolute error vs analytical derivative",
              ylim = c(-16, 0), border = NA)
abline(h = log10(.Machine$double.eps), lty = 2, col = "grey40")
text(mean(bp), log10(.Machine$double.eps) + 0.8, "machine epsilon",
     cex = 0.8, col = "grey40")
par(oldpar)
```
This advantage becomes critical for optimization algorithms that depend on accurate gradients.
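For instance, R's optim() accepts an analytical gradient through its gr argument. Here is a minimal sketch of wiring gradient() into a BFGS fit; the Rosenbrock objective and starting point are illustrative, not part of nabla:

```r
# Sketch: an AD gradient supplied to optim() via the gr argument.
# The objective function and starting point are illustrative.
rosen <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
fit <- optim(c(-1.2, 1), fn = rosen,
             gr = function(x) gradient(rosen, x),
             method = "BFGS")
fit$par  # converges toward the known minimum at c(1, 1)
```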
Any optimizer can find the MLE — the point estimate is the easy part. What
nabla gives you is the exact Hessian at the MLE, and that's where the
real statistical payoff lies.
The observed information matrix $J(\hat\theta) = -H(\hat\theta)$ is the negative Hessian of the log-likelihood at the MLE. Its inverse gives the asymptotic variance-covariance matrix:
$$\text{Var}(\hat\theta) \approx J(\hat\theta)^{-1}$$
Inaccurate Hessians mean inaccurate standard errors and confidence intervals.
Finite-difference Hessians introduce $O(h^2)$ error that propagates directly
into your CIs. With nabla, the Hessian is exact to machine precision —
so your standard errors and confidence intervals are as accurate as the
asymptotic theory allows.
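A minimal sketch of that workflow, using hessian() as introduced above; the simulated Normal sample, the loop-based log-likelihood, and the closed-form MLE are illustrative:

```r
# Sketch: standard errors from the observed information matrix.
# Data, model, and closed-form MLE below are illustrative.
set.seed(42)
y <- rnorm(100, mean = 2, sd = 1.5)

# Negative log-likelihood of Normal(mu, sigma), written with a plain loop;
# the constant n*log(2*pi)/2 is dropped since it has zero derivative.
negll <- function(x) {
  mu <- x[1]
  sigma <- x[2]
  nll <- 0
  for (yi in y) nll <- nll + log(sigma) + (yi - mu)^2 / (2 * sigma^2)
  nll
}

theta_hat <- c(mean(y), sqrt(mean((y - mean(y))^2)))  # closed-form Normal MLE

J <- hessian(negll, theta_hat)  # observed information (negll already flips the sign)
sqrt(diag(solve(J)))            # asymptotic standard errors for mu and sigma
```

For this model the standard errors should match the textbook values $\hat\sigma/\sqrt{n}$ and $\hat\sigma/\sqrt{2n}$.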
Higher-order derivatives push further: D(f, x, order = 3) yields the
third-order derivative tensor, which quantifies the asymptotic skewness
of the MLE's sampling distribution — a correction that standard Wald
intervals miss entirely.
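As a quick sanity check of the mechanics (assuming the scalar case collapses to a single number, as it does for order = 2):

```r
# f(x) = x^4 has third derivative 24*x, so the result at x = 2 should be 48
f4 <- function(x) x[1]^4
D(f4, 2, order = 3)
```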
Under the hood, nabla uses dual numbers — values of the form
$a + b\varepsilon$, where $\varepsilon^2 = 0$. Evaluating a function
$f$ on a dual number $x + \varepsilon$ yields:
$$f(x + \varepsilon) = f(x) + f'(x)\,\varepsilon$$
The "value" part gives $f(x)$ and the "epsilon" part gives $f'(x)$, computed automatically by propagating $\varepsilon$ through every arithmetic operation. This is forward-mode automatic differentiation (AD).
The low-level constructors expose dual numbers directly. In this style,
functions use bare x (not x[1]) because they receive a single dualr
object rather than a vector:
```r
# A dual variable: value = 3, derivative seed = 1
x <- dual_variable(3)
value(x)
deriv(x)

# A dual constant: value = 5, derivative seed = 0
k <- dual_constant(5)
value(k)
deriv(k)

# Explicit constructor
y <- dual(2, 1)
value(y)
deriv(y)
```
Setting deriv = 1 means "I'm differentiating with respect to this variable."
Setting deriv = 0 means "this is a constant."
All standard arithmetic operators propagate derivatives:
```r
x <- dual_variable(3)

# Addition: d/dx(x + 2) = 1
r_add <- x + 2
value(r_add)
deriv(r_add)

# Subtraction: d/dx(5 - x) = -1
r_sub <- 5 - x
value(r_sub)
deriv(r_sub)

# Multiplication: d/dx(x * 4) = 4
r_mul <- x * 4
value(r_mul)
deriv(r_mul)

# Division: d/dx(1/x) = -1/x^2 = -1/9
r_div <- 1 / x
value(r_div)
deriv(r_div)

# Power: d/dx(x^3) = 3*x^2 = 27
r_pow <- x^3
value(r_pow)
deriv(r_pow)
```
All standard mathematical functions are supported, with derivatives computed via the chain rule:
```r
x <- dual_variable(1)

# exp: d/dx exp(x) = exp(x)
r_exp <- exp(x)
value(r_exp)
deriv(r_exp)

# log: d/dx log(x) = 1/x
r_log <- log(x)
value(r_log)
deriv(r_log)

# sin: d/dx sin(x) = cos(x)
x2 <- dual_variable(pi / 4)
r_sin <- sin(x2)
value(r_sin)
deriv(r_sin)  # cos(pi/4)

# sqrt: d/dx sqrt(x) = 1/(2*sqrt(x))
x3 <- dual_variable(4)
r_sqrt <- sqrt(x3)
value(r_sqrt)
deriv(r_sqrt)  # 1/(2*2) = 0.25

# Gamma-related
x4 <- dual_variable(3)
r_lgamma <- lgamma(x4)
value(r_lgamma)  # log(2!) = log(2)
deriv(r_lgamma)  # digamma(3)
```
Dual numbers integrate with standard R patterns:
```r
# sum() works
a <- dual_variable(2)
b <- dual_constant(3)
total <- sum(a, b, dual_constant(1))
value(total)  # 6
deriv(total)  # 1 (only a has deriv = 1)

# prod() works
p <- prod(a, dual_constant(3))
value(p)  # 6
deriv(p)  # 3 (product rule: 3*1 + 2*0)

# c() creates a dual_vector
v <- c(a, b)
length(v)

# is.numeric returns TRUE (for compatibility)
is.numeric(dual_variable(1))
```
In most cases you won't need these constructors directly — D(),
gradient(), hessian(), and jacobian() handle dual number creation
internally. But understanding the mechanism helps when debugging or
building custom AD workflows.
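For example, a hand-rolled forward-mode pass needs only the constructors and accessors above (note the bare x, since the function receives a dual directly):

```r
# Manual forward mode: seed x with deriv = 1, run ordinary R code,
# then read the derivative off the result
f_manual <- function(x) x^2 + sin(x)
z <- f_manual(dual_variable(pi / 4))
value(z)  # f(pi/4)
deriv(z)  # f'(pi/4) = 2*(pi/4) + cos(pi/4), matching D() in the first example
```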
This vignette introduced D() and showed that AD computes exact derivatives
where finite differences cannot. For practical applications:
vignette("mle-workflow") applies gradient(), hessian(), and the
observed information matrix to five increasingly complex MLE problems, from
Normal to logistic regression, and builds a Newton-Raphson optimizer
powered entirely by AD.vignette("higher-order") demonstrates D(f, x, order = 2) for
second derivatives, with applications to curvature analysis, Taylor
expansion, and exact Hessians for MLE standard errors.vignette("optimizer-integration") shows how to plug nabla
gradients into R's built-in optimizers (optim, nlminb) for
production-quality MLE fitting.vignette("mle-skewness") uses D(f, x, order = 3) to compute
the asymptotic skewness of MLEs — a third-order analysis that goes
beyond standard Wald confidence intervals.Any scripts or data that you put into this service are public.