```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(nabla)
```
## D operator

The D operator composes to arbitrary order. D(f, x) gives the first
derivative; D(f, x, order = 2) gives the second; D(f, x, order = 3)
gives the third-order tensor; and so on:
```r
f <- function(x) x[1]^3
D(f, 2)            # f'(2) = 3*4 = 12
D(f, 2, order = 2) # f''(2) = 6*2 = 12
D(f, 2, order = 3) # f'''(2) = 6
```
For multi-parameter functions, each application of D appends one dimension
to the output. A scalar function $f: \mathbb{R}^n \to \mathbb{R}$ produces
a gradient vector ($n$), then a Hessian matrix ($n \times n$), then a
third-order tensor ($n \times n \times n$):
```r
# Gradient of a 2-parameter function
f <- function(x) x[1]^2 * x[2]
D(f, c(3, 4)) # gradient: c(24, 9)

# Hessian via D^2
D(f, c(3, 4), order = 2)

# Composition works identically
Df <- D(f)
DDf <- D(Df)
DDf(c(3, 4))
```
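The third-order tensor promised above works the same way. As a small check: the only nonzero third partials of $f = x_1^2 x_2$ are those with indices that are a permutation of $(1, 1, 2)$, each equal to 2, so the $2 \times 2 \times 2$ array should contain 2 at those positions and 0 elsewhere:

```r
# Third-order tensor (2 x 2 x 2) of f(x) = x[1]^2 * x[2]
D(f, c(3, 4), order = 3)
```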
For vector-valued functions, D produces the Jacobian:
```r
g <- function(x) list(x[1] * x[2], x[1]^2 + x[2])
D(g, c(3, 4)) # 2x2 Jacobian: [[4, 3], [6, 1]]
```
The convenience functions gradient(), hessian(), and jacobian() are
thin wrappers around D:
```r
gradient(f, c(3, 4)) # == D(f, c(3, 4))
hessian(f, c(3, 4))  # == D(f, c(3, 4), order = 2)
jacobian(g, c(3, 4)) # == D(g, c(3, 4))
```
The curvature of a curve $y = f(x)$ is:
$$\kappa = \frac{|f''(x)|}{(1 + f'(x)^2)^{3/2}}$$
With D(), we can compute this directly:
```r
curvature <- function(f, x) {
  d1 <- D(f, x)
  d2 <- D(f, x, order = 2)
  abs(d2) / (1 + d1^2)^(3/2)
}

# Curvature of sin(x) at various points
# Wrap sin for D()'s x[1] convention
sin_f <- function(x) sin(x[1])

xs <- seq(0, 2 * pi, length.out = 7)
kappas <- sapply(xs, function(x) curvature(sin_f, x))
data.frame(x = round(xs, 3), curvature = round(kappas, 6))
```
The table reveals the geometry of $\sin(x)$: curvature is zero at inflection points (integer multiples of $\pi$, where $\sin''(x) = 0$) and peaks at $\pi/2$ and $3\pi/2$ where $\sin$ reaches its extrema. The maximum curvature of 1.0 reflects the unit amplitude — for $A\sin(x)$ the maximum curvature would be $A$.
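As a quick check of the amplitude claim, reusing the curvature() helper defined above:

```r
# Curvature of 2*sin(x) at pi/2 should equal the amplitude, 2
sin2_f <- function(x) 2 * sin(x[1])
curvature(sin2_f, pi / 2)
```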
At $x = \pi/2$, $\sin$ has maximum curvature because $|\sin''(\pi/2)| = 1$ and $\sin'(\pi/2) = 0$:
```r
curvature(sin_f, pi / 2) # should be 1.0
```
Curvature peaks where the second derivative is largest in magnitude and the first derivative is small. For $\sin(x)$, this occurs at $\pi/2$ and $3\pi/2$:
```r
xs_curve <- seq(0, 2 * pi, length.out = 200)
sin_vals <- sin(xs_curve)
kappa_vals <- sapply(xs_curve, function(x) curvature(sin_f, x))

oldpar <- par(mar = c(4, 4.5, 2, 4.5))
plot(xs_curve, sin_vals, type = "l", col = "steelblue", lwd = 2,
     xlab = "x", ylab = "sin(x)", main = "sin(x) and its curvature")
par(new = TRUE)
plot(xs_curve, kappa_vals, type = "l", col = "firebrick", lwd = 2, lty = 2,
     axes = FALSE, xlab = "", ylab = "")
axis(4, col.axis = "firebrick")
mtext(expression(kappa(x)), side = 4, line = 2.5, col = "firebrick")
abline(v = c(pi / 2, 3 * pi / 2), col = "grey60", lty = 3)
legend("bottom", legend = c("sin(x)", expression(kappa(x))),
       col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2,
       bty = "n", horiz = TRUE)
par(oldpar)
```
The curvature reaches its maximum of 1.0 at $x = \pi/2$ (where $\sin'(x) = 0$ and $|\sin''(x)| = 1$) and returns to zero at integer multiples of $\pi$ (where $\sin''(x) = 0$).
A second-order Taylor approximation of $f$ around $x_0$:
$$f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2$$
D() computes this directly:
```r
taylor2 <- function(f, x0, x) {
  f0 <- f(x0)
  f1 <- D(f, x0)
  f2 <- D(f, x0, order = 2)
  f0 + f1 * (x - x0) + 0.5 * f2 * (x - x0)^2
}

# Approximate exp(x) near x = 0
f_exp <- function(x) exp(x[1])

xs <- c(-0.1, -0.01, 0, 0.01, 0.1)
data.frame(
  x = xs,
  exact = exp(xs),
  taylor2 = sapply(xs, function(x) taylor2(f_exp, 0, x)),
  error = exp(xs) - sapply(xs, function(x) taylor2(f_exp, 0, x))
)
```
Near the expansion point, the approximation is accurate. The error grows as $O(|x - x_0|^3)$ because the Taylor remainder is $\frac{f'''(\xi)}{6}(x - x_0)^3$. For $\exp(x)$ around $x_0 = 0$, $f'''(\xi) = \exp(\xi) \approx 1$, so the error at $x = 0.1$ is approximately $0.1^3/6 \approx 1.7 \times 10^{-4}$ and at $x = 0.01$ it drops to $\approx 1.7 \times 10^{-7}$ — a factor-of-1000 reduction for a factor-of-10 decrease in distance, confirming cubic convergence:
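Here is that check numerically, reusing taylor2() from above. Halving the distance from $x_0$ should shrink the error by roughly $2^3 = 8$:

```r
# Ratio of Taylor errors at x = 0.1 vs x = 0.05: cubic remainder predicts ~8
err <- function(x) exp(x) - taylor2(f_exp, 0, x)
err(0.1) / err(0.05)
```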
```r
xs_plot <- seq(-2, 3, length.out = 300)
exact_vals <- exp(xs_plot)
taylor_vals <- sapply(xs_plot, function(x) taylor2(f_exp, 0, x))

oldpar <- par(mar = c(4, 4, 2, 1))
plot(xs_plot, exact_vals, type = "l", col = "steelblue", lwd = 2,
     xlab = "x", ylab = "y",
     main = expression("exp(x) vs 2nd-order Taylor at " * x[0] == 0),
     ylim = c(-1, 12))
lines(xs_plot, taylor_vals, col = "firebrick", lwd = 2, lty = 2)
abline(v = 0, col = "grey60", lty = 3)
points(0, 1, pch = 19, col = "grey40", cex = 1.2)
legend("topleft", legend = c("exp(x)", expression("Taylor " * T[2](x))),
       col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2, bty = "n")
par(oldpar)
```
The Taylor polynomial matches exp(x) closely near $x_0 = 0$ but diverges
at larger distances, illustrating the local nature of polynomial approximation.
hessian() computes the exact second-derivative matrix at any point. For
maximum likelihood estimation, this gives the observed information matrix
$J(\hat\theta) = -H(\hat\theta)$, whose inverse is the asymptotic
variance-covariance matrix. Exact Hessians yield exact standard errors and
confidence intervals — the primary reason to use AD in statistical
inference.
Consider a Poisson log-likelihood for $\lambda$:
```r
set.seed(123)
data_pois <- rpois(50, lambda = 3)
n <- length(data_pois)
sum_x <- sum(data_pois)
sum_lfact <- sum(lfactorial(data_pois))

ll_poisson <- function(theta) {
  lambda <- theta[1]
  sum_x * log(lambda) - n * lambda - sum_lfact
}

lambda0 <- 2.5
```
Primary approach: use hessian() directly:
```r
hess_helper <- hessian(ll_poisson, lambda0)
hess_helper
```
The observed information is simply -hess_helper, and the standard error
is 1 / sqrt(-hess_helper):
```r
se <- 1 / sqrt(-hess_helper[1, 1])
cat("SE(lambda) at lambda =", lambda0, ":", se, "\n")
```
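In practice the information is evaluated at the MLE rather than at an arbitrary point. For this model the MLE has the closed form $\hat\lambda = \bar{x}$, so an illustrative 95% Wald interval is:

```r
# Observed information at the analytic MLE lambda_hat = mean(x)
lambda_hat <- mean(data_pois)
se_hat <- 1 / sqrt(-hessian(ll_poisson, lambda_hat)[1, 1])
lambda_hat + c(-1, 1) * qnorm(0.975) * se_hat # 95% Wald interval
```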
Verification: construct a second-order dual number by hand and run the
same arithmetic that hessian() performs internally:
```r
# Build a dual_variable_n wrapped in a dual_vector
manual_theta <- dual_vector(list(dual_variable_n(lambda0, 2)))
result_manual <- ll_poisson(manual_theta)
manual_hess <- deriv(deriv(result_manual))
manual_hess
```
Both approaches yield the same result:
```r
hess_helper[1, 1] - manual_hess # ~0
```
Under the hood, D(f, x, order = 2) uses nested duals — a dual number
whose components are themselves dual numbers. The structure
dual(dual(x, 1), dual(1, 0)) simultaneously tracks:

- the function value $f(x)$,
- the first derivative $f'(x)$, and
- the second derivative $f''(x)$.
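Concretely, one standard way to read this structure is as a truncated polynomial in two nilpotent symbols $\varepsilon_1, \varepsilon_2$ with $\varepsilon_1^2 = \varepsilon_2^2 = 0$. Squaring the seed $x + \varepsilon_1 + \varepsilon_2$ gives

$$(x + \varepsilon_1 + \varepsilon_2)^2 = x^2 + 2x\,\varepsilon_1 + 2x\,\varepsilon_2 + 2\,\varepsilon_1\varepsilon_2,$$

so the constant term carries $f(x) = x^2$, the $\varepsilon_1$ and $\varepsilon_2$ coefficients both carry $f'(x) = 2x$, and the $\varepsilon_1\varepsilon_2$ coefficient carries $f''(x) = 2$.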
Nested duals can be constructed directly with dual_variable_n():
```r
x <- dual_variable_n(2, order = 2)

# Evaluate x^3
result <- x^3

# Extract all three quantities
deriv_n(result, 0) # f(2) = 8
deriv_n(result, 1) # f'(2) = 3*4 = 12
deriv_n(result, 2) # f''(2) = 6*2 = 12
```
For constants in higher-order computations, use dual_constant_n():
```r
k <- dual_constant_n(5, order = 2)
deriv_n(k, 0) # 5
deriv_n(k, 1) # 0
deriv_n(k, 2) # 0
```
The differentiate_n() helper wraps the construction and extraction:
```r
# Differentiate sin(x) at x = pi/4
result <- differentiate_n(sin, pi / 4, order = 2)
result$value # sin(pi/4)
result$d1    # cos(pi/4) = f'
result$d2    # -sin(pi/4) = f''
```
Verify against known values:
```r
result$value - sin(pi / 4) # ~0
result$d1 - cos(pi / 4)    # ~0
result$d2 - (-sin(pi / 4)) # ~0
```
More complex functions work too:
```r
f <- function(x) x * exp(-x^2)
d2 <- differentiate_n(f, 1, order = 2)

# Analytical: f'(x)  = exp(-x^2) * (1 - 2x^2)
#             f''(x) = exp(-x^2) * (-6x + 4x^3)
analytical_f1 <- exp(-1) * (1 - 2)
analytical_f2 <- exp(-1) * (-6 + 4)
d2$d1 - analytical_f1 # ~0
d2$d2 - analytical_f2 # ~0
```
These low-level functions are useful for understanding how nabla works
internally, but for most applications, D(), gradient(), and hessian()
are the recommended interface.
The D operator provides exact higher-order derivatives through a single
composable interface — no symbolic differentiation, no finite-difference
grid, and no loss of precision:

- D(f, x, order = 2) gives the exact second derivative needed for curvature analysis, Taylor expansion, and Newton-Raphson optimization (see the sketch after this list).
- hessian() directly yields the observed information matrix. Its inverse gives the asymptotic variance-covariance matrix — the foundation of confidence intervals and hypothesis tests in likelihood-based inference.
- D(f, x, order = 3) yields third-order tensors for asymptotic skewness analysis (see vignette("mle-skewness")); higher orders work the same way.
- D constructs nested dual numbers at each order level. The same arithmetic that propagates first derivatives recursively propagates second, third, and higher derivatives.