\newcommand{\bm}[1]{\boldsymbol{#1}} \newcommand{\rv}[3][1]{#2_{#1},\ldots,#2_{#3}} \newcommand{\X}{\bm{X}} \newcommand{\cov}{\mathrm{cov}} \newcommand{\dt}{\Delta t} \newcommand{\msd}{\mathrm{\scriptsize MSD}} \newcommand{\acf}{\mathrm{\scriptsize ACF}} \newcommand{\dX}{\Delta\X} \newcommand{\VH}{\bm{V}_H}

This vignette illustrates the basic functionality of the ** SuperGauss** package by simulating a few stochastic processes and estimating their parameters from regularly spaced data.

A one-dimensional fractional Brownian motion (fBM) $X_t = X(t)$ is a continuous Gaussian process with $E[X_t] = 0$ and $\cov(X_t, X_s) = \tfrac 1 2 (|t|^{2H} + |s|^{2H} - |t-s|^{2H})$, for $0 < H < 1$. fBM is not stationary but has stationary increments, such that $(X_{t+\dt} - X_t) \stackrel{D}{=} (X_{s+\dt} - X_s)$ for any $s,t$. As such, its covariance function is completely determined its mean squared displacement (MSD) $$ \msd_X(t) = E[(X_t - X_0)^2] = |t|^{2H}. $$ When the Hurst parameter $H = \tfrac 1 2$, fBM reduces to ordinary Brownian motion.

require(SuperGauss) N <- 3000 # number of observations dT <- 1/60 # time between observations (seconds) H <- .3 # Hurst parameter tseq <- (0:N)*dT # times at which to sample fBM npaths <- 5 # number of fBM paths to generate # to generate fbm, generate its increments, which are stationary msd <- fbm.msd(tseq = tseq[-1], H = H) acf <- msd2acf(msd = msd) # convert msd to acf # superfast method system.time({ dX <- rSnorm(n = npaths, acf = acf, fft = TRUE) }) # fast method (about 3x as slow) system.time({ rSnorm(n = npaths, acf = acf, fft = FALSE) }) # unstructured variance method (much slower) system.time({ matrix(rnorm(N*npaths), npaths, N) %*% chol(toeplitz(acf)) })

The following **R** code generates `r npaths`

independent fBM realizations of length $N = `r N`

$ with $H = `r H`

$. The timing of the "superfast" method [@wood.chan94] provided in this package is compared to that of a "fast" method [e.g., @brockwell.davis91] and to the usual method (Cholesky decomposition of an unstructured variance matrix).

# convert increments to position measurements Xt <- apply(rbind(0, dX), 2, cumsum) # plot clrs <- c("black", "red", "blue", "orange", "green2") par(mar = c(4.1,4.1,.5,.5)) plot(0, type = "n", xlim = range(tseq), ylim = range(Xt), xlab = "Time (s)", ylab = "Position (m)") for(ii in 1:npaths) { lines(tseq, Xt[,ii], col = clrs[ii], lwd = 2) }

Suppose that $\X = (\rv [0] X N)$ are equally spaced observations of an fBM process with $X_i = X(i \dt)$, and let $\dX = (\rv [0] {\Delta X} {N-1})$ denote the corresponding increments, $\Delta X_i = X_{i+1} - X_i$. Then the loglikelihood function for $H$ is
$$
\ell(H \mid \dX) = -\tfrac 1 2 \big(\dX' \VH^{-1} \dX + \log |\VH|\big),
$$
where $V_H$ is a Toeplitz matrix,
$$
\VH = [\cov(\Delta X_i, \Delta X_j)]*{0 \le i,j < N} = \begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma*{N-1} \
\gamma_1 & \gamma_0 & \cdots & \gamma_{N-2} \
\vdots & \vdots & \ddots & \vdots \
\gamma_{N-1} & \gamma_{N-2} & \cdots & \gamma_0
\end{bmatrix}.
$$
Thus, each evaluation of the loglikelihood requires the inverse and log-determinant of a Toeplitz matrix, which scales as $\mathcal O(N^2)$ with the Durbin-Levinson algorithm. The ** SuperGauss** package implements an extended version of the Generalized Schur algorithm of @ammar.gragg88, which scales these computations as $\mathcal O(N \log^2 N)$. With careful memory management and extensive use of the

`FFTW`

`SuperGauss`

`Toeplitz`

matrix classThe bulk of the likelihood calculations in ** SuperGauss** are handled by the

`Toeplitz`

matrix class. A `Toeplitz`

object is created as follows:# allocate and assign in one step Toep <- Toeplitz(acf = acf) Toep # allocate memory only Toep <- Toeplitz(n = N) Toep Toep$setAcf(acf = acf) # assign later

Its primary methods are illustrated below:

all(acf == Toep$getAcf()) # extract acf # matrix multiplication z <- rnorm(N) x1 <- toeplitz(acf) %*% z # regular way x2 <- Toep %*% z # with Toeplitz class range(x1-x2) # system of equations y1 <- solve(toeplitz(acf), z) # regular way y2 <- solve(Toep, z) # with Toeplitz class range(y1-y2) # log-determinant ld1 <- determinant(toeplitz(acf))$mod ld2 <- determinant(Toep) # note: no $mod c(ld1, ld2)

The following code shows how to obtain the maximum likelihood of $H$ and its standard error for a given fBM path. For speed comparisons, the optimization is done both using the superfast Generalized Schur algorithm and the fast Durbin-Levinson algorithm.

dX <- diff(Xt[,1]) # obtain the increments of a given path N <- length(dX) # autocorrelation of fBM increments fbm.acf <- function(H) { msd <- fbm.msd(1:N*dT, H = H) msd2acf(msd) } # loglikelihood using generalized Schur algorithm Toep <- Toeplitz(n = N) # pre-allocate memory loglik.GS <- function(H) { Toep$setAcf(acf = fbm.acf(H)) dSnorm(X = dX, acf = Toep, log = TRUE) } # loglikelihood using Durbin-Levinson algorithm loglik.DL <- function(H) { dSnormDL(X = dX, acf = fbm.acf(H), log = TRUE) } # superfast method system.time({ GS.mle <- optimize(loglik.GS, interval = c(.01, .99), maximum = TRUE) }) # fast method (about 10x slower) system.time({ DL.mle <- optimize(loglik.DL, interval = c(.01, .99), maximum = TRUE) }) c(GS = GS.mle$max, DL = DL.mle$max) # standard error calculation require(numDeriv) Hmle <- GS.mle$max Hse <- -hessian(func = loglik.GS, x = Hmle) # observed Fisher Information Hse <- sqrt(1/Hse[1]) c(mle = Hmle, se = Hse)

`ReferenceClasses`

In order to effectively manage memory in the underlying **C++** code, the `Toeplitz`

class is implemented using **R**'s "Reference Classes". Among other things, this means that when a `Toeplitz`

object is passed to a function, the function does not make a copy of it: all modifications to the object inside the object are reflected on the object outside the function as well, as in the following example:

T1 <- Toeplitz(n = N) T2 <- T1 # shallow copy: both of these point to the same memory location # affects both objects T1$setAcf(fbm.acf(.5)) T1 T2 loglik <- function(H) { T1$setAcf(acf = fbm.acf(H)) dSnorm(X = dX, acf = T1, log = TRUE) } # affects both objects loglik(H = .3) T1 T2

In addition to the superfast algorithm for Gaussian likelihood evaluations , ** SuperGauss** provides such algorithms for the loglikelihood gradient and Hessian functions, leading to superfast versions of many inference algorithms such as Newton-Raphson and Hamiltonian Monte Carlo. An example of the former is given below using the two-parameter exponential autocorrelation model
$$
\acf_X(t \mid \lambda, \sigma) = \sigma^2 \exp(- |t/\lambda|).
$$

# autocorrelation function exp.acf <- function(t, lambda, sigma) sigma^2 * exp(-abs(t/lambda)) # gradient, returned as a 2-column matrix exp.acf.grad <- function(t, lambda, sigma) { ea <- exp.acf(t, lambda, 1) cbind(abs(t)*(sigma/lambda)^2 * ea, # d_acf/d_lambda 2*sigma * ea) # d_acf/d_sigma } # Hessian, returned as an array of size length(t) x 2 x 2 exp.acf.hess <- function(t, lambda, sigma) { ea <- exp.acf(t, lambda, 1) sl2 <- sigma/lambda^2 hess <- array(NA, dim = c(length(t), 2, 2)) hess[,1,1] <- sl2^2*(t^2 - 2*abs(t)*lambda) * ea # d2_acf/d_lambda^2 hess[,1,2] <- 2*sl2 * abs(t) * ea # d2_acf/(d_lambda d_sigma) hess[,2,1] <- hess[,1,2] # d2_acf/(d_sigma d_lambda) hess[,2,2] <- 2 * ea # d2_acf/d_sigma^2 hess } # simulate data lambda <- runif(1, .5, 2) sigma <- runif(1, .5, 2) tseq <- (1:N-1)*dT acf <- exp.acf(t = tseq, lambda = lambda, sigma = sigma) Xt <- rSnorm(acf = acf) Toep <- Toeplitz(n = N) # storage space # negative loglikelihood function of theta = (lambda, sigma) # include attributes for gradient and Hessian exp.negloglik <- function(theta) { lambda <- theta[1] sigma <- theta[2] # acf, its gradient, and Hessian Toep$setAcf(acf = exp.acf(tseq, lambda, sigma)) # use the Toeplitz class dacf <- exp.acf.grad(tseq, lambda, sigma) d2acf <- exp.acf.hess(tseq, lambda, sigma) nll <- -1 * dSnorm(X = Xt, acf = Toep, log = TRUE) attr(nll, "gradient") <- -1 * Snorm.grad(X = Xt, acf = Toep, dacf = dacf) attr(nll, "hessian") <- -1 * Snorm.hess(X = Xt, acf = Toep, dacf = dacf, d2acf = d2acf) nll } # optimization system.time({ mle.fit <- nlm(f = exp.negloglik, p = c(1,1), hessian = TRUE) }) # display estimates with standard errors rbind(true = c(lambda = lambda, sigma = sigma), est = mle.fit$estimate, se = sqrt(diag(solve(mle.fit$hessian))))

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.