In TheoMichelot/smoothSDE: Varying-coefficient stochastic differential equations

knitr::opts_chunk$set(
  message = FALSE, error = FALSE, warning = FALSE,
  comment = NA
)

library(smoothSDE)
library(ggplot2)
set.seed(342)

theme_set(
  theme_minimal()
)

The R package smoothSDE implements several widely-used stochastic differential equation (SDE) models (including Brownian motion with drift and the Ornstein-Uhlenbeck process). It provides functions for parameter estimation, model visualisation, uncertainty estimation, and residual analysis.

The package can be used to fit varying-coefficient SDEs, i.e., with time-varying parameters. The SDE parameters can be specified using flexible model formulas, including linear or non-linear dependencies on covariates, and random effects. In this vignette, we briefly describe some of the functionalities of the package, and illustrate its use using simulated and real-data examples.

Summary of SDEs implemented in smoothSDE

The main model formulations currently implemented in the package are Brownian motion, Ornstein-Uhlenbeck process, and continuous-time correlated random walk. Brownian motion and the Ornstein-Uhlenbeck process have a wide range of applications (e.g., ecology, finance), and the continuous-time correlated random walk is a popular model of animal movement described by @johnson2008. The package supports multivariate isotropic processes, i.e., with the same set of parameters describing the dynamics in each dimension. The SDEs and their parameters are summarised in the following table, and we describe the models in more details in the next few subsections.

\begin{tabular}{lllll} \toprule Model & Name & Parameters & & Link functions \ \midrule Brownian motion & \texttt{BM}'' & drift & $\mu \in (-\infty, +\infty)$ & identity \\ & & diffusion & $\sigma > 0$ & log \\ \midrule Ornstein-Uhlenbeck process &\texttt{OU}'' & mean & $\mu \in (-\infty, +\infty)$ & identity \ & & time scale & $\tau > 0$ & log \ & & variance & $\kappa > 0$ & log \ \midrule Continuous-time correlated RW & ``\texttt{CTCRW}'' & mean & $\mu \in (-\infty, +\infty)$ & identity \ & & reversion & $\beta > 0$ & log \ & & diffusion & $\sigma > 0$ & log \ \bottomrule \end{tabular}

Brownian motion

Brownian motion (with drift) is the process $(Z_t)$ defined as the solution to $$ dZ_t = \mu\ dt + \sigma\ dW_t, $$ where $\mu \in \mathbb{R}$ is the drift parameter (expected direction of change), and $\sigma > 0$ is the diffusion parameter (variability).

The approximate transition density of this model is $$ Z_{t_1} \vert { Z_{t_0} = z_0 } \sim N \left[ z_0 + \mu_{t_0} \Delta,\ \sigma_{t_0}^2 \Delta \right], $$ where $\Delta = t_1 - t_0$ is the time interval.

Figure \@ref(fig:sim-bm) shows simulated trajectories from 1-dimensional and 2-dimensional Brownian motion (with constant parameters).

n <- 500
times <- seq(0, 100, length = n)
dx <- rnorm(n)
x <- cumsum(dx)
dxy <- matrix(rnorm(2*n), nrow = n)
xy <- apply(dxy, 2, cumsum)

df1 <- data.frame(time = times, Z = x)
df2 <- data.frame(time = times, Z1 = xy[, 1], Z2 = xy[, 2])

ggplot(df1, aes(time, Z)) + geom_line()
ggplot(df2, aes(Z1, Z2)) + geom_path() + coord_equal()

Ornstein-Uhlenbeck

The Ornstein-Uhlenbeck (OU) process is usually described as the solution to $$ dZ_t = \beta (\mu - Z_t)\ dt + \sigma\ dt, $$ where $\mu \in \mathbb{R}$ is the long-term mean, $\beta > 0$ is the strength of reversion to the mean, and $\sigma > 0$ is the diffusion parameter.

In our experience, the parameterisation shown above suffers from identifiability issues. In smoothSDE, the varying-coefficient OU process is therefore specified as the solution to $$ dZ_t = \frac{1}{\tau} (\mu - Z_t)\ dt + \sqrt{\frac{2\kappa}{\tau}}\ dt, $$ where $\mu$ has the same interpretation as before, $\tau > 0$ measures the time scale of autocorrelation of the process, and $\kappa > 0$ is the long-term variance of the process. The following formulas can be used to go from one parameterisation to the other: $$ \begin{cases} \beta = \frac{1}{\tau} \ \sigma = \sqrt{\frac{2 \kappa}{\tau}} \end{cases} \Leftrightarrow \begin{cases} \tau = \frac{1}{\beta} \ \kappa = \frac{\sigma^2}{2\beta} \end{cases} $$

The approximate transition density of the varying-coefficient OU process is $$ Z_{t_1} \vert { Z_{t_0} = z_0 } \sim N \left[ e^{- \Delta / \tau} z_0 + (1 - e^{- \Delta / \tau}) \mu,\ \kappa (1 - e^{-2 \Delta / \tau}) \right]. $$ From this, we see that the stationary distribution of the process (obtained when $\Delta \rightarrow \infty$) is $$ Z_t \sim N(\mu,\ \kappa). $$

Simulations from the standard OU process (with constant parameters) are shown in Figure \@ref(fig:sim-ou) for illustration. The process oscillates around its long-term mean.

n <- 500
times <- seq(0, 100, length = n)
mu <- c(5, -3)
beta <- 0.1
sigma <- 1

Z1 <- rep(5, n)
Z2 <- rep(-3, n)
sd <- sigma/sqrt(2*beta) * sqrt(1 - exp(-2 * beta))
# Iterate through observations
for(i in 1:(n-1)) {
  mean1 <- exp(-beta) * Z1[i] + (1 - exp(-beta)) * mu[1]
  mean2 <- exp(-beta) * Z2[i] + (1 - exp(-beta)) * mu[2]
  Z1[i+1] <- rnorm(1, mean = mean1, sd = sd)
  Z2[i+1] <- rnorm(1, mean = mean2, sd = sd)
}

df1 <- data.frame(time = times, Z = Z1)
df2 <- data.frame(time = times, Z1 = Z1, Z2 = Z2)

ggplot(df1, aes(time, Z)) + geom_line() + geom_hline(yintercept = 5, lty = 2, col = "firebrick")
ggplot(df2, aes(Z1, Z2)) + geom_path() + coord_equal() + 
  geom_point(aes(Z1, Z2), data.frame(Z1 = 5, Z2 = -3), col = "firebrick", size = 2)

Continuous-time correlated random walk

The continuous-time correlated random walk (CTCRW), described by @johnson2008, is the process $(Z_t)$ solution to $$ \begin{cases} dZ_t = V_t\ dt \ dV_t = \beta (\mu - V_t)\ dt + \sigma\ dW_t. \end{cases} $$ In animal movement analyses, $Z_t$ usually denotes the location of the animal at time $t$, and $V_t$ is its velocity. The velocity $V_t$ is modelled with an OU process to capture autocorrelation in speed and direction of movement, and the location $Z_t$ is its integral, i.e.\ $Z_t = \int_{s = 0}^t V_s\ ds$. The parameters of the velocity process are $\mu$, the mean velocity (often fixed to zero), $\beta > 0$, the strength of reversion to the mean, and $\sigma > 0$, the variability of the velocity.

Because it is based on an OU process, the CTCRW model can also have identifiability issues for the parameters $\beta$ and $\sigma$. In smoothSDE, we use the following parameterisation, suggested by @gurarie2011: $$ \begin{cases} \tau = \frac{1}{\beta}\ \nu = \frac{\sqrt{\pi} \sigma}{2 \sqrt{\beta}} \end{cases} \Leftrightarrow \begin{cases} \beta = \frac{1}{\tau}\ \sigma = \frac{2\nu}{\sqrt{\pi \tau}} \end{cases} $$ where $\tau > 0$ is the time scale of autocorrelation of the velocity process, and $\nu > 0$ is the mean speed of the CTCRW process. These parameters have a clear interpretation; in particular, in animal tracking studies, they can be linked to the speed and sinuosity of movement, often used to learn about behaviour. (See, for example, the elephant example in Section 3.1 of @michelot2021.)

Simulated examples of the CTCRW process are shown in Figure \@ref(fig:sim-ctcrw). Compared to Brownian motion, realisations of the CTCRW are smooth and display persistence in the direction and speed of change.

## This uses a time discretization of the CTCRW process, first sampling from
## an OU process, and then simply deriving the locations as the cumulated
## sum of velocities. Exact simulation could be achieved using the joint
## multivariate normal transition density of the velocity and location, but
## the plots would be virtually identical.
n <- 500
times <- seq(0, 100, length = n)
beta <- 0.1
sigma <- 1

V1 <- rep(5, n)
V2 <- rep(-3, n)
sd <- sigma/sqrt(2*beta) * sqrt(1 - exp(-2 * beta))
# Iterate through observations
for(i in 1:(n-1)) {
  mean1 <- exp(-beta) * V1[i]
  mean2 <- exp(-beta) * V2[i]
  V1[i+1] <- rnorm(1, mean = mean1, sd = sd)
  V2[i+1] <- rnorm(1, mean = mean2, sd = sd)
}

Z1 <- cumsum(V1)
Z2 <- cumsum(V2)

df1 <- data.frame(time = times, Z = Z1)
df2 <- data.frame(time = times, Z1 = Z1, Z2 = Z2)

ggplot(df1, aes(time, Z)) + geom_line()
ggplot(df2, aes(Z1, Z2)) + geom_path() + coord_equal()

Varying-coefficient SDEs

The package smoothSDE implements varying-coefficient SDEs, as described by @michelot2021. The general idea is to allow for dependence of the SDE parameters on covariates through non-parametric and random effects. Consider varying-coefficient Brownian motion as an example; it is defined as the solution to $$ dZ_t = \mu_t\ dt + \sigma_t\ dW_t, $$ where the parameters $\mu_t$ and $\sigma_t$ are specified as $$ \begin{aligned} \mu_t = \beta_0^{(\mu)} + f_1^{(\mu)}(x_{1t}) + f_2^{(\mu)}(x_{2t}) + \dots \ \log(\sigma_t) = \beta_0^{(\sigma)} + f_1^{(\sigma)}(x_{1t}) + f_2^{(\sigma)}(x_{2t}) + \dots \end{aligned} $$ where, for each parameter, $\beta_0$ is an intercept parameter, and $f_j$ can be a linear, non-linear, or random effect of a covariate $x_{jt}$.

The same idea can be used for other SDEs, such as the OU and CTCRW processes. For a more extensive description of varying-coefficient SDEs, including details about model fitting, please refer to @michelot2021.

Workflow

The package is built around the R6 class SDE, which encapsulates the model specification (including formulas for SDE parameters) and the data for a varying-coefficient SDE. The typical workflow to create an SDE object and fit the model is as follows.

Choose the type of SDE; options are "BM" (Brownian motion), "OU" (Ornstein-Uhlenbeck), "CTCRW" (continuous-time correlated random walk).
Define model formulas for the parameters of the SDE. The formulas can include terms from standard R expressions, as well as smooth terms and random effects from the mgcv package (@wood2017).
Create a data frame with columns for the response variable, for the covariates, for time series ID (if several time series are provided), and for time.
Create an SDE model object, using the SDE$new constructor function. Its main arguments are formulas (list of formulas), data (data frame), type (character string for SDE type), and response (character string for response name).
Fit model using the fit method.
Get parameter estimates using par method, uncertainty estimates using CI_pointwise or CI_simultaneous methods, or visualise results using plot_par method.

For more information about the syntax for smoothSDE functions, consult the documentation (?SDE), or have a look at the example analyses presented in the following sections.

Example: Brownian motion with drift

We illustrate some of the functionalities of the package using simulated data for a varying-coefficient Brownian motion with drift. We first simulate a process with constant drift (mu) and time-varying diffusion parameter (sigma) over 1000 time steps.

n <- 1000
times <- 1:n
mu <- rep(0.1, n)
sigma <- exp(cos(2*pi*times/500))
dZ <- rnorm(n - 1, mean = mu[-n], sd = sigma[-n])
Z <- cumsum(c(0, dZ))

We define a data frame with columns ID (identifier for time series segment), Z (process of interest), and time (times of observation).

data <- data.frame(ID = 1,
                   Z = Z,
                   time = times)

ggplot(data, aes(time, Z)) + geom_line()

In this example, we are interested in estimating the smooth relationship between the diffusion parameter and time. We specify this dependence using a list of R formulas with one entry for each SDE parameter. Here, the diffusion parameter sigma is modelled with a thin-plate regression spline basis with 10 knots, using the mgcv syntax.

formulas <- list(mu = ~1, 
                 sigma = ~ s(time, k = 10, bs = "ts"))

We create an SDE object that encapsulates the formulas, the data, the type of model (here, Brownian motion), and the name of the response variable (here, Z). Then, we can fit the model using the method fit.

bm <- SDE$new(formulas = formulas,
               data = data,
               type = "BM", 
               response = "Z")

bm$fit()

Once the model is fitted, we can visualise the relationship between the SDE parameters and the covariates with plot_par. The option n_post can be used to visualise uncertainty with posterior samples.

bm$plot_par("time")

The smoothing splines captured the oscillating patterns in the diffusion parameter, and the drift parameter was slightly underestimated. We can also evaluate the SDE parameters on a grid of covariate values, to compare directly to the true parameters.

# Get point estimates and CIs for sigma over time
bm_par <- bm$par(t = "all")
bm_CI <- bm$CI_pointwise(t = "all")

# Data frame for point estimates
bm_par_df <- data.frame(time = times,
                        sigma = c(sigma, bm_par[, "sigma"]),
                        type = rep(c("true", "estimated"), each = n))

# Data frame for CIs
bm_ci_df <- data.frame(time = times,
                       low = bm_CI["sigma", "low",],
                       upp = bm_CI["sigma", "upp",])

ggplot(bm_par_df, aes(time, sigma, col = type)) +
  scale_color_manual(values = c("firebrick", "royalblue"), name = "") +
  geom_line(size = 1) +
  geom_line(aes(time, low), data = bm_ci_df, col = "firebrick", lty = 2) +
  geom_line(aes(time, upp), data = bm_ci_df, col = "firebrick", lty = 2)

We can simulate from the fitted model with the method simulate (only available for Brownian motion and Ornstein-Uhlenbeck models for now), for example to assess whether the model captures relevant features of the observed data.

# Pass 'data' to use observed covariates in simulation
sim <- bm$simulate(data = data)

ggplot(sim, aes(time, Z)) + geom_line()

Similarly to the observed time series, the simulated process alternates between periods of low variability (e.g., around $t = 250$ and $t = 750$) and periods of high variability (e.g., around $t = 0$, $t = 500$ and $t = 1000$).

Example: Ornstein-Uhlenbeck process

We present a second simulated example, based on a varying-coefficient Ornstein-Uhlenbeck (OU) process. We generate 1000 observations from a 2-dimensional isotropic OU process with constant mean (mu), constant mean reversion (beta), and time-varying diffusion (sigma). In this example, the parameters are selected such that the diffusion (and therefore the variance of the process around its mean) decreases in time.

# Mean parameter (centre of attraction)
mu <- matrix(rep(c(5, -5), each = n), ncol = 2)
# Time scale of autocorrelation
tau <- rep(2, n)
# Time-varying variance parameter
kappa <- plogis(-(times-500)/100)

# Simulate OU process over 1000 time steps
Z <- mu
for(i in 2:n) {
    mean <- exp(-1/tau[i-1]) * Z[i-1,] + (1 - exp(-1/tau[i-1])) * mu[i-1,]
    sd <- sqrt(kappa[i-1]) * sqrt(1 - exp(-2 / tau[i-1]))
    Z[i,] <- rnorm(2, mean, sd)
}

# Create data frame for smoothSDE
data <- data.frame(ID = 1,
                   Z1 = Z[,1],
                   Z2 = Z[,2],
                   time = times)

# Plot simulated data, coloured by time
ggplot(data, aes(Z1, Z2, col = time)) + 
    geom_path() + 
    coord_equal() +
    scale_color_viridis_c()

Like in the previous example, we specify the model for the OU parameters with a list of formulas. There are four parameters: mu1 (mean in first dimension), mu2 (mean in second dimension), tau (time scale of autocorrelation) and kappa (variance). In this example, we assume that the mean parameter is known, and we therefore use the option fixpar = c("mu1", "mu2") to indicate that it should not be estimated, and we add the argument par0 to give the known mean parameter (and initial values for the other parameters). We pass a vector of length 2 for response, to indicate that we want to model the two variables with a 2-dimensional isotropic OU process.

formulas <- list(mu1 = ~1,
                 mu2 = ~1,
                 tau = ~1,
                 kappa = ~s(time, k = 10, bs = "ts"))

par0 <- c(mu1 = 5, mu2 = -5, tau = 1, kappa = 3)

ou <- SDE$new(formulas = formulas, 
              data = data, 
              type = "OU", 
              response = c("Z1", "Z2"), 
              par0 = par0,
              fixpar = c("mu1", "mu2"))

ou$fit()

We can visualise the results to check that the estimated parameters correctly captured the true parameters used in the simulation.

# Plot tau and kappa against time
ou$plot_par("time", par_names = c("tau", "kappa"))

# Point estimates and CIs for kappa over time
ou_par <- ou$par(t = "all")
ou_CI <- ou$CI_pointwise(t = "all")

# Data frame for point estimates
ou_par_df <- data.frame(time = times,
                        kappa = c(kappa, ou_par[, "kappa"]),
                        type = rep(c("true", "estimated"), each = n))

# Data frame for CIs
ou_ci_df <- data.frame(time = times,
                       low = ou_CI["kappa", "low",],
                       upp = ou_CI["kappa", "upp",])

ggplot(ou_par_df, aes(time, kappa, col = type)) +
  scale_color_manual(values = c("firebrick", "royalblue"), name = "") +
  geom_line(size = 1) +
  geom_line(aes(time, low), data = ou_ci_df, col = "firebrick", lty = 2) +
  geom_line(aes(time, upp), data = ou_ci_df, col = "firebrick", lty = 2)

The decreasing variance parameter seems to be captured well by the model.

Example: elephant movement analysis

The following code can be used to fit the varying-coefficient CTCRW model to elephant GPS data from @wall2014, as described in Section 3.1 of @michelot2021.

We first download the data from the Movebank data repository, and keep the relevant rows and columns. Note that the CTCRW model requires projected Easting-Northing locations (rather than longitude-latitude), so that's what we are working with here.

# Load data and keep relevant columns
URL <- paste0("https://www.datarepository.movebank.org/bitstream/handle/",
              "10255/move.373/Elliptical%20Time-Density%20Model%20%28Wall%",
              "20et%20al.%202014%29%20African%20Elephant%20Dataset%20%",
              "28Source-Save%20the%20Elephants%29.csv")
raw <- read.csv(url(URL))
keep_cols <- c(11, 13, 14, 17, 4, 5, 6)
raw_cols <- raw[, keep_cols]
colnames(raw_cols) <- c("ID", "x", "y", "date", "lon", "lat", "temp")

# Only keep five months to eliminate seasonal effects
track <- subset(raw_cols, ID == unique(ID)[1])
dates <- as.POSIXlt(track$date, tz = "GMT")
times <- as.numeric(dates - min(dates))/3600
keep_rows <- which(dates > as.POSIXct("2009-05-01 00:00:00") & 
                     dates < as.POSIXct("2009-09-30 23:59:59"))
track <- track[keep_rows,]
dates <- dates[keep_rows]
times <- times[keep_rows]

# Convert to km
track$x <- track$x/1000
track$y <- track$y/1000

We create a data frame that includes columns for

time series identifier ID. This is required when fitting the model to several time series, treated as independent realisations of the same underlying process. Here, we only have one time series, so all rows have the same ID.
responses x and y. Here, there are two response variables because we are fitting an isotropic CTCRW model to the two-dimensional observations (Easting and Northing).
covariate temp. We will later estimate the effect of this covariate on the parameters of the elephant's velocity process.
time. This should be a numeric column for time, used to compute time intervals between observations.

data <- data.frame(ID = 1, 
                   x = track$x, 
                   y = track$y, 
                   temp = track$temp, 
                   time = times)

head(data)

In this analysis, we want to look into the effects of external temperature on the elephant's movement. We specify this by expressing the parameters of the CTCRW model (time scale $\tau$ and mean speed $\nu$) as smooth functions of the temperature. For these smooth terms, we use the syntax from the R package mgcv; here defining cubic regression splines with 10 basis functions and with shrinkage. See the mgcv documentation for more details.

``` {r ex3-form} formulas <- list(mu1 = ~1, mu2 = ~1, tau = ~ s(temp, k = 10, bs = "cs"), nu = ~ s(temp, k = 10, bs = "cs"))

Finally, we use the constructor of the SDE class to create an object for this model, passing as input the formulas, data, type of model, and name of response variables. We pass a vector of initial parameter values, from which the numerical optimisation of the likelihood will start for model fitting, and we use the argument `fixpar` to indicate that the mean velocity (i.e., parameters $\mu_1$ and $\mu_2$) should be fixed to zero rather than estimated.

``` {r ex3-create}
par0 <- c(mu1 = 0, mu2 = 0, tau = 1, nu = 1)

my_sde <- SDE$new(formulas = formulas, 
                  data = data, 
                  type = "CTCRW", 
                  response = c("x", "y"), 
                  fixpar = c("mu1", "mu2"))

Once the model has been created, the parameters can be estimated using the method fit:

``` {r ex3-fit} my_sde$fit()

Finally, the relationship between the SDE parameters and the covariates can be plotted with the method `plot_par`. The function can also generate posterior samples for the estimated relationship, to visualise uncertainty. Here, we plot the CTCRW parameters as function of temperature, with pointwise confidence intervals:

``` {r ex3-plot, fig.width = 8, fig.height = 3, out.width="100%", fig.align="center"}
my_sde$plot_par("temp", par_names = c("tau", "nu"), 
                show_CI = "pointwise", n_post = 1e3)

As described by @michelot2021, these results suggest that this elephant tended to move less at high temperatures, with a big drop in speed ($\nu_t$) over 40 degrees Celsius. Estimates and point-wise confidence intervals of the SDE parameters can also be computed for given covariate values, using the par and CI_pointwise methods.

``` {r ex3-predict}

Data frame with covariate values for prediction

new_data <- data.frame(temp = c(20, 30, 40))

Get estimates and CIs

par <- my_sde$par(t = "all", new_data = new_data) CI <- my_sde$CI_pointwise(new_data = new_data)

Estimates and 95% CIs for each temperature value given as input

par CI ```

References

TheoMichelot/smoothSDE documentation built on Jan. 26, 2024, 6:41 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

TheoMichelot/smoothSDE
Varying-coefficient stochastic differential equations

In TheoMichelot/smoothSDE: Varying-coefficient stochastic differential equations

Summary of SDEs implemented in smoothSDE

Brownian motion

Ornstein-Uhlenbeck

Continuous-time correlated random walk

Varying-coefficient SDEs

Workflow

Example: Brownian motion with drift

Example: Ornstein-Uhlenbeck process

Example: elephant movement analysis

Data frame with covariate values for prediction

Get estimates and CIs

Estimates and 95% CIs for each temperature value given as input

References

R Package Documentation

Browse R Packages

We want your feedback!

TheoMichelot/smoothSDE Varying-coefficient stochastic differential equations

In TheoMichelot/smoothSDE: Varying-coefficient stochastic differential equations

Summary of SDEs implemented in smoothSDE

Brownian motion

Ornstein-Uhlenbeck

Continuous-time correlated random walk

Varying-coefficient SDEs

Workflow

Example: Brownian motion with drift

Example: Ornstein-Uhlenbeck process

Example: elephant movement analysis

Data frame with covariate values for prediction

Get estimates and CIs

Estimates and 95% CIs for each temperature value given as input

References

R Package Documentation

Browse R Packages

We want your feedback!

TheoMichelot/smoothSDE
Varying-coefficient stochastic differential equations