\newcommand{\Ep}{\mathrm{E}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\idm}{\mathrm{I}} \newcommand{\bm}[1]{\boldsymbol{#1}} \newcommand{\N}{\mathcal{N}}
This vignette shows how to carry out particle MCMC for multivariate SDEs with **`msde`**.
Parameter estimation for multivariate stochastic differential equations is an important but typically difficult task. If we translate a given stochastic differential equation into a non-linear state-space model (SSM), then particle Markov chain Monte Carlo methods are the state-of-the-art technique for making Bayesian inference feasible in this setting.
For tutorial purposes, we will not delve into the theoretical details. Instead, we explain the methods by directly showing the code, output, and plots. A bivariate Ornstein-Uhlenbeck process is used as the running example. We then compare the posterior produced by Metropolis-within-Gibbs MCMC and the posterior produced by particle MCMC with the analytic result given by the Kalman filter. A benchmark test is also carried out, and we discuss the pros and cons of particle MCMC at the end.
The $p$-dimensional multivariate Ornstein-Uhlenbeck (mOU) process $Y_t = (Y_{1t}, \ldots, Y_{pt})$ satisfies the SDE
$$
dY_t = (\Gamma Y_t + \Lambda)dt + \Phi^{1/2} dB_t
$$
where $B_t = (B_{1t}, \ldots, B_{pt})$ is $p$-dimensional Brownian motion. Its Euler discretization is of the form
$$
Y_{n+1} = Y_n + (\Gamma Y_n + \Lambda) \Delta_n + \Phi^{1/2} \Delta B_n,
$$
where $Y_n = Y(t_n)$, $\Delta_n = t_{n+1} - t_n$ and
$$
\Delta B_n = B(t_{n+1}) - B(t_n) \overset{\textit{ind}}{\sim} \mathcal N(0, \Delta_n \idm).
$$
Thus, $Y_0, \ldots, Y_N$ is a multivariate normal Markov chain for which the marginal distribution of any subset of timepoints and/or components can be efficiently calculated with the Kalman filter. This can be used to check the MCMC output of `sde.post` and the particle MCMC output, as in the following example.
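To make the Euler discretization above concrete, here is a minimal R sketch that simulates the scheme directly. It is independent of **`msde`**, and the function name `mou_euler_sim` and its arguments are purely illustrative.

```r
# minimal Euler scheme for the mOU model (illustrative only):
# Y_{n+1} = Y_n + (Gamma Y_n + Lambda) * dt + Phi^{1/2} * dB_n
mou_euler_sim <- function(nobs, dt, Y0, Gamma, Lambda, Phi) {
  p <- length(Y0)
  Psi <- chol(Phi)                    # Phi = t(Psi) %*% Psi
  Y <- matrix(NA, nobs, p)
  Y[1, ] <- Y0
  for(n in 1:(nobs - 1)) {
    dB <- sqrt(dt) * rnorm(p)         # Brownian increment ~ N(0, dt * I)
    drift <- c(Gamma %*% Y[n, ] + Lambda) * dt
    Y[n + 1, ] <- Y[n, ] + drift + c(crossprod(Psi, dB))
  }
  Y
}
```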
For reproducibility, we first set a seed.
```r
set.seed(123)
```
We choose the pre-compiled bivariate OU model supplied by **`msde`**.
```r
library(msde)
bmod <- sde.examples("biou")
```
Then we can generate some initial parameter values:
```r
# parameter values
Gamma0 <- .1 * crossprod(matrix(rnorm(4), 2, 2))
Lambda0 <- rnorm(2)
Phi0 <- crossprod(matrix(rnorm(4), 2, 2))
Psi0 <- chol(Phi0) # precompiled model uses the Cholesky scale
theta0 <- c(Gamma0, Lambda0, Psi0[c(1,3,4)])
names(theta0) <- bmod$param.names
# initial sde value
Y0 <- rnorm(2)
names(Y0) <- bmod$data.names
```
We set the total number of observations to 10 and simulate the SDE observations as follows:
```r
# simulation
dT <- runif(1, max = .1) # time step
nObs <- 10
bsim <- sde.sim(bmod, x0 = Y0, theta = theta0,
                dt = dT, dt.sim = dT, nobs = nObs)
YObs <- bsim$data
```
For a description of the options of `sde.sim`, please consult its help page.
Finally, we initialize the bivariate OU model.
```r
# initialization before MCMC
binit <- sde.init(bmod, x = YObs, dt = dT, theta = theta0,
                  nvar.obs = 1) # second component is unobserved
```
We assume that only $\Lambda_1$ is unknown (i.e., needs to be estimated). We therefore introduce a logical vector `fixed.params` of `TRUE`/`FALSE` values indicating whether each parameter is held fixed. Since $\Lambda_1$ is unknown, it is not fixed.
```r
# only Lambda1 is unknown
fixed.params <- rep(TRUE, bmod$nparams)
names(fixed.params) <- bmod$param.names
fixed.params["Lambda1"] <- FALSE
```
Then we set the prior on the unknown parameter $\Lambda_1$ and the latent variable $Y_2$ (the unobserved second component of the first observation). The remaining elements receive a Lebesgue (flat) prior:
$$
\Lambda_1, Y_2 \overset{iid}{\sim} \N(0,1), \quad \pi(Y_1, \Gamma, \Lambda_2, \Phi | \Lambda_1, Y_2) \propto 1.
$$
```r
# prior on (Lambda1, Y2)
hyper <- list(mu = c(0, 0), Sigma = diag(2))
names(hyper$mu) <- c("Lambda1", "Y2")
dimnames(hyper$Sigma) <- rep(list(c("Lambda1", "Y2")), 2)
```
We set the total number of posterior samples to 100,000 and the burn-in period to 1,000. Posterior sampling via MCMC then proceeds as follows:
```r
# posterior sampling
nsamples <- 1e5
burn <- 1e3
bpost <- sde.post(bmod, binit, hyper = hyper,
                  fixed.params = fixed.params,
                  nsamples = nsamples, burn = burn)
```
Within `sde.post`, a Metropolis-within-Gibbs (MWG) sampler is used. The MWG jump size can be specified as a scalar, a vector of length `nparams + ndims`, or a named vector containing the elements defined by `sde.init$nvar.obs.m[1]` (the missing variables in the first SDE observation) and `fixed.params` (the SDE parameters which are not held fixed). For each MWG random variable, the default jump size is `.25 * |initial_value|` when `|initial_value| > 0`, and `1` otherwise. By default, `sde.post` carries out an adaptive MCMC proposal based on @adaptMCMC. At step $n$ of the MCMC, the jump size of each MWG random variable is increased or decreased by $\delta(n)$, depending on whether the cumulative acceptance rate is above or below the optimal value of 0.44. If $\sigma_n$ is the jump size at step $n$, then the next jump size is determined by
$$
\log(\sigma_{n+1}) = \log(\sigma_n) \pm \delta(n), \qquad \delta(n) = \min(.01, 1/n^{1/2}).
$$
For further details of the implementation, please see the documentation of `sde.post`.
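As an illustration of this adaptation rule (not the internal `sde.post` code), the per-variable update could be sketched in R as follows; the function name and arguments are hypothetical.

```r
# sketch of the adaptive jump-size update for one MWG variable:
# raise log(sigma) by delta(n) if the acceptance rate is above target,
# lower it otherwise, with delta(n) = min(.01, 1/sqrt(n))
adapt_jump_size <- function(sigma, n, accept_rate, target = 0.44) {
  delta <- min(.01, 1 / sqrt(n))
  if (accept_rate > target) {
    exp(log(sigma) + delta)
  } else {
    exp(log(sigma) - delta)
  }
}
```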
It is widely recognized that the optimal coordinate-wise acceptance rate is around 44%. As we can see from the sampler output, our acceptance rate of 43.8% is very close to this target.
After the posterior sampling, we can extract the posterior samples corresponding to the unknown parameter $\Lambda_1$.
```r
L1.mcmc <- bpost$params[, "Lambda1"]
```
To check the MCMC sampling result, we compare it with the analytic solution computed via the Kalman filter.
```r
# analytic posterior
L1.seq <- seq(min(L1.mcmc), max(L1.mcmc), len = 500)
L1.loglik <- sapply(L1.seq, function(l1) {
  lambda <- Lambda0
  lambda[1] <- l1
  tmp <- mou.loglik(X = YObs, dt = dT, nvar.obs = 1,
                    Gamma = Gamma0, Lambda = lambda, Phi = Phi0,
                    mu0 = c(0, 0), Sigma0 = diag(2))
  tmp <- tmp + dnorm(l1, 0, 1, log = TRUE) # add the prior log density of Lambda1
  return(tmp)
})
# normalize density
L1.Kalman <- exp(L1.loglik - max(L1.loglik))
L1.Kalman <- L1.Kalman/sum(L1.Kalman)/(L1.seq[2] - L1.seq[1])
```
We now compare the histogram of the MCMC posterior samples with the analytic curve given by the Kalman filter. As we can see from the following plot, the MCMC result is acceptable but definitely not great in this case.
```r
# compare MCMC with Kalman filter
hist(L1.mcmc, breaks = 100, freq = FALSE,
     main = expression(p(Lambda[1]*" | "*bold(Y)[1])),
     xlab = expression(Lambda[1]))
lines(L1.seq, L1.Kalman, col = "red")
legend("topright", legend = c("Analytic", "MCMC"),
       pch = c(NA, 22), lty = c(1, NA), col = c("red", "black"))
```
Later we will show that even a basic particle MCMC sampler can produce a better result than the carefully tuned MWG MCMC.
Particle MCMC (PMCMC) is a class of MCMC techniques proposed by @pmcmc which rely on sequential Monte Carlo (SMC) methods to build efficient high-dimensional proposal distributions for MCMC samplers. Since SMC sampling plays a central role in the particle MCMC methodology, we abstract a group of template classes of SMC methods for SDEs in the header file `sdeSMC.h`. The header `sdeSMC.h` contains two main template classes: `sdeParticle` and `sdeFilter`. The class `sdeParticle` defines the state of a particle of an `sdeModel` class, comprising the complete data at a given time point and providing storage for calculations. The class `sdeFilter` provides the filtering method based on `sdeParticle`, i.e., everything we need to calculate particle weights. In order to use these template classes from **`R`** to implement the SMC methods, we further build a header file `sdePF.h` in which the particle filtering method is a member function of the `sdeRobj` class, specified by `sMode` (SDE model type) and `sPi` (SDE prior type). The function header (with preprocessor directives) is given below.
```{Rcpp sdePF, eval = FALSE}
typedef Rcpp::LogicalVector Logical;
typedef Rcpp::NumericVector Numeric;
typedef Rcpp::IntegerVector Integer;
typedef Rcpp::NumericMatrix NumericMatrix;
typedef Rcpp::List List;
template <class sMode, class sPi>
```
For the details of the full function body, users are referred to the header file `sdePF.h`. All of the SMC method implementations in **`msde`** are designed specifically with SDEs in mind and are based on the more general Sequential Monte Carlo Template Class (SMCTC) by @SMCTC and `RcppSMC` in **`R`**, in which many low-level SMC mechanisms have already been implemented. Currently, `sdeSMC.h` is still under active development, with the aim of incorporating smoothing methods, parallel processing, and the ability to deal with invalid data. For a basic but general exposition of SMC algorithms, the tutorial by @tutorial is a good starting point. Note that particle filters for state-space models can be regarded as a special case of SMC algorithms. In the following subsection, we discuss the idea and algorithm behind particle MCMC for inference in state-space models.

### Particle MCMC algorithm

A general SSM with static parameters $\theta \in \Theta \subset \mathbb{R}^{n_\theta}$ (which may be multidimensional) consists of a hidden state process $\{X_n\}_{n \geq 1}$ and an observation process $\{Y_n\}_{n \geq 1}$. For inference in non-linear, non-Gaussian SSMs, standard MCMC algorithms implement some form of Metropolis-within-Gibbs sampling from the joint posterior $p(\theta, \bm{x} | \bm{y})$. However, due to the high correlation between $\theta$ and $\bm{X}$ (given $\bm{Y}$), such algorithms are very inefficient. In contrast, running MCMC directly on $p(\theta | \bm{y})$ would be much more efficient, but this is not possible because the marginal is not available in closed form for general SDEs. Particle MCMC overcomes this challenge by using an unbiased estimate, obtained by SMC, on an extended space that includes all the variables $(\theta, \bm{X}, \bm{Y})$. The point of particle MCMC is to effectively perform MCMC on $p(\theta | \bm{y})$ directly (in terms of MCMC efficiency).

Currently, `sde.pmcmc`^[Since `sde.pmcmc` is still under development, it is currently not available to users of **`msde`**.] in **`msde`** only implements the *particle marginal Metropolis-Hastings* (PMMH) sampler from the particle MCMC toolbox, which we use for the simulation study below. PMMH is an approximation of an ideal *pseudo-marginal Metropolis-Hastings* sampler (see @pseudoMCMC) for sampling from $p(\theta, x_{1:T} | y_{1:T})$. It uses the proposal density
$$
q((\theta^*, x^*_{1:T}) | (\theta, x_{1:T})) := q(\theta^* | \theta) p_{\theta^*}(x^*_{1:T} | y_{1:T})
$$
where the proposed $x^*_{1:T}$ is perfectly *adapted* to the proposed $\theta^*$. The proposal density $q(\theta^* | \theta)$ must be specified by the user; in `sde.pmcmc`, a very basic IID-Metropolis proposal is implemented. Samples from $p_{\theta^*}(x^*_{1:T} | y_{1:T})$ can be obtained from an SMC algorithm targeting $p_{\theta^*}(x^*_{1:T} | y_{1:T})$.
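To make the SMC step concrete, the following is a minimal sketch of a bootstrap particle filter in plain R; it is not the implementation in `sdeSMC.h`, and all names (`bootstrap_pf`, `rinit`, `rstep`, `dmeas`) are illustrative placeholders.

```r
# generic bootstrap particle filter returning log p_hat(y_{1:T} | theta).
# rinit(N): draws N initial states; rstep(X, theta, dt): propagates particles
# one Euler step; dmeas(y, x, theta): log-density of an observation given a state.
bootstrap_pf <- function(y, theta, dt, N, rinit, rstep, dmeas) {
  Tobs <- nrow(y)
  X <- rinit(N)                                 # N x p matrix of particles
  loglik <- 0
  for(n in 1:Tobs) {
    if(n > 1) X <- rstep(X, theta, dt)          # propagate particles
    lw <- apply(X, 1, function(x) dmeas(y[n, ], x, theta))   # log-weights
    C <- max(lw)                                # log-sum-exp stabilization
    loglik <- loglik + C + log(mean(exp(lw - C)))
    w <- exp(lw - C)
    X <- X[sample(N, N, replace = TRUE, prob = w), , drop = FALSE] # resample
  }
  loglik
}
```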
In fact, we do not need the analytic expression of $p_{\theta^*}(x^*_{1:T} | y_{1:T})$ to calculate the acceptance ratio, since the ratio can be rewritten as
$$
\frac{p(\theta^*, x^*_{1:T} | y_{1:T})}{p(\theta, x_{1:T} | y_{1:T})} \frac{q((\theta, x_{1:T}) | (\theta^*, x^*_{1:T}))}{q((\theta^*, x^*_{1:T}) | (\theta, x_{1:T}))} = \frac{p_{\theta^*}(y_{1:T}) p(\theta^*)}{p_\theta(y_{1:T})p(\theta)} \frac{q(\theta | \theta^*)}{q(\theta^* | \theta)}.
$$
In the above formula, $p_\theta(y_{1:T})$ can be replaced by its estimate $\hat{p}_\theta(y_{1:T})$,
$$
\hat{p}_\theta(y_{1:T}) := \hat{p}_\theta(y_1) \prod_{n=2}^T \hat{p}_\theta(y_n | y_{1:n-1}),
$$
which is approximated by the unnormalized particle weights (not the incremental weights) at time $T$, i.e.
$$
\hat{p}_\theta(y_{1:T}) = \frac{1}{N} \sum_{i=1}^N w(X_{1:T}^i),
$$
where $N$ is the pre-specified total number of particles. In practice, we store the unnormalized weights at time point $n$ as log-weights, i.e. $\log w_n(X_{1:n}^i)$. To avoid possible overflow problems, it is better to calculate $\log \hat{p}_\theta(y_{1:n})$ using the formula
$$
\log \hat{p}_\theta(y_{1:n}) = \log\left( \frac{1}{N} \sum_{i=1}^N \exp\left[\log w_n(X_{1:n}^i) - C_n\right] \right) + C_n,
$$
where $C_n = \max_{1\leq i \leq N}\{\log w_n(X_{1:n}^i)\}$.

### Posterior sampling via particle MCMC

Let's set the number of particles to `100` and simply use an adaptive random walk proposal for updating the unknown parameter $\Lambda_1$. Our aim here is to show that even an "off-the-shelf" choice can give a satisfactory result.
```r
source(system.file("proj", "sde.pmcmc.R", package = "msde"))
npart <- 100
rw.sd <- 1
delta <- .7
ppost <- sde.pmcmc(model = bmod, binit, theta0, fixed.params, hyper,
                   nsamples, npart, dT, resample = "multi",
                   threshold = 0.5, rw.sd, delta)
```
In general, we have the following proposal:
$$
\theta_{j, prop} \sim \N(\theta_{j, curr}, \sigma^2_{n,j}),
$$
where $\theta_{j,prop}$ is the proposed (new) $j$-th parameter and $\theta_{j,curr}$ is the current (old) $j$-th parameter. The adaptive standard deviation $\sigma_{n+1,j}$ for the $j$-th parameter at the $(n+1)$-th iteration is determined by
$$
\sigma_{n+1,j} = \exp\left(\log(\sigma_{n,j}) \pm \frac{\delta}{n} \right).
$$
Based on this adaptation rule, we increase (decrease) $\sigma_{n,j}$ at each step if the previous draw was accepted (rejected). The amount of adaptation goes to $0$ as the number of iterations $n$ grows, so this kind of adaptation preserves the MCMC stationary distribution.
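For concreteness, one PMMH iteration with this adaptive random walk proposal might look as follows. This is only a sketch of the algorithm described above, reusing the hypothetical `bootstrap_pf` from the earlier sketch as the likelihood estimator; it is not the code inside `sde.pmcmc`, and `pmmh_step` and its arguments are illustrative.

```r
# one PMMH iteration for a scalar parameter theta (illustrative sketch)
pmmh_step <- function(theta, loglik, sigma, n, delta,
                      logprior, estimate_loglik) {
  theta_prop <- rnorm(1, mean = theta, sd = sigma)   # random walk proposal
  loglik_prop <- estimate_loglik(theta_prop)         # e.g. bootstrap_pf(...)
  log_ratio <- loglik_prop + logprior(theta_prop) -
               loglik - logprior(theta)              # symmetric proposal cancels
  accept <- log(runif(1)) < log_ratio
  if(accept) {
    theta <- theta_prop
    loglik <- loglik_prop
  }
  # adapt the proposal standard deviation
  sigma <- exp(log(sigma) + ifelse(accept, 1, -1) * delta / n)
  list(theta = theta, loglik = loglik, sigma = sigma, accept = accept)
}
```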
Then we can check the acceptance rate.
```r
# check the acceptance rate
accept <- ppost$accept
print(accept)
```
The acceptance rate is 50.29%, which is reasonably close to the optimal rate of about 44%.
Finally, we extract the posterior samples of $\Lambda_1$ which will be used for comparison in the following subsection.
```r
# posterior samples
L1.pmcmc <- ppost$params[, !fixed.params]
```
Similarly, we compare the histogram of the posterior samples of $\Lambda_1$ given by particle MCMC with the analytic curve given by the Kalman filter.
```r
# compare particle MCMC with Kalman filter
hist(L1.pmcmc, breaks = 100, freq = FALSE,
     main = expression(p(Lambda[1]*" | "*bold(Y)[1])),
     xlab = expression(Lambda[1]))
lines(L1.seq, L1.Kalman, col = "red")
legend("topright", legend = c("Analytic", "PMCMC"),
       pch = c(NA, 22), lty = c(1, NA), col = c("red", "black"))
```
It is clear that the particle MCMC result fits the analytic curve better than the MWG MCMC does. However, this gain in performance comes at a price, as we discuss in the next section.
The better performance of particle MCMC comes at the cost of computational speed. We can run a benchmark to show the computational cost of particle MCMC relative to the usual MCMC.
```r
require(microbenchmark)
mbm <- microbenchmark(
  "MCMC" = {
    bpost <- sde.post(bmod, binit, hyper = hyper,
                      fixed.params = fixed.params,
                      nsamples = nsamples, burn = burn,
                      verbose = FALSE)
  },
  "PMCMC" = {
    ppost <- sde.pmcmc(model = bmod, binit, theta0, fixed.params, hyper,
                       nsamples, npart, dT, resample = "multi",
                       threshold = 0.5, rw.sd, delta)
  },
  times = 50 # number of times to evaluate each expression
)
```
The benchmark results are printed as follows:
```r
print(mbm)
```
The time unit is milliseconds. The total number of evaluations `neval` is set to `50`. The columns `min`, `max`, `mean`, and `median` report the minimum, maximum, mean, and median time spent, respectively, while `lq` and `uq` are the lower and upper quartiles of the sample. If the **`multcomp`** package is available, a statistical ranking is calculated and displayed in compact letter display form in the `cld` column.
The mean time to run the adaptive MWG MCMC is only about 0.3 seconds, whereas the mean time needed to run the particle MCMC is around 50 seconds. So even though particle MCMC performs better in our example above, its computational cost is much larger than that of the usual MCMC.
We can also visualize the benchmark results.
```r
require(ggplot2)
autoplot(mbm)
```
This is a violin plot: a method of plotting numeric data that is similar to a box plot but with a rotated kernel density plot on each side. The visualization also confirms that particle MCMC is computationally much more intensive than MCMC.