\newcommand{\Ep}{\mathrm{E}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\idm}{\mathrm{I}} \newcommand{\bm}[1]{\boldsymbol{#1}} \newcommand{\N}{\mathcal{N}}
This vignette shows how to carry out particle MCMC for multivariate SDEs with **`msde`**.
Parameter estimation for multivariate stochastic differential equations is an important but typically difficult task. If we translate a given stochastic differential equation into a non-linear state-space model (SSM), then particle Markov chain Monte Carlo methods are the state-of-the-art technique for making Bayesian inference feasible in this setting.
For tutorial purposes, we will not delve into the theoretical details. Instead, we explain the methods by directly showing the code, output, and plots. A bivariate Ornstein-Uhlenbeck process is used as the running example. We then compare the posterior produced by Metropolis-within-Gibbs MCMC and the posterior produced by particle MCMC with the analytic result given by the Kalman filter. A benchmark test is also carried out, and we discuss the pros and cons of particle MCMC at the end.
The $p$-dimensional multivariate Ornstein-Uhlenbeck (mOU) process $Y_t = (Y_{1t}, \ldots, Y_{pt})$ satisfies the SDE
$$
dY_t = (\Gamma Y_t + \Lambda)dt + \Phi^{1/2} dB_t
$$
where $B_t = (B_{1t}, \ldots, B_{pt})$ is $p$-dimensional Brownian motion. Its Euler discretization is of the form
$$
Y_{n+1} = Y_n + (\Gamma Y_n + \Lambda) \Delta_n + \Phi^{1/2} \Delta B_n,
$$
where $Y_n = Y(t_n)$, $\Delta_n = t_{n+1} - t_n$ and
$$
\Delta B_n = B(t_{n+1}) - B(t_n) \overset{\textit{ind}}{\sim} \mathcal N(0, \Delta_n \idm).
$$
Thus, $Y_0, \ldots, Y_N$ is a multivariate normal Markov chain for which the marginal distribution of any subset of timepoints and/or components can be efficiently calculated with the Kalman filter. This can be used to check the MCMC output of `sde.post` and the particle MCMC output, as in the following example.
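To make the Euler discretization above concrete, here is a minimal R sketch that simulates the scheme directly. It is independent of **`msde`**, and the function name `mou_euler_sim` and its arguments are purely illustrative.

```r
# minimal Euler scheme for the mOU model (illustrative only):
# Y_{n+1} = Y_n + (Gamma Y_n + Lambda) * dt + Phi^{1/2} * dB_n
mou_euler_sim <- function(nobs, dt, Y0, Gamma, Lambda, Phi) {
  p <- length(Y0)
  Psi <- chol(Phi)                    # Phi = t(Psi) %*% Psi
  Y <- matrix(NA, nobs, p)
  Y[1, ] <- Y0
  for(n in 1:(nobs - 1)) {
    dB <- sqrt(dt) * rnorm(p)         # Brownian increment ~ N(0, dt * I)
    drift <- c(Gamma %*% Y[n, ] + Lambda) * dt
    Y[n + 1, ] <- Y[n, ] + drift + c(crossprod(Psi, dB))
  }
  Y
}
```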
For reproducibility, we first set a seed.
```r
set.seed(123)
```
We choose the pre-compiled bivariate OU model supplied by **`msde`**.
```r
library(msde)
bmod <- sde.examples("biou")
```
Then we can generate some initial parameter values:
```r
# parameter values
Gamma0 <- .1 * crossprod(matrix(rnorm(4), 2, 2))
Lambda0 <- rnorm(2)
Phi0 <- crossprod(matrix(rnorm(4), 2, 2))
Psi0 <- chol(Phi0) # precompiled model uses the Cholesky scale
theta0 <- c(Gamma0, Lambda0, Psi0[c(1,3,4)])
names(theta0) <- bmod$param.names
# initial sde value
Y0 <- rnorm(2)
names(Y0) <- bmod$data.names
```
We set the total number of observations to 10 and simulate the SDE observations as follows:
```r
# simulation
dT <- runif(1, max = .1) # time step
nObs <- 10
bsim <- sde.sim(bmod, x0 = Y0, theta = theta0,
                dt = dT, dt.sim = dT, nobs = nObs)
YObs <- bsim$data
```
For a description of the options of `sde.sim`, please consult its help page.
Finally, we initialize the bivariate OU model.
```r
# initialization before MCMC
binit <- sde.init(bmod, x = YObs, dt = dT, theta = theta0,
                  nvar.obs = 1) # second component is unobserved
```
We assume that only $\Lambda_1$ is unknown (i.e., needs to be estimated). We therefore introduce a logical vector `fixed.params` of `TRUE`/`FALSE` values indicating whether each parameter is held fixed. Since $\Lambda_1$ is unknown, it is not fixed.
```r
# only Lambda1 is unknown
fixed.params <- rep(TRUE, bmod$nparams)
names(fixed.params) <- bmod$param.names
fixed.params["Lambda1"] <- FALSE
```
Then we set the prior on the unknown parameter $\Lambda_1$ and the latent variable $Y_2$ (the unobserved second component of the first observation). The remaining elements receive a Lebesgue (flat) prior:
$$
\Lambda_1, Y_2 \overset{iid}{\sim} \N(0,1), \quad \pi(Y_1, \Gamma, \Lambda_2, \Phi | \Lambda_1, Y_2) \propto 1.
$$
```r
# prior on (Lambda1, Y2)
hyper <- list(mu = c(0, 0), Sigma = diag(2))
names(hyper$mu) <- c("Lambda1", "Y2")
dimnames(hyper$Sigma) <- rep(list(c("Lambda1", "Y2")), 2)
```
We set the total number of posterior samples to 100,000 and the burn-in period to 1,000. Posterior sampling via MCMC then proceeds as follows:
```r
# posterior sampling
nsamples <- 1e5
burn <- 1e3
bpost <- sde.post(bmod, binit, hyper = hyper,
                  fixed.params = fixed.params,
                  nsamples = nsamples, burn = burn)
```
Within `sde.post`, a Metropolis-within-Gibbs (MWG) sampler is used. The MWG jump size can be specified as a scalar, a vector of length `nparams + ndims`, or a named vector containing the elements defined by `sde.init$nvar.obs.m[1]` (the missing variables in the first SDE observation) and `fixed.params` (the SDE parameters which are not held fixed). For each MWG random variable, the default jump size is `.25 * |initial_value|` when `|initial_value| > 0`, and `1` otherwise. By default, `sde.post` carries out an adaptive MCMC proposal based on @adaptMCMC. At step $n$ of the MCMC, the jump size of each MWG random variable is increased or decreased by $\delta(n)$, depending on whether the cumulative acceptance rate is above or below the optimal value of 0.44. If $\sigma_n$ is the jump size at step $n$, then the next jump size is determined by
$$
\log(\sigma_{n+1}) = \log(\sigma_n) \pm \delta(n), \qquad \delta(n) = \min(.01, 1/n^{1/2}).
$$
For further details of the implementation, please see the documentation of `sde.post`.
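As an illustration of this adaptation rule (not the internal `sde.post` code), the per-variable update could be sketched in R as follows; the function name and arguments are hypothetical.

```r
# sketch of the adaptive jump-size update for one MWG variable:
# raise log(sigma) by delta(n) if the acceptance rate is above target,
# lower it otherwise, with delta(n) = min(.01, 1/sqrt(n))
adapt_jump_size <- function(sigma, n, accept_rate, target = 0.44) {
  delta <- min(.01, 1 / sqrt(n))
  if (accept_rate > target) {
    exp(log(sigma) + delta)
  } else {
    exp(log(sigma) - delta)
  }
}
```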
It is widely recognized that the optimal coordinate-wise acceptance rate is around 44%. As we can see from the sampler output, our acceptance rate of 43.8% is very close to this target.
After the posterior sampling, we can extract the posterior samples corresponding to the unknown parameter $\Lambda_1$.
```r
L1.mcmc <- bpost$params[, "Lambda1"]
```
To check the MCMC sampling result, we compare it with the analytic solution computed via the Kalman filter.
```r
# analytic posterior
L1.seq <- seq(min(L1.mcmc), max(L1.mcmc), len = 500)
L1.loglik <- sapply(L1.seq, function(l1) {
  lambda <- Lambda0
  lambda[1] <- l1
  tmp <- mou.loglik(X = YObs, dt = dT, nvar.obs = 1,
                    Gamma = Gamma0, Lambda = lambda, Phi = Phi0,
                    mu0 = c(0, 0), Sigma0 = diag(2))
  tmp <- tmp + dnorm(l1, 0, 1, log = TRUE) # add the prior log density of Lambda1
  return(tmp)
})
# normalize density
L1.Kalman <- exp(L1.loglik - max(L1.loglik))
L1.Kalman <- L1.Kalman/sum(L1.Kalman)/(L1.seq[2] - L1.seq[1])
```
We now compare the histogram of the MCMC posterior samples with the analytic curve given by the Kalman filter. As we can see from the following plot, the MCMC result is acceptable but definitely not great in this case.
```r
# compare MCMC with Kalman filter
hist(L1.mcmc, breaks = 100, freq = FALSE,
     main = expression(p(Lambda[1]*" | "*bold(Y)[1])),
     xlab = expression(Lambda[1]))
lines(L1.seq, L1.Kalman, col = "red")
legend("topright", legend = c("Analytic", "MCMC"),
       pch = c(NA, 22), lty = c(1, NA), col = c("red", "black"))
```
Later we will show that even a basic particle MCMC sampler can produce a better result than the carefully tuned MWG MCMC.
Particle MCMC (PMCMC) is a class of MCMC techniques proposed by @pmcmc which rely on sequential Monte Carlo (SMC) methods to build efficient high-dimensional proposal distributions for MCMC samplers. Since SMC sampling plays a central role in the particle MCMC methodology, we abstract a group of template classes of SMC methods for SDEs in the header file `sdeSMC.h`. The header `sdeSMC.h` contains two main template classes: `sdeParticle` and `sdeFilter`. The class `sdeParticle` defines the state of a particle of an `sdeModel` class, comprising the complete data at a given time point and providing storage for calculations. The class `sdeFilter` provides the filtering method based on `sdeParticle`, i.e., everything we need to calculate particle weights. In order to use these template classes from **`R`** to implement the SMC methods, we further build a header file `sdePF.h` in which the particle filtering method is a member function of the `sdeRobj` class, specified by `sMode` (SDE model type) and `sPi` (SDE prior type). The function header (with preprocessor directives) is given below.
```{Rcpp sdePF, eval = FALSE}
typedef Rcpp::LogicalVector Logical;
typedef Rcpp::NumericVector Numeric;
typedef Rcpp::IntegerVector Integer;
typedef Rcpp::NumericMatrix NumericMatrix;
typedef Rcpp::List List;
template <class sMode, class sPi>
```
For the details of the full function body, users are referred to the header file `sdePF.h`. All of the SMC method implementations in **`msde`** are designed specifically with SDEs in mind and are based on the more general Sequential Monte Carlo Template Class (SMCTC) by @SMCTC and `RcppSMC` in **`R`**, in which many low-level SMC mechanisms have already been implemented. Currently, `sdeSMC.h` is still under active development, with the aim of incorporating smoothing methods, parallel processing, and the ability to deal with invalid data. For a basic but general exposition of SMC algorithms, the tutorial by @tutorial is a good starting point. Note that particle filters for state-space models can be regarded as a special case of SMC algorithms. In the following subsection, we discuss the idea and algorithm behind particle MCMC for inference in state-space models.

### Particle MCMC algorithm

A general SSM with static parameters $\theta \in \Theta \subset \mathbb{R}^{n_\theta}$ (which may be multidimensional) consists of a hidden state process $\{X_n\}_{n \geq 1}$ and an observation process $\{Y_n\}_{n \geq 1}$. For inference in non-linear, non-Gaussian SSMs, standard MCMC algorithms implement some form of Metropolis-within-Gibbs sampling from the joint posterior $p(\theta, \bm{x} | \bm{y})$. However, due to the high correlation between $\theta$ and $\bm{X}$ (given $\bm{Y}$), such algorithms are very inefficient. In contrast, running MCMC directly on $p(\theta | \bm{y})$ would be much more efficient, but this is not possible because the marginal is not available in closed form for general SDEs. Particle MCMC overcomes this challenge by using an unbiased estimate, obtained by SMC, on an extended space that includes all the variables $(\theta, \bm{X}, \bm{Y})$. The point of particle MCMC is to effectively perform MCMC on $p(\theta | \bm{y})$ directly (in terms of MCMC efficiency).

Currently, `sde.pmcmc`^[Since `sde.pmcmc` is still under development, it is currently not available to users of **`msde`**.] in **`msde`** only implements the *particle marginal Metropolis-Hastings* (PMMH) sampler from the particle MCMC toolbox, which we use for the simulation study below. PMMH is an approximation of an ideal *pseudo-marginal Metropolis-Hastings* sampler (see @pseudoMCMC) for sampling from $p(\theta, x_{1:T} | y_{1:T})$. It uses the proposal density
$$
q((\theta^*, x^*_{1:T}) | (\theta, x_{1:T})) := q(\theta^* | \theta) p_{\theta^*}(x^*_{1:T} | y_{1:T})
$$
where the proposed $x^*_{1:T}$ is perfectly *adapted* to the proposed $\theta^*$. The proposal density $q(\theta^* | \theta)$ must be specified by the user; in `sde.pmcmc`, a very basic IID-Metropolis proposal is implemented. Samples from $p_{\theta^*}(x^*_{1:T} | y_{1:T})$ can be obtained from an SMC algorithm targeting $p_{\theta^*}(x^*_{1:T} | y_{1:T})$.
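To make the SMC step concrete, the following is a minimal sketch of a bootstrap particle filter in plain R; it is not the implementation in `sdeSMC.h`, and all names (`bootstrap_pf`, `rinit`, `rstep`, `dmeas`) are illustrative placeholders.

```r
# generic bootstrap particle filter returning log p_hat(y_{1:T} | theta).
# rinit(N): draws N initial states; rstep(X, theta, dt): propagates particles
# one Euler step; dmeas(y, x, theta): log-density of an observation given a state.
bootstrap_pf <- function(y, theta, dt, N, rinit, rstep, dmeas) {
  Tobs <- nrow(y)
  X <- rinit(N)                                 # N x p matrix of particles
  loglik <- 0
  for(n in 1:Tobs) {
    if(n > 1) X <- rstep(X, theta, dt)          # propagate particles
    lw <- apply(X, 1, function(x) dmeas(y[n, ], x, theta))   # log-weights
    C <- max(lw)                                # log-sum-exp stabilization
    loglik <- loglik + C + log(mean(exp(lw - C)))
    w <- exp(lw - C)
    X <- X[sample(N, N, replace = TRUE, prob = w), , drop = FALSE] # resample
  }
  loglik
}
```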
In fact, we do not need the analytic expression of $p_{\theta^*}(x^*_{1:T} | y_{1:T})$ to calculate the acceptance ratio, since the ratio can be rewritten as
$$
\frac{p(\theta^*, x^*_{1:T} | y_{1:T})}{p(\theta, x_{1:T} | y_{1:T})} \frac{q((\theta, x_{1:T}) | (\theta^*, x^*_{1:T}))}{q((\theta^*, x^*_{1:T}) | (\theta, x_{1:T}))} = \frac{p_{\theta^*}(y_{1:T}) p(\theta^*)}{p_\theta(y_{1:T})p(\theta)} \frac{q(\theta | \theta^*)}{q(\theta^* | \theta)}.
$$
In the above formula, $p_\theta(y_{1:T})$ can be replaced by its estimate $\hat{p}_\theta(y_{1:T})$,
$$
\hat{p}_\theta(y_{1:T}) := \hat{p}_\theta(y_1) \prod_{n=2}^T \hat{p}_\theta(y_n | y_{1:n-1}),
$$
which is approximated by the unnormalized particle weights (not the incremental weights) at time $T$, i.e.
$$
\hat{p}_\theta(y_{1:T}) = \frac{1}{N} \sum_{i=1}^N w(X_{1:T}^i),
$$
where $N$ is the pre-specified total number of particles. In practice, we store the unnormalized weights at time point $n$ as log-weights, i.e. $\log w_n(X_{1:n}^i)$. To avoid possible overflow problems, it is better to calculate $\log \hat{p}_\theta(y_{1:n})$ using the formula
$$
\log \hat{p}_\theta(y_{1:n}) = \log\left( \frac{1}{N} \sum_{i=1}^N \exp\left[\log w_n(X_{1:n}^i) - C_n\right] \right) + C_n,
$$
where $C_n = \max_{1\leq i \leq N}\{\log w_n(X_{1:n}^i)\}$.

### Posterior sampling via particle MCMC

Let's set the number of particles to `100` and simply use an adaptive random walk proposal for updating the unknown parameter $\Lambda_1$. Our aim here is to show that even an "off-the-shelf" choice can give a satisfactory result.
```r
source(system.file("proj", "sde.pmcmc.R", package = "msde"))
npart <- 100
rw.sd <- 1
delta <- .7
ppost <- sde.pmcmc(model = bmod, binit, theta0, fixed.params, hyper,
                   nsamples, npart, dT, resample = "multi",
                   threshold = 0.5, rw.sd, delta)
```
In general, we have the following proposal:
$$
\theta_{j, prop} \sim \N(\theta_{j, curr}, \sigma^2_{n,j}),
$$
where $\theta_{j,prop}$ is the proposed (new) $j$-th parameter and $\theta_{j,curr}$ is the current (old) $j$-th parameter. The adaptive standard deviation $\sigma_{n+1,j}$ for the $j$-th parameter at the $(n+1)$-th iteration is determined by
$$
\sigma_{n+1,j} = \exp\left(\log(\sigma_{n,j}) \pm \frac{\delta}{n} \right).
$$
Based on this adaptation rule, we increase (decrease) $\sigma_{n,j}$ at each step if the previous draw was accepted (rejected). The amount of adaptation goes to $0$ as the number of iterations $n$ grows, so this kind of adaptation preserves the MCMC stationary distribution.
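For concreteness, one PMMH iteration with this adaptive random walk proposal might look as follows. This is only a sketch of the algorithm described above, reusing the hypothetical `bootstrap_pf` from the earlier sketch as the likelihood estimator; it is not the code inside `sde.pmcmc`, and `pmmh_step` and its arguments are illustrative.

```r
# one PMMH iteration for a scalar parameter theta (illustrative sketch)
pmmh_step <- function(theta, loglik, sigma, n, delta,
                      logprior, estimate_loglik) {
  theta_prop <- rnorm(1, mean = theta, sd = sigma)   # random walk proposal
  loglik_prop <- estimate_loglik(theta_prop)         # e.g. bootstrap_pf(...)
  log_ratio <- loglik_prop + logprior(theta_prop) -
               loglik - logprior(theta)              # symmetric proposal cancels
  accept <- log(runif(1)) < log_ratio
  if(accept) {
    theta <- theta_prop
    loglik <- loglik_prop
  }
  # adapt the proposal standard deviation
  sigma <- exp(log(sigma) + ifelse(accept, 1, -1) * delta / n)
  list(theta = theta, loglik = loglik, sigma = sigma, accept = accept)
}
```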
Then we can check the acceptance rate.
```r
# check the acceptance rate
accept <- ppost$accept
print(accept)
```
The acceptance rate is 50.29%, which is reasonably close to the optimal rate of about 44%.
Finally, we extract the posterior samples of $\Lambda_1$ which will be used for comparison in the following subsection.
```r
# posterior samples
L1.pmcmc <- ppost$params[, !fixed.params]
```
Similarly, we compare the histogram of the posterior samples of $\Lambda_1$ given by particle MCMC with the analytic curve given by the Kalman filter.
```r
# compare particle MCMC with Kalman filter
hist(L1.pmcmc, breaks = 100, freq = FALSE,
     main = expression(p(Lambda[1]*" | "*bold(Y)[1])),
     xlab = expression(Lambda[1]))
lines(L1.seq, L1.Kalman, col = "red")
legend("topright", legend = c("Analytic", "PMCMC"),
       pch = c(NA, 22), lty = c(1, NA), col = c("red", "black"))
```
It is clear that the particle MCMC result fits the analytic curve better than the MWG MCMC does. However, this gain in performance comes at a price, as we discuss in the next section.
The better performance of particle MCMC comes at the cost of computational speed. We can run a benchmark to show the computational cost of particle MCMC relative to the usual MCMC.
```r
require(microbenchmark)
mbm <- microbenchmark(
  "MCMC" = {
    bpost <- sde.post(bmod, binit, hyper = hyper,
                      fixed.params = fixed.params,
                      nsamples = nsamples, burn = burn,
                      verbose = FALSE)
  },
  "PMCMC" = {
    ppost <- sde.pmcmc(model = bmod, binit, theta0, fixed.params, hyper,
                       nsamples, npart, dT, resample = "multi",
                       threshold = 0.5, rw.sd, delta)
  },
  times = 50 # number of times to evaluate each expression
)
```
The benchmark results are printed as follows:
```r
print(mbm)
```
The time unit is milliseconds. The total number of evaluations `neval` is set to `50`. The columns `min`, `max`, `mean`, and `median` report the minimum, maximum, mean, and median time spent, respectively, while `lq` and `uq` are the lower and upper quartiles of the sample. If the **`multcomp`** package is available, a statistical ranking is calculated and displayed in compact letter display form in the `cld` column.
The mean time to run the adaptive MWG MCMC is only about 0.3 seconds, whereas the mean time needed to run the particle MCMC is around 50 seconds. So even though particle MCMC performs better in our example above, its computational cost is much larger than that of the usual MCMC.
We can also visualize the benchmark results.
```r
require(ggplot2)
autoplot(mbm)
```
This is a violin plot: a method of plotting numeric data that is similar to a box plot but with a rotated kernel density plot on each side. The visualization also confirms that particle MCMC is computationally much more intensive than MCMC.