set.seed(10)
The package deGradInfer
is an R package for efficient parameter inference in systems of ordinary differential equations (ODEs), using adaptive gradient matching via Gaussian processes. This implementation follows from the work in @calderhead2009; @dondelinger2013 and @macdonald2017, although the software has been extended with several crucial features to make it robust and useful for practical applications. Features include:
In the next section, we will briefly sketch the underlying statistical model and inference, before moving on to the practical examples of how to use the package.
The package implements the adaptive gradient matching approach of @dondelinger2013, with the option of using parallel tempering to fix the mismatch parameter in place, as described in @macdonald2017. The advantage of gradient matching as a parameter inference technique is that it does not require us to numerically solve the ODE system at each step, a potentially costly operation for large systems.
Briefly, assuming Gaussian noise on the observed variable $\mathbf{y}_k$, $P(y_k(t)) = \mathcal{N}(y_k(t)|x_k(t),\sigma_k)$, we can model the latent variables $\mathbf{x}_k$ using a Gaussian process prior:
$$P(\mathbf{x}k|\Psi_k) = \mathcal{N}(\mathbf{x}_k|\mathbf{0}, \mathbf{C}{\Psi_k}),$$ where $\mathbf{C}_{\Psi_k}$ denotes a positive definite matrix of covariance functions with hyperparameters $\Psi_k$. We can use the properties of Gaussian processes to obtain a model (in closed form) for the gradient $\mathbf{\dot{x}}_k$, which we combine with the ODE system model via a product of experts:
$$P(\mathbf{\dot{x}}_k|\mathbf{X},\Theta,\Psi_k,\gamma_k) \propto P(\mathbf{\dot{x}}_k|\mathbf{x}_k,\Psi_k)P(\mathbf{\dot{x}}_k|\mathbf{X},\Theta,\gamma_k),$$ where $\mathbf{X}$ is a $T \times K$ matrix where column $k$ corresponds to $\mathbf{x}_k$. Note that this approach essentially provides two models for $\mathbf{\dot{x}}_k$; a Gaussian process model which is based on a smoothing of the observed variables, and the ODE model, with parameters $\Theta$. The parameters $\gamma_k$ allow for some Gaussian noise on the ODE estimates of the gradients, and thus effectively control the allowable mismatch between the ODE and Gaussian process estimates.
Rather than inferring the gradient mismatch parameters, we can instead use parallel tempering to set them. If the ODE model properly (or adequately) describes the process then ideally we would want no (or little) mismatch between the predctions from the ODEs and the interpolant, when sampling from the posterior. However, bringing in this knowledge by forcing there to be no mismatch between the gradients at the beginning of the MCMC run can cause the algorithm to converge to suboptimal states. Instead, by running a number of parallel chains, setting large values of the gradient mismatch parameter for chains closer to the prior and small values for chains closer to the posterior, this prior knowledge can be brought in whilst still allowing the algorithm enough flexibility to explore. See @macdonald2017 for details.
We integrate out the latent variables $\mathbf{\dot{X}}$ and do inference over the joint model $P(\mathbf{Y}, \mathbf{X}, \mathbf{\Theta}, \mathbf{\Psi}, \mathbf{\sigma})$ (see @dondelinger2013 for details). Inference is done using population MCMC, which can deal with multimodal posterior landscapes, and will give a posterior distribution for the parameters, in addition to the posterior mean estimates.
The package supports priors defined by the user. The user should write their function to return a vector of log densities, for a given parameter set. For example, a Uniform prior (lowerbound=0, upperbound=10) on parameter 1, a Gamma prior (shape=4, scale=0.5) on parameters 2 and 3 and a Chi-squared prior (degrees of freedom=2) on parameter 4 would take the form
testLogPrior <- function(params) { return(c(dunif(params[1],0,10,log=TRUE), dgamma(params[2:3],shape=4,scale=0.5,log=TRUE), dchisq(params[4],2,log=TRUE))) }
We will use the example of a simple Lotka-Volterra prey-predator system. This consists of two variables representing a predator species and a prey species. The system is governed by four parameters and takes the following form:
$$ \frac{dx_1}{dt} = x_1 (a - b x_2) \hspace{1cm}\mathrm{Prey} \ \frac{dx_2}{dt} = -x_2 (c x_2 - d x_1) \hspace{1cm}\mathrm{Predators} $$
where $x_1$ represents the number of prey and $x_2$ represents the number of predators.
In order to use deGradInfer
, we need to represent this system as an R function. Our implementation is inspired by the package deSolve
for numerically solving ODE systems represented as R functions. In particular, we require the function to be of the form func <- function(t,X,params,...)
where t
is a $T$-dimensional vector of time points, X
is a $T \times K$ matrix containing the values for the $K$ variables in the system, params
is a $p$-dimensional vector of parameters, and ...
denotes optional parameters that may be passed to the function. The output should be a $T \times K$ matrix of the gradients $\frac{dx_k}{dt}$ for all variables $k$. For the Lotka-Volterra system, we can define an R function as:
LV_func <- function(t, X, params) { dxdt = cbind( X[,1]*(params[1] - params[2]*X[,2]), # Prey - X[,2]*(params[3] - params[4]*X[,1]) # Predators ) return(dxdt) }
Note that if one wishes to use their function with deSolve
, it needs to be compatible with scalar t
, and additionally needs to return a list (this is a requirement by deSolve
to allow for returning global values that are needed at each time point; we do not currently return any such global values). This is best achieved by writing a wrapper for the function:
library('deSolve') # Wrapper function deSolve_LV_func = function(t,y,params) { list(LV_func(t,matrix(y,1,length(y)),params)) } # Generate some data test.times = seq(0,2,0.1) test.data = ode(c(5,3), test.times, deSolve_LV_func, c(2,1,4,1)) test.data = test.data[,2:3] + rnorm(dim(test.data)[1]*2,0,0.1) # add some observational noise
The main interface to the package is provided via the agm
function. The user needs to provide the data for the observed time points, the R function representing the ODE system, the number of parameters in the system, and the expected standard deviation of the observational noise. Note that we do not currently allow the noise to be inferred for the gradient matching approach, as this tends to overestimate the true noise in the data.
Here we only run the model for 1000 iterations, which is not sufficient to reach convergence, but is good enough for demonstration purposes:
library('deGradInfer') agm.result = agm(test.data,test.times,LV_func,4,noise.sd=0.1, maxIterations=1000,chainNum=5) print(agm.result$posterior.mean)
We can plot the posterior distribution of the inferred parameter values using the samples from the posterior.
library('ggplot2') plotting.frame = data.frame(Param.Values=c(agm.result$posterior.samples), Param.Num=rep(1:4, each=dim(agm.result$posterior.samples)[1])) ggplot(plotting.frame, aes(x=Param.Values, y=..scaled..)) + facet_wrap(~paste('Parameter', Param.Num), scales = 'free') + geom_density() + labs(y='Scaled Density', x='Parameter Value') + xlim(c(0,5))
We can also inspect the evolution of the log likelihood:
qplot(x=seq(25,1000,25), y=c(agm.result$ll), geom = 'line') + labs(x='Iterations', y='Log-likelihood')
The mismatch between the Gaussian process model and the ODE model of the system is captured by the parameter $\gamma$. @macdonald2017 introduced a tempering scheme to replace naive inference of this parameter. This scheme can be used in our model by setting the temperMismatchParameter
option.
agm.result = agm(test.data,test.times,LV_func,4,noise.sd=0.1,temperMismatchParameter=TRUE, originalSignalOnlyPositive=TRUE, logPrior="Gamma", defaultTemperingScheme="LB10")
Users can specify their own values for the tempering ladder by ensuring temperMismatchParameter=TRUE
, defaultTemperingScheme=FALSE
and then passing their matrix of values to mismatchParameterValues
. The number of rows of the matrix should be equal to the number of parallel chains and the number of columns should be equal to the number of variables in the system. A typical ladder should have the largest value in the first row and the smallest value in the last row.
agm.result = agm(test.data,test.times,LV_func,4,noise.sd=0.1,temperMismatchParameter=TRUE, originalSignalOnlyPositive=TRUE,logPrior="Gamma", mismatchParameterValues=matrix(seq(0.01,1,length.out=20),nrow=20,ncol=2))
For comparison purposes, we also implemented a version of the MCMC algorithm that explicitly solves the ODE system at each step (with deSolve
), rather than using gradient matching. This can be invoked by using the explicit
flag.
agm.result = agm(test.data,test.times,deSolve_LV_func,numberOfParameters=6,explicit=TRUE)
Note that the number of parameters has increased; this is because the initial states of the ODE at the first time point also need to be inferred when using this method. So in the Lotka-Volterra case, the number of parameters increases by 2 because there are 2 variables in the ODE system.
The package can in principle deal with ODE systems of any size, although in practice the achievable size will depend on memory and running time considerations. Here we show how to apply it to an ODE system with five equations and six parameters, derived from @vyshemirsky2007.
# Function encoding the Vyshemirsky and Girolami ODE model 1. VG_func <- function(t, X, params) { MM <- ((params[5]*X[,5])/(params[6]+X[,5])) # Michaelis-Menten term dxdt <- cbind( - params[1]*X[,1] - params[2]*X[,1]*X[,3] + params[3]*X[,4], # S params[1]*X[,1], # dS - params[2]*X[,1]*X[,3] + params[3]*X[,4] + MM, # R params[2]*X[,1]*X[,3] - params[3]*X[,4] - params[4]*X[,4], # RS params[4]*X[,4] - MM # Rpp ) return(dxdt) } # Generate data timeTest = c(0,1,2,4,5,7,10,15,20,30,40,50,60,80,100) dataTest = ode(c(1,0,1,0,0), timeTest, function(t,y,params) list(VG_func(t,matrix(y,1,length(y)),params)), c(0.07,0.6,0.05,0.3,0.017,0.3)) dataTest = dataTest[,2:6] + rnorm(dim(dataTest)[1]*5,0,0.01) # Run adaptive gradient matching agm.result = agm(data=dataTest,time=timeTest,ode.system=VG_func, numberOfParameters=6)
If not all variables are observed, we can infer the unobserved variables by using a latent Gaussian process. Note that the data
matrix still needs to have a column for each variable, but unobserved variables can be set to NA
, and the observed variables need to be specified in observedVariables
. Partially observed variables are not currently supported.
# Remove the observations of the first variable dataTest[,1] = NA # Run adaptive gradient matching agm.result = agm(data=dataTest,time=timeTest,ode.system=VG_func, observedVariables=2:5, numberOfParameters=6)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.