modMCMC: Constrained Markov Chain Monte Carlo In FME: A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis

Description

Performs a Markov Chain Monte Carlo simulation, using an adaptive Metropolis (AM) algorithm and including a delayed rejection (DR) procedure.

Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22``` ```modMCMC(f, p, ..., jump = NULL, lower = -Inf, upper = +Inf, prior = NULL, var0 = NULL, wvar0 = NULL, n0 = NULL, niter = 1000, outputlength = niter, burninlength = 0, updatecov = niter, covscale = 2.4^2/length(p), ntrydr = 1, drscale = NULL, verbose = TRUE) ## S3 method for class 'modMCMC' summary(object, remove = NULL, ...) ## S3 method for class 'modMCMC' pairs(x, Full = FALSE, which = 1:ncol(x\$pars), remove = NULL, nsample = NULL, ...) ## S3 method for class 'modMCMC' hist(x, Full = FALSE, which = 1:ncol(x\$pars), remove = NULL, ask = NULL, ...) ## S3 method for class 'modMCMC' plot(x, Full = FALSE, which = 1:ncol(x\$pars), trace = TRUE, remove = NULL, ask = NULL, ...) ```

Arguments

 `f ` the function to be evaluated, with first argument the vector of parameters which should be varied. It should return either the model residuals, an element of class modCost (as returned by a call to `modCost`) or -2*log(likelihood). The latter is equivalent to the sum-of-squares functions when using a Gaussian likelihood and prior. `p ` initial values for the parameters to be optimized over. `...` additional arguments passed to function `f` or to the methods. `jump ` jump length, either a number, a vector with length equal to the total number of parameters, a covariance matrix, or a function that takes as input the current values of the parameters and produces as output the perturbed parameters. See details. `prior ` -2*log(parameter prior probability), either a function that is called as `prior(p)` or `NULL`; in the latter case a non-informative prior is used (i.e. all parameters are equally likely, depending on `lower` and `upper` within min and max bounds). `var0 ` initial model variance; if `NULL`, it is assumed that the model variance is `1`, and the return element from `f` is -2*log (likelihood). If it has a value, it is assumed that the return element from `f` contain the model residuals or a list of class `modFit`. See details. Good options for `var0` are to use the modelvariance (`modVariance`) as returned by the `summary` method of modFit. When this option is chosen, and the model has several variables, they will all be scaled similarly. See vignette `FMEdyna`. In case the model has several variables with different magnitudes, then it may be better to scale each variable independently. In that case, one can use as `var0`, the mean of the unweighted squared residuals from the model fit as returned from modFit (`var_ms_unweighted`). See vignette `FME`. `wvar0 ` "weight" for the initial model variance – see details. `n0 ` parameter used for weighing the initial model variance - if `NULL`, it is estimated as `n0=wvar0*n`, where n = number of observations. See details. `lower ` lower bounds on the parameters; for unbounded parameters set equal to `-Inf`. `upper ` upper bounds on the parameters; for unbounded parameters set equal to `Inf`. `niter ` number of iterations for the MCMC. `outputlength ` number of iterations kept in the output; should be smaller or equal to `niter`. `updatecov ` number of iterations after which the parameter covariance matrix is (re)evaluated based on the parameters kept thus far, and used to update the MCMC jumps. `covscale ` scale factor for the parameter covariance matrix, used to perform the MCMC jumps. `burninlength ` number of initial iterations to be removed from output. `ntrydr ` maximal number of tries for the delayed rejection procedure. It is generally not a good idea to set this to a too large value. `drscale ` for each try during delayed rejection, the cholesky decomposition of the proposal matrix is scaled with this amount; if `NULL`, it is assumed to be ```c(0.2,0.25, 0.333, 0.333, ...)``` `verbose ` if `TRUE`: prints extra output. `object ` an object of class `modMCMC`. `x ` an object of class `modMCMC`. `Full ` If TRUE then not only the parameters will be plotted, but also the function value and (if appropriate) the model variance(s). `which ` the name or the index to the parameters that should be plotted. Default = all parameters. If `Full=TRUE`, setting `which =NULL` will plot only the function value and the model variance. `trace ` if `TRUE`, adds smoothed line to the plot. `remove ` a list with indices of the runs that should be removed (e.g. to remove runs during burnin. `nsample ` the number of xy pairs to be plotted on the upper panel in the pairs plot. When `NULL` all xy pairs plotted. Set to a lower number in case the graph becomes too dense (and the exported picture too large). This does not affect the histograms on the diagonal plot (which are estimated using all MCMC draws). `ask ` logical; if `TRUE`, the user is asked before each plot, if `NULL` the user is only asked if more than one page of plots is necessary and the current graphics device is set interactive, see `par(ask=.)` and `dev.interactive`.

Details

Note that arguments after ... must be matched exactly.

R-function `f` is called as `f(p, ...)`. It should return either -2 times the log likelihood of the model (one value), the residuals between model and data or an item of class `modFit` (as created by function `modFit`.

In the latter two cases, it is assumed that the prior distribution for theta is either non-informative or gaussian. If gaussian, it can be treated as a sum of squares (SS). If the measurement function is defined as:

y = f(p) + xi; xi ~ N(0, sigma^2)

where xi is the measurement error, assumed normally distribution, then the posterior for the parameters will be estimated as:

prob(p|y,sigma^2)~exp(-0.5 SS(p)/sigma^2)+SSpri(p)

and where sigma^2 is the error variance, SS is the sum of squares function SS(p)=sum(yi-f(p))^2. If non-informative priors are used, then SSpri(p)=0.

The error variance sigma^2 is considered a nuisance parameter. A prior distribution of it should be specified and a posterior distribution is estimated.

If `wvar0` or `n0` is >0, then the variances are sampled as conjugate priors from the inverse gamma distribution with parameters `var0` and `n0=wvar0*n`. Larger values of `wvar0` keep these samples closer to `var0`.

Thus, at each step, 1/ the error variance (sigma^{-2}) is sampled from a gamma distribution:

prob(sigma^2|y,p)~Gam(0.5*(n0+n), 0.5*(n0+var0+SS(p))

where `n` is the number of data points and where n0=n * wvar0, and where the second argument to the gamma function is the shape parameter.

The prior parameters (`var0` and `wvar0`) are the prior mean for sigma^2 and the prior accuracy.

By setting `wvar0` equal to 1, equal weight is given to the prior and the current value.

If `wvar0` is 0 then the prior is ignored.

If `wvar0` is `NULL` (the default) then the error variances are assumed to be fixed.

`var0` estimates the variance of the measured components. In case independent estimates are not available, these variances can be obtained from the mean squares of fitted residuals. (e.g. as reported in `modFit`). See the examples. (but note that this is not truly independent information)

`var0` is either one value, or a value for each observed variable, or a value for each observed data point.

When `var0` is not `NULL`, then `f` is assumed to return the model residuals OR an instance of class `modCost`.

When `var0=NULL`, then `f` should return either -2*log(probability of the model), or an instance of class `modCost`.

`modMCMC` implements the Metropolis-Hastings method. The proposal distribution, which is used to generate new parameter values is the (multidimensional) Gaussian density distribution, with standard deviation given by `jump`.

`jump` can be either one value, a vector of length = number of parameters or a parameter covariance matrix (nrow = ncol = number parameters).

The jump parameter, `jump` thus determines how much the new parameter set will deviate from the old one.

If `jump` is one value, or a vector, then the new parameter values are generated by sampling a normal distribution with standard deviation equal to `jump`. A larger value will lead to larger jumps in the parameter space, but acceptance of new points can get very low. Smaller jump lengths increase the acceptance rate, but the algorithm may move too slowly, and too many runs may be needed to scan the parameter space.

If `jump` is `NULL`, then the jump length is taken as 10% of the parameter value as given in `p`.

`jump` can also be a proposal covariance matrix. In this case, the new parameter values are generated by sampling a multidimensional normal distribution. It can be efficient to initialise `jump` using the parameter covariance as resulting from fitting the model (e.g. using `modFit`) – see examples.

Finally, `jump` can also be an R-function that takes as input the current values of the parameters and returns the new parameter values.

Two methods are implemented to increase the number of accepted runs.

1. In the "adaptive Metropolis" method, new parameters are generated with a covariance matrix that is estimated from the parameters generated (and saved) thus far. The idea behind this is that the MCMC method is more efficient if the proposal covariance (to generate new parameter values) is somehow tuned to the shape and size of the target distribution.

Setting `updatecov` smaller than `niter` will trigger this functionality. In this case, every `updatecov` iterations, the jump covariance matrix will be estimated from the covariance matrix of the saved parameter values. The covariance matrix is scaled with (2.4^2/npar) where npar is the number of parameters, unless `covscale` has been given a different value. Thus, Jump=cov(p1,p2,...pn) * diag(np+1e-16)*2.4^2/npar where the small number 1e^{-16} is added on the diagonal of the covariance matrix to prevent it from becoming singular.

Note that a problem of adapting the proposal distribution using the MCMC results so far is that standard convergence results do not apply. One solution is to use adaptation only for the burn-in period and discard the part of the chain where adaptation has been used.

Thus, when using `updatecov` with a positive value of `burninlength`, the proposal distribution is only updated during burnin. If `burninlength` = 0 though, the updates occur throughout the entire simulation.

When using the adaptive Metropolis method, it is best to start with a small value of the jump length.

2. In the "delayed rejection" method, new parameter values are tried upon rejection. The process of delaying rejection can be iterated for at most `ntrydr` trials. Setting `ntrydr` equal to 1 (the default) toggles off delayed rejection.

During the delayed rejection procedure, new parameters are generated from the last accepted value by scaling the jump covariance matrix with a factor as specified in `drscale`. The acceptance probability of this new set depends on the candidates so far proposed and rejected, in such a way that reversibility of the Markov chain is preserved. See Haario et al. (2005, 2006) for more details.

Convergence of the MCMC chain can be checked via `plot`, which plots for each iteration the values of all parameters, and if `Full` is `TRUE`, of the function value (SS) and (if appropriate) the modeled variance. If converged, there should be no visible drift.

In addition, the methods from package `coda` become available by making the object returned by `modMCMC` of class `mcmc`, as used in the methods of `coda`. For instance, if object `MCMCres` is returned by `modMCMC` then `as.mcmc(MCMCres\$pars)` will make an instance of class `mcmc`, usable by `coda`.

The `burninlength` is the number of initial steps that is not included in the output. It can be useful if the initial value of the parameters is far from the optimal value. Starting the MCMC with the best fit parameter set will alleviate the need for using `burninlength`.

Value

a list of class modMCMC containing the results as returned from the Markov chain.

This includes the following:

 `pars ` an array with dimension (`outputlength`, length(`p`)), containing the parameters of the MCMC at each iteration that is kept. `SS ` vector with the sum of squares function, one for each row in `pars`. `naccepted ` the number of accepted runs. `sig ` the sampled error variance sigma^2, a matrix with one row for each row in `pars`. `bestpar ` the parameter set that gave the highest probability. `bestfunp ` the function value corresponding to `bestpar`. `prior ` the parameter prior, one value for each row in `pars`. `count ` information about the MCMC chain: number of delayed rejection steps (`dr_steps`), the number of alfa steps `Alfasteps`, the number of accepted runs (`num_accepted`) and the number of times the proposal covariance matrix has been updated (`num_covupdate`.) `settings ` the settings for error covariance calculation, i.e. arguments `var0`, `n0` and `N` the number of data points.

The list returned by `modMCMC` has methods for the generic functions `summary`, `plot`, `pairs` – see note.

Note

The following S3 methods are provided:

• summary, produces summary statistics of the MCMC results

• plot, plots the MCMC results, for all parameters. Use it to check convergence.

• pairs, produces a pairs plot of the MCMC results; overrides the default `gap = 0`, `upper.panel = NA`, and `diag.panel`.

It is also possible to use the methods from the `coda` package, e.g. `densplot`.

To do that, first the `modMCMC` object has to be converted to an mcmc object. See the examples for an application.

Author(s)

Karline Soetaert <[email protected]>

Marko Laine <[email protected]>

References

Laine, M., 2008. Adaptive MCMC Methods With Applications in Environmental and Geophysical Models. Finnish Meteorological Institute contributions 69, ISBN 978-951-697-662-7, Finnish Meteorological Institute, Helsinki.

Haario, H., Saksman, E. and Tamminen, J., 2001. An Adaptive Metropolis Algorithm. Bernoulli 7, pp. 223–242.

Haario, H., Laine, M., Mira, A. and Saksman, E., 2006. DRAM: Efficient Adaptive MCMC. Statistics and Computing, 16(4), 339–354.

Haario, H., Saksman, E. and Tamminen, J., 2005. Componentwise Adaptation for High Dimensional MCMC. Computational Statistics 20(2), 265–274.

Gelman, A. Varlin, J. B., Stern, H. S. and Rubin, D. B., 2004. Bayesian Data Analysis. Second edition. Chapman and Hall, Boca Raton.

Soetaert, K. and Petzoldt, T., 2010. Inverse Modelling, Sensitivity and Monte Carlo Analysis in R Using Package FME. Journal of Statistical Software 33(3) 1–28. http://www.jstatsoft.org/v33/i03

`modFit` for constrained model fitting
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133``` ```## ======================================================================= ## Sampling a 3-dimensional normal distribution, ## ======================================================================= # mean = 1:3, sd = 0.1 # f returns -2*log(probability) of the parameter values NN <- function(p) { mu <- c(1,2,3) -2*sum(log(dnorm(p, mean = mu, sd = 0.1))) #-2*log(probability) } # simple Metropolis-Hastings MCMC <- modMCMC(f = NN, p = 0:2, niter = 5000, outputlength = 1000, jump = 0.5) # More accepted values by updating the jump covariance matrix... MCMC <- modMCMC(f = NN, p = 0:2, niter = 5000, updatecov = 100, outputlength = 1000, jump = 0.5) summary(MCMC) plot(MCMC) # check convergence pairs(MCMC) ## ======================================================================= ## test 2: sampling a 3-D normal distribution, larger standard deviation... ## noninformative priors, lower and upper bounds imposed on parameters ## ======================================================================= NN <- function(p) { mu <- c(1,2,2.5) -2*sum(log(dnorm(p, mean = mu, sd = 0.5))) #-2*log(probability) } MCMC2 <- modMCMC(f = NN, p = 1:3, niter = 2000, burninlength = 500, updatecov = 10, jump = 0.5, lower = c(0, 2, 1), upper = c(1, 3, 3)) plot(MCMC2) hist(MCMC2, breaks = 20) ## Compare output of p3 with theoretical distribution hist(MCMC2, which = "p3", breaks = 20) lines(seq(1, 3, 0.1), dnorm(seq(1, 3, 0.1), mean = 2.5, sd = 0.5)/pnorm(3, 2.5, 0.5)) summary(MCMC2) # functions from package coda... cumuplot(as.mcmc(MCMC2\$pars)) summary(as.mcmc(MCMC2\$pars)) raftery.diag(MCMC2\$pars) ## ======================================================================= ## test 3: sampling a log-normal distribution, log mean=1:4, log sd = 1 ## ======================================================================= NL <- function(p) { mu <- 1:4 -2*sum(log(dlnorm(p, mean = mu, sd = 1))) #-2*log(probability) } MCMCl <- modMCMC(f = NL, p = log(1:4), niter = 3000, outputlength = 1000, jump = 5) plot(MCMCl) # bad convergence cumuplot(as.mcmc(MCMCl\$pars)) MCMCl <- modMCMC (f = NL, p = log(1:4), niter = 3000, outputlength = 1000, jump = 2^(2:5)) plot(MCMCl) # better convergence but CHECK it! pairs(MCMCl) colMeans(log(MCMCl\$pars)) apply(log(MCMCl\$pars), 2, sd) MCMCl <- modMCMC (f = NL, p = rep(1, 4), niter = 3000, outputlength = 1000, jump = 5, updatecov = 100) plot(MCMCl) colMeans(log(MCMCl\$pars)) apply(log(MCMCl\$pars), 2, sd) ## ======================================================================= ## Fitting a Monod (Michaelis-Menten) function to data ## ======================================================================= # the observations #--------------------- Obs <- data.frame(x=c( 28, 55, 83, 110, 138, 225, 375), # mg COD/l y=c(0.053,0.06,0.112,0.105,0.099,0.122,0.125)) # 1/hour plot(Obs, pch = 16, cex = 2, xlim = c(0, 400), ylim = c(0, 0.15), xlab = "mg COD/l", ylab = "1/hr", main = "Monod") # the Monod model #--------------------- Model <- function(p, x) data.frame(x = x, N = p[1]*x/(x+p[2])) # Fitting the model to the data #--------------------- # define the residual function Residuals <- function(p) (Obs\$y - Model(p, Obs\$x)\$N) # use modFit to find parameters P <- modFit(f = Residuals, p = c(0.1, 1)) # plot best-fit model x <-0:375 lines(Model(P\$par, x)) # summary of fit sP <- summary(P) sP[] print(sP) # Running an MCMC #--------------------- # estimate parameter covariances # (to efficiently generate new parameter values) Covar <- sP\$cov.scaled * 2.4^2/2 # the model variance s2prior <- sP\$modVariance # set nprior = 0 to avoid updating model variance MCMC <- modMCMC(f = Residuals, p = P\$par,jump = Covar, niter = 1000, var0 = s2prior, wvar0 = NULL, updatecov = 100) plot(MCMC, Full = TRUE) pairs(MCMC) # function from the coda package. raftery.diag(as.mcmc(MCMC\$pars)) cor(MCMC\$pars) cov(MCMC\$pars) # covariances by MCMC sP\$cov.scaled # covariances by Hessian of fit x <- 1:400 SR <- summary(sensRange(parInput = MCMC\$pars, func = Model, x = x)) plot(SR, xlab="mg COD/l", ylab = "1/hr", main = "Monod") points(Obs, pch = 16, cex = 1.5) ```