Constrained Markov Chain Monte Carlo
Description
Performs a Markov Chain Monte Carlo simulation, using an adaptive Metropolis (AM) algorithm and including a delayed rejection (DR) procedure.
Usage
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22  modMCMC(f, p, ..., jump = NULL, lower = Inf, upper = +Inf,
prior = NULL, var0 = NULL, wvar0 = NULL, n0 = NULL,
niter = 1000, outputlength = niter, burninlength = 0,
updatecov = niter, covscale = 2.4^2/length(p),
ntrydr = 1, drscale = NULL, verbose = TRUE)
## S3 method for class 'modMCMC'
summary(object, remove = NULL, ...)
## S3 method for class 'modMCMC'
pairs(x, Full = FALSE, which = 1:ncol(x$pars),
remove = NULL, nsample = NULL, ...)
## S3 method for class 'modMCMC'
hist(x, Full = FALSE, which = 1:ncol(x$pars),
remove = NULL, ask = NULL, ...)
## S3 method for class 'modMCMC'
plot(x, Full = FALSE, which = 1:ncol(x$pars),
trace = TRUE, remove = NULL, ask = NULL, ...)

Arguments
f 
the function to be evaluated, with first argument the vector
of parameters which should be varied. It should return either the
model residuals, an element of class modCost (as returned by
a call to 
p 
initial values for the parameters to be optimized over. 
... 
additional arguments passed to function 
jump 
jump length, either a number, a vector with length equal to the total number of parameters, a covariance matrix, or a function that takes as input the current values of the parameters and produces as output the perturbed parameters. See details. 
prior 
2*log(parameter prior probability), either a function
that is called as 
var0 
initial model variance; if 
wvar0 
"weight" for the initial model variance – see details. 
n0 
parameter used for weighing the initial model variance 
if 
lower 
lower bounds on the parameters; for unbounded parameters
set equal to 
upper 
upper bounds on the parameters; for unbounded parameters
set equal to 
niter 
number of iterations for the MCMC. 
outputlength 
number of iterations kept in the output; should
be smaller or equal to 
updatecov 
number of iterations after which the parameter covariance matrix is (re)evaluated based on the parameters kept thus far, and used to update the MCMC jumps. 
covscale 
scale factor for the parameter covariance matrix, used to perform the MCMC jumps. 
burninlength 
number of initial iterations to be removed from output. 
ntrydr 
maximal number of tries for the delayed rejection procedure. It is generally not a good idea to set this to a too large value. 
drscale 
for each try during delayed rejection, the cholesky
decomposition of the proposal matrix is scaled with this amount; if

verbose 
if 
object 
an object of class 
x 
an object of class 
Full 
If TRUE then not only the parameters will be plotted, but also the function value and (if appropriate) the model variance(s). 
which 
the name or the index to the parameters that should be
plotted. Default = all parameters. If 
trace 
if 
remove 
a list with indices of the runs that should be removed (e.g. to remove runs during burnin. 
nsample 
the number of xy pairs to be plotted on the upper
panel in the pairs plot. When 
ask 
logical; if 
Details
Note that arguments after ... must be matched exactly.
Rfunction f
is called as f(p, ...)
. It should return
either 2 times the log likelihood of the model (one value), the
residuals between model and data or an item of class modFit
(as
created by function modFit
.
In the latter two cases, it is assumed that the prior distribution for theta is either noninformative or gaussian. If gaussian, it can be treated as a sum of squares (SS). If the measurement function is defined as:
y = f(p) + xi; xi ~ N(0, sigma^2)
where xi is the measurement error, assumed normally distribution, then the posterior for the parameters will be estimated as:
prob(py,sigma^2)~exp(0.5 SS(p)/sigma^2)+SSpri(p)
and where sigma^2 is the error variance, SS is the sum of squares function SS(p)=sum(yif(p))^2. If noninformative priors are used, then SSpri(p)=0.
The error variance sigma^2 is considered a nuisance parameter. A prior distribution of it should be specified and a posterior distribution is estimated.
If wvar0
or n0
is >0, then the variances are sampled as
conjugate priors from the inverse gamma distribution with parameters
var0
and n0=wvar0*n
. Larger values of wvar0
keep
these samples closer to var0
.
Thus, at each step, 1/ the error variance (sigma^{2}) is sampled from a gamma distribution:
prob(sigma^2y,p)~Gam(0.5*(n0+n), 0.5*(n0+var0+SS(p))
where n
is the number of data points and where
n0=n * wvar0, and where the second argument to the gamma function
is the shape parameter.
The prior parameters (var0
and wvar0
) are the prior mean
for sigma^2 and the prior accuracy.
By setting wvar0
equal to 1, equal weight is given to the
prior and the current value.
If wvar0
is 0 then the prior is ignored.
If wvar0
is NULL
(the default) then the error variances are
assumed to be fixed.
var0
estimates the variance of the measured components. In case
independent estimates are not available, these variances can be
obtained from the mean squares of fitted residuals. (e.g. as reported
in modFit
). See the examples. (but note that this is not truly
independent information)
var0
is either one value, or a value for each observed
variable, or a value for each observed data point.
When var0
is not NULL
, then f
is assumed to
return the model residuals OR an instance of class modCost
.
When var0=NULL
, then f
should return either
2*log(probability of the model), or an instance of class
modCost
.
modMCMC
implements the MetropolisHastings method. The proposal
distribution, which is used to generate new parameter values is the
(multidimensional) Gaussian density distribution, with standard
deviation given by jump
.
jump
can be either one value, a vector of length = number of
parameters or a parameter covariance matrix (nrow = ncol = number
parameters).
The jump parameter, jump
thus determines how much the new
parameter set will deviate from the old one.
If jump
is one value, or a vector, then the new parameter
values are generated by sampling a normal distribution with standard
deviation equal to jump
. A larger value will lead to larger
jumps in the parameter space, but acceptance of new points can get
very low. Smaller jump lengths increase the acceptance rate, but the
algorithm may move too slowly, and too many runs may be needed to scan
the parameter space.
If jump
is NULL
, then the jump length is taken as 10%
of the parameter value as given in p
.
jump
can also be a proposal covariance matrix. In this case,
the new parameter values are generated by sampling a multidimensional
normal distribution. It can be efficient to initialise jump
using the parameter covariance as resulting from fitting the model
(e.g. using modFit
) – see examples.
Finally, jump
can also be an Rfunction that takes as input the
current values of the parameters and returns the new parameter values.
Two methods are implemented to increase the number of accepted runs.
In the "adaptive Metropolis" method, new parameters are generated with a covariance matrix that is estimated from the parameters generated (and saved) thus far. The idea behind this is that the MCMC method is more efficient if the proposal covariance (to generate new parameter values) is somehow tuned to the shape and size of the target distribution.
Setting
updatecov
smaller thanniter
will trigger this functionality. In this case, everyupdatecov
iterations, the jump covariance matrix will be estimated from the covariance matrix of the saved parameter values. The covariance matrix is scaled with (2.4^2/npar) where npar is the number of parameters, unlesscovscale
has been given a different value. Thus, Jump=cov(p1,p2,...pn) * diag(np+1e16)*2.4^2/npar where the small number 1e^{16} is added on the diagonal of the covariance matrix to prevent it from becoming singular.Note that a problem of adapting the proposal distribution using the MCMC results so far is that standard convergence results do not apply. One solution is to use adaptation only for the burnin period and discard the part of the chain where adaptation has been used.
Thus, when using
updatecov
with a positive value ofburninlength
, the proposal distribution is only updated during burnin. Ifburninlength
= 0 though, the updates occur throughout the entire simulation.When using the adaptive Metropolis method, it is best to start with a small value of the jump length.
In the "delayed rejection" method, new parameter values are tried upon rejection. The process of delaying rejection can be iterated for at most
ntrydr
trials. Settingntrydr
equal to 1 (the default) toggles off delayed rejection.During the delayed rejection procedure, new parameters are generated from the last accepted value by scaling the jump covariance matrix with a factor as specified in
drscale
. The acceptance probability of this new set depends on the candidates so far proposed and rejected, in such a way that reversibility of the Markov chain is preserved. See Haario et al. (2005, 2006) for more details.
Convergence of the MCMC chain can be checked via plot
, which
plots for each iteration the values of all parameters, and if
Full
is TRUE
, of the function value (SS) and (if
appropriate) the modeled variance. If converged, there should be no
visible drift.
In addition, the methods from package coda
become available by
making the object returned by modMCMC
of class mcmc
, as
used in the methods of coda
. For instance, if object
MCMCres
is returned by modMCMC
then
as.mcmc(MCMCres$pars)
will make an instance of class
mcmc
, usable by coda
.
The burninlength
is the number of initial steps that is not
included in the output. It can be useful if the initial value of the
parameters is far from the optimal value. Starting the MCMC with the
best fit parameter set will alleviate the need for using
burninlength
.
Value
a list of class modMCMC containing the results as returned from the Markov chain.
This includes the following:
pars 
an array with dimension ( 
SS 
vector with the sum of squares function, one for each row
in 
naccepted 
the number of accepted runs. 
sig 
the sampled error variance sigma^2, a matrix with
one row for each row in 
bestpar 
the parameter set that gave the highest probability. 
bestfunp 
the function value corresponding to 
prior 
the parameter prior, one value for each row in

count 
information about the MCMC chain: number of delayed
rejection steps ( 
settings 
the settings for error covariance calculation,
i.e. arguments 
The list returned by modMCMC
has methods for the generic
functions summary
, plot
,
pairs
– see note.
Note
The following S3 methods are provided:
summary, produces summary statistics of the MCMC results
plot, plots the MCMC results, for all parameters. Use it to check convergence.
pairs, produces a pairs plot of the MCMC results; overrides the default
gap = 0
,upper.panel = NA
, anddiag.panel
.
It is also possible to use the methods from the coda
package,
e.g. densplot
.
To do that, first the modMCMC
object has to be converted to an
mcmc object. See the examples for an application.
Author(s)
Karline Soetaert <karline.soetaert@nioz.nl>
Marko Laine <Marko.Laine@fmi.fi>
References
Laine, M., 2008. Adaptive MCMC Methods With Applications in Environmental and Geophysical Models. Finnish Meteorological Institute contributions 69, ISBN 9789516976627, Finnish Meteorological Institute, Helsinki.
Haario, H., Saksman, E. and Tamminen, J., 2001. An Adaptive Metropolis Algorithm. Bernoulli 7, pp. 223–242.
Haario, H., Laine, M., Mira, A. and Saksman, E., 2006. DRAM: Efficient Adaptive MCMC. Statistics and Computing, 16(4), 339–354.
Haario, H., Saksman, E. and Tamminen, J., 2005. Componentwise Adaptation for High Dimensional MCMC. Computational Statistics 20(2), 265–274.
Gelman, A. Varlin, J. B., Stern, H. S. and Rubin, D. B., 2004. Bayesian Data Analysis. Second edition. Chapman and Hall, Boca Raton.
Soetaert, K. and Petzoldt, T., 2010. Inverse Modelling, Sensitivity and Monte Carlo Analysis in R Using Package FME. Journal of Statistical Software 33(3) 1–28. http://www.jstatsoft.org/v33/i03
See Also
modFit
for constrained model fitting
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133  ## =======================================================================
## Sampling a 3dimensional normal distribution,
## =======================================================================
# mean = 1:3, sd = 0.1
# f returns 2*log(probability) of the parameter values
NN < function(p) {
mu < c(1,2,3)
2*sum(log(dnorm(p, mean = mu, sd = 0.1))) #2*log(probability)
}
# simple MetropolisHastings
MCMC < modMCMC(f = NN, p = 0:2, niter = 5000,
outputlength = 1000, jump = 0.5)
# More accepted values by updating the jump covariance matrix...
MCMC < modMCMC(f = NN, p = 0:2, niter = 5000, updatecov = 100,
outputlength = 1000, jump = 0.5)
summary(MCMC)
plot(MCMC) # check convergence
pairs(MCMC)
## =======================================================================
## test 2: sampling a 3D normal distribution, larger standard deviation...
## noninformative priors, lower and upper bounds imposed on parameters
## =======================================================================
NN < function(p) {
mu < c(1,2,2.5)
2*sum(log(dnorm(p, mean = mu, sd = 0.5))) #2*log(probability)
}
MCMC2 < modMCMC(f = NN, p = 1:3, niter = 2000, burninlength = 500,
updatecov = 10, jump = 0.5, lower = c(0, 2, 1), upper = c(1, 3, 3))
plot(MCMC2)
hist(MCMC2, breaks = 20)
## Compare output of p3 with theoretical distribution
hist(MCMC2, which = "p3", breaks = 20)
lines(seq(1, 3, 0.1), dnorm(seq(1, 3, 0.1), mean = 2.5,
sd = 0.5)/pnorm(3, 2.5, 0.5))
summary(MCMC2)
# functions from package coda...
cumuplot(as.mcmc(MCMC2$pars))
summary(as.mcmc(MCMC2$pars))
raftery.diag(MCMC2$pars)
## =======================================================================
## test 3: sampling a lognormal distribution, log mean=1:4, log sd = 1
## =======================================================================
NL < function(p) {
mu < 1:4
2*sum(log(dlnorm(p, mean = mu, sd = 1))) #2*log(probability)
}
MCMCl < modMCMC(f = NL, p = log(1:4), niter = 3000,
outputlength = 1000, jump = 5)
plot(MCMCl) # bad convergence
cumuplot(as.mcmc(MCMCl$pars))
MCMCl < modMCMC (f = NL, p = log(1:4), niter = 3000,
outputlength = 1000, jump = 2^(2:5))
plot(MCMCl) # better convergence but CHECK it!
pairs(MCMCl)
colMeans(log(MCMCl$pars))
apply(log(MCMCl$pars), 2, sd)
MCMCl < modMCMC (f = NL, p = rep(1, 4), niter = 3000,
outputlength = 1000, jump = 5, updatecov = 100)
plot(MCMCl)
colMeans(log(MCMCl$pars))
apply(log(MCMCl$pars), 2, sd)
## =======================================================================
## Fitting a Monod (MichaelisMenten) function to data
## =======================================================================
# the observations
#
Obs < data.frame(x=c( 28, 55, 83, 110, 138, 225, 375), # mg COD/l
y=c(0.053,0.06,0.112,0.105,0.099,0.122,0.125)) # 1/hour
plot(Obs, pch = 16, cex = 2, xlim = c(0, 400), ylim = c(0, 0.15),
xlab = "mg COD/l", ylab = "1/hr", main = "Monod")
# the Monod model
#
Model < function(p, x) data.frame(x = x, N = p[1]*x/(x+p[2]))
# Fitting the model to the data
#
# define the residual function
Residuals < function(p) (Obs$y  Model(p, Obs$x)$N)
# use modFit to find parameters
P < modFit(f = Residuals, p = c(0.1, 1))
# plot bestfit model
x <0:375
lines(Model(P$par, x))
# summary of fit
sP < summary(P)
sP[]
print(sP)
# Running an MCMC
#
# estimate parameter covariances
# (to efficiently generate new parameter values)
Covar < sP$cov.scaled * 2.4^2/2
# the model variance
s2prior < sP$modVariance
# set nprior = 0 to avoid updating model variance
MCMC < modMCMC(f = Residuals, p = P$par,jump = Covar, niter = 1000,
var0 = s2prior, wvar0 = NULL, updatecov = 100)
plot(MCMC, Full = TRUE)
pairs(MCMC)
# function from the coda package.
raftery.diag(as.mcmc(MCMC$pars))
cor(MCMC$pars)
cov(MCMC$pars) # covariances by MCMC
sP$cov.scaled # covariances by Hessian of fit
x < 1:400
SR < summary(sensRange(parInput = MCMC$pars, func = Model, x = x))
plot(SR, xlab="mg COD/l", ylab = "1/hr", main = "Monod")
points(Obs, pch = 16, cex = 1.5)
