Constrained Markov Chain Monte Carlo

Share:

Description

Performs a Markov Chain Monte Carlo simulation, using an adaptive Metropolis (AM) algorithm and including a delayed rejection (DR) procedure.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
modMCMC(f, p, ..., jump = NULL,  lower = -Inf, upper = +Inf, 
        prior = NULL, var0 = NULL, wvar0 = NULL, n0 = NULL, 
        niter = 1000, outputlength = niter, burninlength = 0, 
        updatecov = niter, covscale = 2.4^2/length(p),
        ntrydr = 1, drscale = NULL, verbose = TRUE)



## S3 method for class 'modMCMC'
summary(object, remove = NULL, ...)

## S3 method for class 'modMCMC'
pairs(x, Full = FALSE, which = 1:ncol(x$pars),
  remove = NULL, nsample = NULL, ...)

## S3 method for class 'modMCMC'
hist(x, Full = FALSE, which = 1:ncol(x$pars),
  remove = NULL, ask = NULL, ...)

## S3 method for class 'modMCMC'
plot(x, Full = FALSE, which = 1:ncol(x$pars),
  trace = TRUE, remove = NULL, ask = NULL, ...)

Arguments

f

the function to be evaluated, with first argument the vector of parameters which should be varied. It should return either the model residuals, an element of class modCost (as returned by a call to modCost) or -2*log(likelihood). The latter is equivalent to the sum-of-squares functions when using a Gaussian likelihood and prior.

p

initial values for the parameters to be optimized over.

...

additional arguments passed to function f or to the methods.

jump

jump length, either a number, a vector with length equal to the total number of parameters, a covariance matrix, or a function that takes as input the current values of the parameters and produces as output the perturbed parameters. See details.

prior

-2*log(parameter prior probability), either a function that is called as prior(p) or NULL; in the latter case a non-informative prior is used (i.e. all parameters are equally likely, depending on lower and upper within min and max bounds).

var0

initial model variance; if NULL, it is assumed that the model variance is 1, and the return element from f is -2*log (likelihood). If it has a value, it is assumed that the return element from f contain the model residuals or a list of class modFit. See details. Good options for var0 are to use the modelvariance (modVariance) as returned by the summary method of modFit. When this option is chosen, and the model has several variables, they will all be scaled similarly. See vignette FMEdyna. In case the model has several variables with different magnitudes, then it may be better to scale each variable independently. In that case, one can use as var0, the mean of the unweighted squared residuals from the model fit as returned from modFit (var_ms_unweighted). See vignette FME.

wvar0

"weight" for the initial model variance – see details.

n0

parameter used for weighing the initial model variance - if NULL, it is estimated as n0=wvar0*n, where n = number of observations. See details.

lower

lower bounds on the parameters; for unbounded parameters set equal to -Inf.

upper

upper bounds on the parameters; for unbounded parameters set equal to Inf.

niter

number of iterations for the MCMC.

outputlength

number of iterations kept in the output; should be smaller or equal to niter.

updatecov

number of iterations after which the parameter covariance matrix is (re)evaluated based on the parameters kept thus far, and used to update the MCMC jumps.

covscale

scale factor for the parameter covariance matrix, used to perform the MCMC jumps.

burninlength

number of initial iterations to be removed from output.

ntrydr

maximal number of tries for the delayed rejection procedure. It is generally not a good idea to set this to a too large value.

drscale

for each try during delayed rejection, the cholesky decomposition of the proposal matrix is scaled with this amount; if NULL, it is assumed to be c(0.2,0.25, 0.333, 0.333, ...)

verbose

if TRUE: prints extra output.

object

an object of class modMCMC.

x

an object of class modMCMC.

Full

If TRUE then not only the parameters will be plotted, but also the function value and (if appropriate) the model variance(s).

which

the name or the index to the parameters that should be plotted. Default = all parameters. If Full=TRUE, setting which =NULL will plot only the function value and the model variance.

trace

if TRUE, adds smoothed line to the plot.

remove

a list with indices of the runs that should be removed (e.g. to remove runs during burnin.

nsample

the number of xy pairs to be plotted on the upper panel in the pairs plot. When NULL all xy pairs plotted. Set to a lower number in case the graph becomes too dense (and the exported picture too large). This does not affect the histograms on the diagonal plot (which are estimated using all MCMC draws).

ask

logical; if TRUE, the user is asked before each plot, if NULL the user is only asked if more than one page of plots is necessary and the current graphics device is set interactive, see par(ask=.) and dev.interactive.

Details

Note that arguments after ... must be matched exactly.

R-function f is called as f(p, ...). It should return either -2 times the log likelihood of the model (one value), the residuals between model and data or an item of class modFit (as created by function modFit.

In the latter two cases, it is assumed that the prior distribution for theta is either non-informative or gaussian. If gaussian, it can be treated as a sum of squares (SS). If the measurement function is defined as:

y = f(p) + xi; xi ~ N(0, sigma^2)

where xi is the measurement error, assumed normally distribution, then the posterior for the parameters will be estimated as:

prob(p|y,sigma^2)~exp(-0.5 SS(p)/sigma^2)+SSpri(p)

and where sigma^2 is the error variance, SS is the sum of squares function SS(p)=sum(yi-f(p))^2. If non-informative priors are used, then SSpri(p)=0.

The error variance sigma^2 is considered a nuisance parameter. A prior distribution of it should be specified and a posterior distribution is estimated.

If wvar0 or n0 is >0, then the variances are sampled as conjugate priors from the inverse gamma distribution with parameters var0 and n0=wvar0*n. Larger values of wvar0 keep these samples closer to var0.

Thus, at each step, 1/ the error variance (sigma^{-2}) is sampled from a gamma distribution:

prob(sigma^2|y,p)~Gam(0.5*(n0+n), 0.5*(n0+var0+SS(p))

where n is the number of data points and where n0=n * wvar0, and where the second argument to the gamma function is the shape parameter.

The prior parameters (var0 and wvar0) are the prior mean for sigma^2 and the prior accuracy.

By setting wvar0 equal to 1, equal weight is given to the prior and the current value.

If wvar0 is 0 then the prior is ignored.

If wvar0 is NULL (the default) then the error variances are assumed to be fixed.

var0 estimates the variance of the measured components. In case independent estimates are not available, these variances can be obtained from the mean squares of fitted residuals. (e.g. as reported in modFit). See the examples. (but note that this is not truly independent information)

var0 is either one value, or a value for each observed variable, or a value for each observed data point.

When var0 is not NULL, then f is assumed to return the model residuals OR an instance of class modCost.

When var0=NULL, then f should return either -2*log(probability of the model), or an instance of class modCost.

modMCMC implements the Metropolis-Hastings method. The proposal distribution, which is used to generate new parameter values is the (multidimensional) Gaussian density distribution, with standard deviation given by jump.

jump can be either one value, a vector of length = number of parameters or a parameter covariance matrix (nrow = ncol = number parameters).

The jump parameter, jump thus determines how much the new parameter set will deviate from the old one.

If jump is one value, or a vector, then the new parameter values are generated by sampling a normal distribution with standard deviation equal to jump. A larger value will lead to larger jumps in the parameter space, but acceptance of new points can get very low. Smaller jump lengths increase the acceptance rate, but the algorithm may move too slowly, and too many runs may be needed to scan the parameter space.

If jump is NULL, then the jump length is taken as 10% of the parameter value as given in p.

jump can also be a proposal covariance matrix. In this case, the new parameter values are generated by sampling a multidimensional normal distribution. It can be efficient to initialise jump using the parameter covariance as resulting from fitting the model (e.g. using modFit) – see examples.

Finally, jump can also be an R-function that takes as input the current values of the parameters and returns the new parameter values.

Two methods are implemented to increase the number of accepted runs.

  1. In the "adaptive Metropolis" method, new parameters are generated with a covariance matrix that is estimated from the parameters generated (and saved) thus far. The idea behind this is that the MCMC method is more efficient if the proposal covariance (to generate new parameter values) is somehow tuned to the shape and size of the target distribution.

    Setting updatecov smaller than niter will trigger this functionality. In this case, every updatecov iterations, the jump covariance matrix will be estimated from the covariance matrix of the saved parameter values. The covariance matrix is scaled with (2.4^2/npar) where npar is the number of parameters, unless covscale has been given a different value. Thus, Jump=cov(p1,p2,...pn) * diag(np+1e-16)*2.4^2/npar where the small number 1e^{-16} is added on the diagonal of the covariance matrix to prevent it from becoming singular.

    Note that a problem of adapting the proposal distribution using the MCMC results so far is that standard convergence results do not apply. One solution is to use adaptation only for the burn-in period and discard the part of the chain where adaptation has been used.

    Thus, when using updatecov with a positive value of burninlength, the proposal distribution is only updated during burnin. If burninlength = 0 though, the updates occur throughout the entire simulation.

    When using the adaptive Metropolis method, it is best to start with a small value of the jump length.

  2. In the "delayed rejection" method, new parameter values are tried upon rejection. The process of delaying rejection can be iterated for at most ntrydr trials. Setting ntrydr equal to 1 (the default) toggles off delayed rejection.

    During the delayed rejection procedure, new parameters are generated from the last accepted value by scaling the jump covariance matrix with a factor as specified in drscale. The acceptance probability of this new set depends on the candidates so far proposed and rejected, in such a way that reversibility of the Markov chain is preserved. See Haario et al. (2005, 2006) for more details.

Convergence of the MCMC chain can be checked via plot, which plots for each iteration the values of all parameters, and if Full is TRUE, of the function value (SS) and (if appropriate) the modeled variance. If converged, there should be no visible drift.

In addition, the methods from package coda become available by making the object returned by modMCMC of class mcmc, as used in the methods of coda. For instance, if object MCMCres is returned by modMCMC then as.mcmc(MCMCres$pars) will make an instance of class mcmc, usable by coda.

The burninlength is the number of initial steps that is not included in the output. It can be useful if the initial value of the parameters is far from the optimal value. Starting the MCMC with the best fit parameter set will alleviate the need for using burninlength.

Value

a list of class modMCMC containing the results as returned from the Markov chain.

This includes the following:

pars

an array with dimension (outputlength, length(p)), containing the parameters of the MCMC at each iteration that is kept.

SS

vector with the sum of squares function, one for each row in pars.

naccepted

the number of accepted runs.

sig

the sampled error variance sigma^2, a matrix with one row for each row in pars.

bestpar

the parameter set that gave the highest probability.

bestfunp

the function value corresponding to bestpar.

prior

the parameter prior, one value for each row in pars.

count

information about the MCMC chain: number of delayed rejection steps (dr_steps), the number of alfa steps Alfasteps, the number of accepted runs (num_accepted) and the number of times the proposal covariance matrix has been updated (num_covupdate.)

settings

the settings for error covariance calculation, i.e. arguments var0, n0 and N the number of data points.

The list returned by modMCMC has methods for the generic functions summary, plot, pairs – see note.

Note

The following S3 methods are provided:

  • summary, produces summary statistics of the MCMC results

  • plot, plots the MCMC results, for all parameters. Use it to check convergence.

  • pairs, produces a pairs plot of the MCMC results; overrides the default gap = 0, upper.panel = NA, and diag.panel.

It is also possible to use the methods from the coda package, e.g. densplot.

To do that, first the modMCMC object has to be converted to an mcmc object. See the examples for an application.

Author(s)

Karline Soetaert <karline.soetaert@nioz.nl>

Marko Laine <Marko.Laine@fmi.fi>

References

Laine, M., 2008. Adaptive MCMC Methods With Applications in Environmental and Geophysical Models. Finnish Meteorological Institute contributions 69, ISBN 978-951-697-662-7, Finnish Meteorological Institute, Helsinki.

Haario, H., Saksman, E. and Tamminen, J., 2001. An Adaptive Metropolis Algorithm. Bernoulli 7, pp. 223–242.

Haario, H., Laine, M., Mira, A. and Saksman, E., 2006. DRAM: Efficient Adaptive MCMC. Statistics and Computing, 16(4), 339–354.

Haario, H., Saksman, E. and Tamminen, J., 2005. Componentwise Adaptation for High Dimensional MCMC. Computational Statistics 20(2), 265–274.

Gelman, A. Varlin, J. B., Stern, H. S. and Rubin, D. B., 2004. Bayesian Data Analysis. Second edition. Chapman and Hall, Boca Raton.

Soetaert, K. and Petzoldt, T., 2010. Inverse Modelling, Sensitivity and Monte Carlo Analysis in R Using Package FME. Journal of Statistical Software 33(3) 1–28. http://www.jstatsoft.org/v33/i03

See Also

modFit for constrained model fitting

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
## =======================================================================
## Sampling a 3-dimensional normal distribution,
## =======================================================================
# mean = 1:3, sd = 0.1
# f returns -2*log(probability) of the parameter values

NN <- function(p) {
  mu <- c(1,2,3)
  -2*sum(log(dnorm(p, mean = mu, sd = 0.1)))   #-2*log(probability)
}

# simple Metropolis-Hastings
MCMC <- modMCMC(f = NN, p = 0:2, niter = 5000,
                outputlength = 1000, jump = 0.5)

# More accepted values by updating the jump covariance matrix...
MCMC <- modMCMC(f = NN, p = 0:2, niter = 5000, updatecov = 100,
                 outputlength = 1000, jump = 0.5)
summary(MCMC)

plot(MCMC)   # check convergence
pairs(MCMC)

## =======================================================================
## test 2: sampling a 3-D normal distribution, larger standard deviation...
## noninformative priors, lower and upper bounds imposed on parameters
## =======================================================================

NN <- function(p) {
  mu <- c(1,2,2.5)
  -2*sum(log(dnorm(p, mean = mu, sd = 0.5)))   #-2*log(probability)
}

MCMC2 <- modMCMC(f = NN, p = 1:3, niter = 2000, burninlength = 500,
  updatecov = 10, jump = 0.5, lower = c(0, 2, 1), upper = c(1, 3, 3))
plot(MCMC2)
hist(MCMC2, breaks = 20)

## Compare output of p3 with theoretical distribution
hist(MCMC2, which = "p3", breaks = 20)
lines(seq(1, 3, 0.1), dnorm(seq(1, 3, 0.1), mean = 2.5,
  sd = 0.5)/pnorm(3, 2.5, 0.5))
summary(MCMC2)

# functions from package coda...
cumuplot(as.mcmc(MCMC2$pars))
summary(as.mcmc(MCMC2$pars))
raftery.diag(MCMC2$pars)

## =======================================================================
## test 3: sampling a log-normal distribution, log mean=1:4, log sd = 1
## =======================================================================

NL <- function(p) {
  mu <- 1:4
  -2*sum(log(dlnorm(p, mean = mu, sd = 1)))      #-2*log(probability)
}
MCMCl <- modMCMC(f = NL, p = log(1:4), niter = 3000,
                 outputlength = 1000, jump = 5)
plot(MCMCl)   # bad convergence
cumuplot(as.mcmc(MCMCl$pars))

MCMCl <- modMCMC (f = NL, p = log(1:4), niter = 3000,
                  outputlength = 1000, jump = 2^(2:5))
plot(MCMCl)   # better convergence but CHECK it!
pairs(MCMCl)
colMeans(log(MCMCl$pars))
apply(log(MCMCl$pars), 2, sd)

MCMCl <- modMCMC (f = NL, p = rep(1, 4), niter = 3000, 
                  outputlength = 1000, jump = 5, updatecov = 100)
plot(MCMCl)
colMeans(log(MCMCl$pars))
apply(log(MCMCl$pars), 2, sd)

## =======================================================================
## Fitting a Monod (Michaelis-Menten) function to data
## =======================================================================

# the observations
#---------------------
Obs <- data.frame(x=c(   28,  55,   83,  110,  138,  225,  375),   # mg COD/l
                  y=c(0.053,0.06,0.112,0.105,0.099,0.122,0.125))   # 1/hour
plot(Obs, pch = 16, cex = 2, xlim = c(0, 400), ylim = c(0, 0.15),
     xlab = "mg COD/l", ylab = "1/hr", main = "Monod")

# the Monod model
#---------------------
Model <- function(p, x) data.frame(x = x, N = p[1]*x/(x+p[2]))

# Fitting the model to the data
#---------------------
# define the residual function
Residuals  <- function(p) (Obs$y - Model(p, Obs$x)$N)

# use modFit to find parameters
P      <- modFit(f = Residuals, p = c(0.1, 1))

# plot best-fit model
x      <-0:375
lines(Model(P$par, x))

# summary of fit
sP    <- summary(P)
sP[]
print(sP)

# Running an MCMC
#---------------------
# estimate parameter covariances
# (to efficiently generate new parameter values)
Covar   <- sP$cov.scaled * 2.4^2/2

# the model variance
s2prior <- sP$modVariance

# set nprior = 0 to avoid updating model variance
MCMC <- modMCMC(f = Residuals, p = P$par,jump = Covar, niter = 1000,
                var0 = s2prior, wvar0 = NULL, updatecov = 100)

plot(MCMC, Full = TRUE)
pairs(MCMC)
# function from the coda package.
raftery.diag(as.mcmc(MCMC$pars))
cor(MCMC$pars)

cov(MCMC$pars)   # covariances by MCMC
sP$cov.scaled    # covariances by Hessian of fit

x  <- 1:400
SR <- summary(sensRange(parInput = MCMC$pars, func = Model, x = x))
plot(SR, xlab="mg COD/l", ylab = "1/hr", main = "Monod")
points(Obs, pch = 16, cex = 1.5)