Amwg: Turn a non-adaptive Metropolis sampler into an adaptive...
In overture: Tools for Writing MCMC

Description Usage Arguments Details Value References Examples

View source: R/utilities.R

Given a non-adaptive sampler of the form f(..., s), Amwg will return a function g(...) that automatically adapts the Metropolis proposal standard deviation s to try and achieve a target acceptance rate.

1	Amwg(f, s, batch.size = 50, target = 0.44, DeltaN, stop.after = NA)

`f`	non-adaptive Metropolis sampler of the form f(..., s)
`s`	initial value for the Metropolis proposal SD
`batch.size`	number of iterations before proposal SD is adapted
`target`	target acceptance rate
`DeltaN`	function of the form f(n) which returns the adaptation amount based on the number of elapsed iterations, n. Defaults to δ(n) = min(0.01, n^{-1/2})
`stop.after`	stop adapting proposal SD after this many iterations

Amwg uses the Adaptive Metropolis-Within-Gibbs algorithm from Roberts & Rosenthal (2009), which re-scales the proposal standard deviation after a fixed number of MCMC iterations have elapsed. The goal of the algorithm is to achieve a target acceptance rate for the Metropolis step. After the nth batch of MCMC iterations the log of the proposal standard deviation, log(s), is increased/decreased by δ(n). log(s) is increased by δ(n) if the observed acceptance rate is more than the target acceptance rate, or decreased by δ(n) if the observed acceptance rate is less than the target acceptance rate. Amwg keeps track of the the acceptance rate by comparing the previously sampled value from f to the next value. If the two values are equal, the proposal is considered to be rejected, whereas if the two values are different the proposal is considered accepted. Amwg will optionally stop adapting the proposal standard deviation after stop.after iterations. Setting stop.after can be used, for example, to stop adapting the proposal standard deviation after some burn-in period. If stop.after=NA (the default), Amwg will continue to modify the proposal standard deviation throughout the entire MCMC.

DeltaN is set to δ(n) = min(0.01, n^{-1/2}) unless re-specified in the function call. Some care should be taken if re-specifying DeltaN, as the ergodicity of the chain may not be preserved if certain conditions aren't met. See Roberts & Rosenthal (2009) in the references for details.

The proposal standard deviation s can be either a vector or a scalar. If the initial value of s is a scalar, f will be treated as a sampler for a scalar, a random vector, or a joint parameter update. Alternatively, if the dimension of s is equal to the dimension of the parameters returned by f, the individual elements s will be treated as individual proposal standard deviations for the elements returned by f. This functionality can be used, for example, if f samples each of its returned elements individually, updating each element using a Metropolis step. See the examples for an illustration of this use case. In such settings, f should be constructed to receive s as a vector argument.

Adaptive Metropolis sampler function of the form g(...).

Gareth O. Roberts & Jeffrey S. Rosenthal (2009) Examples of Adaptive MCMC, Journal of Computational and Graphical Statistics, 18:2, 349-367, doi: 10.1198/jcgs.2009.06134

# Sample from N(1, 2^2) ---------------------------------------------------
LogP <- function(x) dnorm(x, 1, 2, log=TRUE) # Target distribution

f <- function(x, s) { # Non-adaptive Metropolis sampler
    x.prop <- x + rnorm(1, 0, s)
    if(AcceptProp(LogP(x), LogP(x.prop))) {
        x <- x.prop
    }

    return(x)
}

s.start <- 0.1
g <- Amwg(f, s.start, batch.size=25)

n.save <- 10000
Mcmc <- InitMcmc(n.save)
y <- 0
x <- 0
samples <- Mcmc({
    y <- f(y, s.start) # Non-adaptive
    x <- g(x) # Adaptive
})

plot(1:n.save, samples$x, ylim=c(-10, 10), main="Traceplots", xlab="Iteration",
     ylab="Value", type='l')
lines(1:n.save, samples$y, col="red")
legend("bottomleft", legend=c("Adaptive", "Non-adaptive"),
       col=c("black", "red"), lty=1, cex=0.8)

# Sample from Gamma(10, 5) ------------------------------------------------
LogP <- function(x) dgamma(x, 10, 5, log=TRUE) # Target distribution

f <- function(x, s) { # Non-adaptive Metropolis sampler
    x.prop <- x + rnorm(1, 0, s)
    if(AcceptProp(LogP(x), LogP(x.prop))) {
        x <- x.prop
    }

    return(x)
}

s.start <- 10
stop.after <- 5000 # Stop changing the proposal SD after 5000 iterations
g <- Amwg(f, s.start, batch.size=25, stop.after=stop.after)

n.save <- 10000
Mcmc <- InitMcmc(n.save)
x <- 1
samples <- Mcmc({
    x <- g(x)
})
hist(samples$x[stop.after:n.save,], xlab="x", main="Gamma(10, 5)", freq=FALSE)
curve(dgamma(x, 10, 5), from=0, to=max(samples$x), add=TRUE, col="blue")

# Overdispersed Poisson ---------------------------------------------------
## Likelihood:
## y_i|theta_i ~ Pois(theta_i), i=1,...,n
## Prior:
## theta_i ~ Log-Normal(mu, sigma^2)
## mu ~ Normal(m, v^2), m and v^2 fixed
## sigma^2 ~ InverseGamma(a, b), a and b fixed


SampleSigma2 <- function(theta.vec, mu, a, b, n.obs) {
    1/rgamma(1, a + n.obs/2, b + (1/2)*sum((log(theta.vec) - mu)^2))
}

SampleMu <- function(theta.vec, sigma.2, m, v.2, n.obs) {
    mu.var <- (1/v.2 + n.obs/sigma.2)^(-1)
    mu.mean <- (m/v.2 + sum(log(theta.vec))/sigma.2) * mu.var

    return(rnorm(1, mu.mean, sqrt(mu.var)))
}

LogDTheta <- function(theta, mu, sigma.2, y) {
    dlnorm(theta, mu, sqrt(sigma.2), log=TRUE) + dpois(y, theta, log=TRUE)
}

# Non-adaptive Metropolis sampler
SampleTheta <- function(theta.vec, mu, sigma.2, y.vec, n.obs, s) {
    theta.prop <- exp(log(theta.vec) + rnorm(n.obs, 0, s))

    # Jacobians, because proposals are made on the log scale
    j.curr <- log(theta.vec)
    j.prop <- log(theta.prop)

    log.curr <- LogDTheta(theta.vec, mu, sigma.2, y.vec) + j.curr
    log.prop <- LogDTheta(theta.prop, mu, sigma.2, y.vec) + j.prop
    theta.vec <- ifelse(AcceptProp(log.curr, log.prop), theta.prop, theta.vec)

    return(theta.vec)
}

## Data
y.vec <- warpbreaks$breaks
n.obs <- length(y.vec)

## Setup adaptive Metropolis sampler
s <- rep(1, n.obs)
# s is a vector, so the acceptance rate of each component will be tracked
# individually in the adaptive Metropolis sampler
SampleThetaAdapative <- Amwg(SampleTheta, s)

## Set prior
v.2 <- 0.05
m <- log(30) - v.2/2
a <- 1
b <- 2

## Initialize parameters
theta.vec <- y.vec
mu <- m

## MCMC
Mcmc <- InitMcmc(10000)
samples <- Mcmc({
    sigma.2 <- SampleSigma2(theta.vec, mu, a, b, n.obs)
    mu <- SampleMu(theta.vec, sigma.2, m, v.2, n.obs)
    theta.vec <- SampleThetaAdapative(theta.vec, mu, sigma.2, y.vec, n.obs)
})