Home

/

CRAN

/

mcgibbsit

/

mcgibbsit Example
In mcgibbsit: Warnes and Raftery's 'MCGibbsit' MCMC Run Length and Convergence Diagnostic

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

nformat <- function(x)
  format(x, big.mark = ',')

library(mcgibbsit)

set.seed(42)        # for reproducibility
tmpdir <- tempdir()

The mcgibbsit package provides an implementation of Warnes & Raftery's MCGibbsit run-length diagnostic for a set of (not-necessarily independent) MCMC samplers. It combines the estimate error-bounding approach of Raftery and Lewis with the between chain variance verses within chain variance approach of Gelman and Rubin.

For a set of exchangeable[^1] MCMC simulations on the same data and model mcgibbsit computes:

[^1]: MCMC Simulations with the same data, prior, and posterior model.

minimum run length: $N_{min}$
required burn in length: $M$
total run length: $N$
run length inflation due to auto-correlation: $I$
run length inflation due to between-chain correlation: $R$

These simulations need not be independent, such as those generated by the Normal Kernel Coupler adaptive CMC method (see @warnes2000nkc or @warnes2001tr395).

For

The normal usage is to perform an initial MCMC run of some pre-determined length (e.g. 300 iterations) for each of a set of $k$ (e.g. 20) MCMC samplers. The output from these samplers is then read in to create an mcmc.list object and mcgibbsit is run specifying the desired accuracy of estimation for quantiles of interest. This will return the minimum number of iterations to achieve the specified error bound. The set of MCMC samplers is now run so that the total number of iterations exceeds this minimum, and mcgibbsit is again called. This should continue until the number of iterations already complete is less than the minimum number computed by mcgibbsit.

If the initial number of iterations in data is too small to perform the calculations, an error message is printed indicating the minimum pilot run length.

Example

This basic example constructs a dummy set of files from an imaginary MCMC sampler and shows the results of running mcgibbsit with the default settings.

# Define a function to generate the output of our imaginary MCMC sampler
gen_samples <- function(run_id, nsamples=200)
{
  x <- matrix(nrow = nsamples+1, ncol=4)
  colnames(x) <- c("alpha","beta","gamma", "nu")

  x[,"alpha"] <- exp(rnorm (nsamples+1, mean=0.025, sd=0.025))
  x[,"beta"]  <- rnorm (nsamples+1, mean=53,    sd=14)
  x[,"gamma"] <- rbinom(nsamples+1, 20,         p=0.15) + 1
  x[,"nu"]    <- rnorm (nsamples+1, mean=x[,"alpha"] * x[,"beta"], sd=1/x[,"gamma"])
#'
  # induce serial correlation of 0.25
  x <- 0.75 * x[2:(nsamples+1),] + 0.25 * x[1:nsamples,]

  # induce ~50% acceptance rate
  accept <- runif(nsamples) > 0.50
  for(i in 2:nsamples)
    if(!accept[i]) x[i,] <- x[i-1,]

  write.table(
    x,
    file = file.path(
      tmpdir,
      paste("mcmc", run_id, "csv", sep=".")
      ),
    sep = ",",
    row.names = FALSE
  )
}

First, we'll generate and load only a 3 runs of length 200:

# Generate and load 3 runs 
for(i in 1:3)
  gen_samples(i, 200)

mcmc.3 <- read.mcmc(
  3, 
  file.path(tmpdir, "mcmc.#.csv"), 
  sep=",",
  col.names=c("alpha","beta","gamma", "nu")
  )

# Trace and Density Plots
plot(mcmc.3)

Now run mcgibbsit to determine the necessary total number of MCMC samples to to provide accurate 95% posterior confidence region estimates for all four of the parameters:

# And check the necessary run length 
mcg.3 <- mcgibbsit(mcmc.3)
print(mcg.3)

The results from mcgibbsit indicate that the required number of samples is r nformat(max(mcg.3$resmatrix[,"Total"])), which is less than we've generated so far.

Lets generate 7 more runs, each of length 200, for a total of 2,000 samples:

# Generate and load 7 more runs 
for(i in 3 + (1:7))
  gen_samples(i, 200)

mcmc.10 <- read.mcmc(
  10, 
  file.path(tmpdir, "mcmc.#.csv"), 
  sep=",",
  col.names=c("alpha","beta","gamma", "nu")
  )

# Trace and Density Plots
plot(mcmc.10)

Now run mcgibbsit to determine the necessary number of MCMC samples:

# And check the necessary run length 
mcg.10 <- mcgibbsit(mcmc.10)
print(mcg.10)

mcgibbsit now estimates that a total of required number of samples is r nformat(max(mcg.10$resmatrix[,"Total"]))[^2] Since we we have already generated r nformat(mcg.10$nchains * mcg.10$len) samples, we do not need to perform any additional runs.

[^2]: This is slightly fewer than before because the the larger number of samples allowed more accurate estimates of the variances and correlations.

We can now calculate the posterior confidence regions for each of the parameters.

summary(mcmc.10)

Any scripts or data that you put into this service are public.

mcgibbsit documentation built on Sept. 25, 2023, 5:06 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mcgibbsit
Warnes and Raftery's 'MCGibbsit' MCMC Run Length and Convergence Diagnostic

mcgibbsit Example
In mcgibbsit: Warnes and Raftery's 'MCGibbsit' MCMC Run Length and Convergence Diagnostic

Example

Try the mcgibbsit package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

mcgibbsit Warnes and Raftery's 'MCGibbsit' MCMC Run Length and Convergence Diagnostic

mcgibbsit Example In mcgibbsit: Warnes and Raftery's 'MCGibbsit' MCMC Run Length and Convergence Diagnostic

Example

Try the mcgibbsit package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

mcgibbsit
Warnes and Raftery's 'MCGibbsit' MCMC Run Length and Convergence Diagnostic

mcgibbsit Example
In mcgibbsit: Warnes and Raftery's 'MCGibbsit' MCMC Run Length and Convergence Diagnostic