knitr::opts_chunk$set(echo = TRUE, fig.width = 5) library(tidyverse) library(rcosmosis)
The function read.metropolis.hastings
is the way to read the CosmoSIS output
generated by the Metropolis-Hastings sampler.
Because the MH sampler can run multiple chains, each of which results in a separate file,
it will be common to specify a file glob to identify the set of files that are to be read.
The result is a single data frame (more specifically, a tibble) containing
one row for each sample.
The data frame contains a column for each parameter being varied by the sampler,
and also containing a column chain
, which indicates the identify of the chain that generated
the sample,
and a column sample
, which indicates a sample number within that chain.
While CosmoSIS writes chain output in uncompressed files, rcosmosis
also
supports reading compressed files. xz
compression is especially effective in
reducing the size of the chain output files.
root <- rprojroot::is_r_package samples <- read.metropolis.hastings(root$find_file("inst/extdata/metropolis-hastings/chain_metro_*.txt.xz")) head(samples) |> knitr::kable()
nchains <- max(samples$chain) nsamples.per.chain <- max(samples$sample) nsamples.total <- nrow(samples)
We have r nchains
chains, each with at most r nsamples.per.chain
samples,
for a total of r nsamples.total
samples.
nburn <- 6000 nthin <- 50
Each chain takes some time to achieve convergence; this period is often called the burn-in period.
One way to evaluate convergence of the chains is to look at how they
progress as a function of sample number;
this is often called a trace for the chain. We will do this for the
first r nburn
values from each chain.
One thing we want is see is mixing. It is important that the different chains do not produce traces that stay separated; they should "forget" their initial values, and produce traces that eventually are well-mixed.
Since there are too many chains to visualize this, we look only at the first 8
chains. We also sample the chains sparsely, showing only 1 sample in r nthin
:
alpha = 0.3 filter(samples, sample < nburn, sample %% nthin == 0, chain %in% 1:8) |> ggplot(aes(sample, omega_m, color = factor(chain))) + geom_path(alpha = alpha, show.legend = FALSE) + ylab(latex2exp::TeX("$\\Omega_m$"))
filter(samples, sample < nburn, sample %% nthin == 0, chain %in% 1:8) |> ggplot(aes(sample, sigma8_input, color = factor(chain))) + geom_path(alpha = alpha, show.legend = FALSE) + ylab(latex2exp::TeX("$\\sigma_8$"))
filter(samples, sample < nburn, sample %% nthin == 0, chain %in% 1:8) |> ggplot(aes(sample, concentration, color = factor(chain))) + geom_path(alpha = alpha, show.legend = FALSE)
We can also look at how the marginal density for each parameter converges as we go through
fixed-size batches of samples. We'll group the data by 100-sample batches (for each chain).
As above, we need to re-organize the samples
dataframe, but this time we don't thin out the samples,
and we also assign samples to batches.
batches <- samples |> #filter(sample < 3000) |> mutate(batch = sample %/% 100) |> pivot_longer(-c(sample, chain, batch), names_to = "param", values_to = "val") maxbatches <- max(batches$batch)
We have r maxbatches+1
batches.
We can then histogram the posterior densities, and observe how they change from batch to batch.
An animation would show this well, but they are prohibitively expensive to generate in vignettes.
Here, we will look at the first batch and the last batch.
filter(batches, batch == 0) |> ggplot(aes(val)) + geom_histogram(bins = 100) + geom_histogram(aes(y = ..density..), bins = 100) + facet_wrap(vars(param), scales = "free", ncol = 1) + labs(title = "Batch 0") filter(batches, batch == 100) |> ggplot(aes(val)) + geom_histogram(bins = 100) + geom_histogram(aes(y = ..density..), bins = 100) + facet_wrap(vars(param), scales = "free", ncol = 1) + labs(title = "Batch 100")
The plots for the batch 0 show that the distribution of generated samples is dominated by the priors and the initial distribution of starting points of the chains. The plots for batch 100 show that by this time in the sampling the influence of the data is seen.
There are also commonly-used diagnostic statistics to assess convergence of chains.
Several of these are implemented in R, in the package coda
.
We demonstrate some of their uses here.
coda
objectsThe package coda
makes use of the classes mcmc
and mcmc.list
.
The dataframe returned by read.metropolis.hastings
can be converted into an mcmc.list
object.
But first, let's look at a quirk of the data with which we have been working.
group_by(samples, chain) |> # group data by chain summarize(nsamples = n(), # for each chain, count number of samples .groups = "drop") |> group_by(nsamples) |> # group that data number of samples summarize(nchains = n(), # count how many groups had each number of samples .groups = "drop") |> knitr::kable()
Two of the chains have more samples than the others.
It turns out that the CosmoSIS run we have been analyzing was terminated externally, rather than running to completion.
As we have seen, the read.metropolis.hastings
function can deal with this.
The mcmc.list
object must represent chains all of the same length.
When we convert a dataframe that contains unequal numbers of samples for its chains,
all chains are truncated to the length of the shortest chain.
samples.mcmc <- mcmc.list.from.metropolis.hastings(samples)
mcmc.list
objects have many associated functions; see the help for coda
for a list of them all.
A few can be used to sanity-check the conversion.
coda::nchain(samples.mcmc) # number of chains coda::niter(samples.mcmc) # number of samples per chain coda::nvar(samples.mcmc) # number of variables being sampled
Many of the coda
functions seem to struggle with long chains.
The function window
allows us to sub-sample (or thin) the chains, without losing too much statistical power:
coda::effectiveSize(samples.mcmc) samples.thinned <- window(samples.mcmc, thin = 40) coda::effectiveSize(samples.thinned)
Plotting the thinned sample is approximately 20 times faster than plotting the full sample.
par(mar = rep(2, 4)) plot(samples.thinned)
The Gelman-Rubin statistic is a "potential scale reduction factor".
Approximate convergence is diagnosed with the upper limit is close to 1.
The function gelman.plot
shows the evolution of the statistic as a function of sample (called iteration here).
par(mar = rep(2, 4)) coda::gelman.plot(samples.thinned, transform = TRUE, autoburnin = FALSE)
For all our analysis of posterior distributions, we'll drop the first r nburn
elements of each chain.
We use a combined histogram and kernel density plot to show the 1-d marginal distributions.
To make these plots, we first transform the data from a form that has one row per sample per chain, to one that has one row per sample per chain per physical parameter. This reshaping allows us to condition the plot on the name of the physical parameter, and thus to have a single graphic to plot.
The transformation of the data is done using tidyr::pivot_longer
.
x <- samples |> filter(sample > nburn) |> pivot_longer(-c(sample,chain), names_to = "param", values_to = "val")
x |> ggplot(aes(val)) + geom_histogram(aes(y = ..density..), bins = 50) + geom_density(color = "red") + facet_wrap(vars(param), scales = "free", ncol = 1)
The posterior for concentration
has a long high-end tail, and so a log scale in x can be useful.
x |> filter(param == "concentration") |> ggplot(aes(val)) + geom_histogram(aes(y = ..density..), bins = 50) + geom_density(color = "red") + xlab("concentration") + scale_x_log10()
We can see that these data and this model has little power to constrain the parameter concentration.
We are using hexbin plots to show 2-d posterior densities because this produces few, and easy to understand, artifacts in the plotting. We use a log scale for the density because of wide range of density. We also plot the contours produced by a 2D kernel density estimator, for comparison with the hexbin results.
samples |> filter(sample > nburn) |> plot_density_2d(omega_m, sigma8_input) + labs(x=latex2exp::TeX("$\\Omega_m$"), y=latex2exp::TeX("$\\sigma_8$"))
samples |> filter(sample > nburn) |> plot_density_2d(omega_m, concentration) + scale_y_log10()
samples |> filter(sample > nburn) |> plot_density_2d(sigma8_input, concentration) + scale_y_log10()
samples |> select(-c(chain, sample)) |> pivot_longer(everything(), names_to = "param", values_to = "val") |> group_by(param) |> summarize(mean = mean(val), median = median(val), sd = sd(val), mad = mad(val), "5%" = quantile(val, 0.05), "95%" = quantile(val, 0.95), .groups = "drop") |> knitr::kable()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.