knitr::opts_chunk$set(echo = TRUE, fig.width = 5) library(tidyverse) library(rcosmosis)
The function read.emcee
is the way to read the CosmoSIS output
generated by the Emcee sampler.
The data frame contains a column for each parameter being varied by the sampler,
and also containing a column walker
, which indicates the identify of the walker that generated
the sample,
and a column sample
, which indicates a sample number within the series from that walker.
root <- rprojroot::is_r_package samples <- read.emcee(root$find_file("inst/extdata/chain_emcee_1024.txt.xz")) head(samples) |> knitr::kable()
nwalkers <- max(samples$walker) nsamples_per_walker <- max(samples$sample) nsamples <- nrow(samples)
We have r nwalkers
walkers, each with
r nsamples_per_walker
samples,
for a total of r nsamples
samples.
nburn <- 100 nthin <- 10
The series of samples takes some time to achieve convergence; this period is often called the burn-in period. One way to evaluate convergence of the series is to look at how it progresses as a function of sample number; this is often called a trace. Will will do this for only a few of the walkers because using the full set makes the plots too cluttered. We note that the walkers are not independent, so care is necessary in interpreting the results.
One thing we want is see is mixing. It is important that the different walkers do not produce traces that stay separated; they should "forget" their initial values, and produce traces that eventually are well-mixed.
Since there are too many walkers to visualize this, we look only at the first 8 walkers.
We also thin the samples by only plotting one in r nthin
.
thinned <- filter(samples, walker %in% 1:8, sample %% nthin == 0) alpha <- 0.3 ggplot(thinned, aes(sample, omega_m, color = factor(walker))) + geom_path(alpha = alpha ,show.legend = FALSE) + ylab(latex2exp::TeX("$\\Omega_m$"))
ggplot(thinned, aes(sample, sigma8_input, color = factor(walker))) + geom_path(alpha = alpha, show.legend = FALSE) + ylab(latex2exp::TeX("$\\sigma_8$"))
ggplot(thinned, aes(sample, concentration, color = factor(walker))) + geom_path(alpha = alpha, show.legend = FALSE)
The chain seems to have converged for $\Omega_m$ and for $\sigma_8$, but for
concetration
it is difficult to tell.
For all our analysis of posterior distributions, we'll drop the first r nburn
elements of each chain.
To make our plotting easier, we will re-organize the data:
x <- samples |> pivot_longer(-c(sample,walker), names_to = "param", values_to = "val")
We use a combined histogram and kernel density plot to show the 1-d marginal distributions.
x |> filter(sample > nburn) |> ggplot(aes(val)) + geom_histogram(aes(y = ..density..), bins = 50) + geom_density(color = "red") + facet_wrap(vars(param), scales = "free", ncol = 1)
The posterior for concentration
has a long high-end tail, and so a log scale in x can be useful.
x |> filter(param == "concentration", sample > nburn) |> ggplot(aes(val)) + geom_histogram(aes(y = ..density..), bins = 50) + geom_density(color = "red") + xlab("concentration") + scale_x_log10()
Clearly the posterior density for concentration is strongly influenced at the upper end by the prior density chosen for the parameter, which was flat on the range [1, 10]. This parameter is not strongly constrained by these data.
We are using hexbin plots to show 2-d posterior densities because this produces few, and easy to understand, artifacts in the plotting. We use a log scale for the density because of wide range of density. We also plot the contours produced by a 2D kernel density estimator, for comparison with the hexbin results. The contours show the regions of smallest area containing 68.2%, 95.4%, and 99.7% of the total probability (1, 2 and 3 sigma contours, if the posterior density were Gaussian).
samples |> filter(sample > nburn) |> plot_density_2d(omega_m, sigma8_input) + xlab(latex2exp::TeX("$\\Omega_m$")) + ylab(latex2exp::TeX("$\\sigma_8$"))
samples |> filter(sample > nburn) |> plot_density_2d(omega_m, concentration,) + scale_y_log10() + xlab(latex2exp::TeX("$\\Omega_m$"))
samples |> filter(sample > nburn) |> plot_density_2d(sigma8_input, concentration) + scale_y_log10() + xlab(latex2exp::TeX("$\\sigma_8$"))
To produce a table of statistical summaries, we again transform the data.
samples |> select(-c(walker, sample)) |> pivot_longer(everything(), names_to = "param", values_to = "val") |> group_by(param) |> summarize(mean = mean(val), median = median(val), sd = sd(val), mad = mad(val), "5%" = quantile(val, 0.05), "95%" = quantile(val, 0.95), .groups = "drop") |> knitr::kable()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.