The Poisson distribution models the number of events that occur within equal-sized time frames.
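For intuition, here is a small simulation of what such data look like: yearly counts drawn around two different rates. The rates (3 and 1) and the change point (year 40) are made up for illustration.

# Simulate 100 years of event counts with a drop in rate at year 40
set.seed(1)
sim = data.frame(
  year = 1:100,
  n = c(rpois(39, lambda = 3), rpois(61, lambda = 1))
)
plot(sim$year, sim$n, type = "h", xlab = "Year", ylab = "Events")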

Coal mining disasters

A dataset on coal mining disasters has grown very popular in the change point literature (it is available in boot::coal). It contains the date of each coal mining disaster from 1851 to 1962. By binning the number of events within each year (a fixed time frame), we get something very Poisson-friendly:

# Number of disasters by year
library(dplyr, warn.conflicts = FALSE)
df = round(boot::coal) %>%  # round the decimal dates to whole years
  group_by(date) %>% 
  count()

# See it
head(df)

The number of events (n) as a function of year (date) is typically modeled as a change between two intercepts. This is very simple to do in mcp:

library(mcp)
options(mc.cores = 3)  # Speed up sampling
set.seed(42)  # Make the script deterministic
model = list(
  n ~ 1,  # intercept-only
  ~ 1  # intercept-only (after the change point)
)

fit = mcp(model, data = df, family = poisson(), par_x = "date")

Let us see the two intercepts (lambda in log-units) and the change point (in years):

result = summary(fit)

We can see that the model ran well, with good convergence and a large number of effective samples. At first glance, the change point is estimated to lie between the years 1880 and 1895 (approximately).
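If you want those diagnostics as numbers rather than reading them off the printed summary, a sketch like this works, assuming the summary data frame has the name, Rhat, and n.eff columns that mcp prints:

# Rhat near 1 and large n.eff indicate good convergence
result[, c("name", "Rhat", "n.eff")]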

Let us take a more direct look, using the default mcp plot:

plot(fit)

It seems to fit the data well, but we can see that the change point probability "lumps" around particular data points. Years with a very low number of disasters abruptly increase the probability that the change to a lower disaster rate has taken place. The posterior distributions of change points regularly take such "weird" forms, i.e., forms that are not well described by our usual toolbox of parameterized distributions.

We can see this more clearly by plotting the posteriors. We include a traceplot too, just to check convergence visually.

plot_pars(fit)
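If you want the raw draws behind that density, e.g. to compute quantiles of the change point, something like the following sketch works. It assumes the posterior samples are stored as an mcmc.list in fit$mcmc_post with a column named cp_1, as in current mcp versions:

# Pool the cp_1 draws across chains and summarise them directly
cp_draws = unlist(lapply(fit$mcmc_post, function(chain) chain[, "cp_1"]))
quantile(cp_draws, probs = c(0.025, 0.5, 0.975))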

Priors

poisson() defaults to link = 'log', meaning that we have to exponentiate the estimates to get the "raw" Poisson parameter $\lambda$. $\lambda$ has the nice property of being the mean number of events per time frame. So the mean number of disasters per year is exp(result$mean[2]) in segment 1 and exp(result$mean[3]) in segment 2.
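As runnable code, assuming the summary rows are ordered cp_1, int_1, int_2 as printed above:

# Back-transform the log-scale intercepts to yearly disaster rates
exp(result$mean[2])  # lambda in segment 1 (int_1)
exp(result$mean[3])  # lambda in segment 2 (int_2)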

Default priors were used. They are normals with a standard deviation of 10, i.e., with 68% of the probability mass between exp(-10) = 1/22026 and exp(10) = 22026 on the raw $\lambda$ scale:

cbind(fit$prior)
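You can verify the 68% figure directly; it is simply the one-standard-deviation mass of a normal distribution:

# Probability mass of Normal(0, 10) within one SD of the mean
pnorm(10, mean = 0, sd = 10) - pnorm(-10, mean = 0, sd = 10)  # ~0.683
exp(c(-10, 10))  # the same interval on the raw lambda scale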

As always, the prior on the change point forces it to occur in the observed range. These priors are very vague, so replace them with more informed priors for your particular case, e.g.:

prior = list(
  cp_1 = "dnorm(1900, 30) T(MINX, 1925)"  # truncated between min(date) and 1925
)
fit_with_prior = mcp(model, data = df, prior = prior, family = poisson(), par_x = "date")
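To see how the informed prior changes the change point estimate, you can inspect its posterior in isolation; plot_pars takes a pars argument for selecting parameters:

plot_pars(fit_with_prior, pars = "cp_1")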

Model comparison

Despite the popularity of this dataset, a question rarely asked is what the evidence is that there is a change point at all. Let us fit two no-change-point models and use approximate leave-one-out cross-validation (LOO) to see how the predictive performance of the three models compares.

A flat (intercept-only) model and a single-decay model:

# Fit an intercept-only model and a model with a trend ("decay") on date
fit_flat = mcp(list(n ~ 1), data = df, family = poisson(), par_x = "date")
fit_decay = mcp(list(n ~ 1 + date), data = df, family = poisson())


library(patchwork)  # enables combining ggplots with +
plot(fit_flat) + plot(fit_decay)

Now we compute and compare the LOO ELPDs:

fit$loo = loo(fit)
fit_flat$loo = loo(fit_flat)
fit_decay$loo = loo(fit_decay)
loo::loo_compare(fit$loo, fit_flat$loo, fit_decay$loo)

The change point model seems to be preferred, with an elpd_diff/se_diff ratio of around 1.7 over the decay model and 2.5 over the flat model.
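Those ratios can be read off the comparison matrix directly, since loo_compare returns elpd_diff and se_diff columns:

# Ratio of the ELPD difference to its standard error
# (NaN for the reference model, where both are zero)
comp = loo::loo_compare(fit$loo, fit_flat$loo, fit_decay$loo)
comp[, "elpd_diff"] / comp[, "se_diff"]

Another approach is to look at the model weights: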

loo_list = list(fit$loo, fit_flat$loo, fit_decay$loo)
loo::loo_model_weights(loo_list, method = "pseudobma")

Again, unsurprisingly, the change point model is preferred, and the weights show the same ranking as implied by loo_compare.

JAGS code

Here is the JAGS code for the full model above.

fit$jags_code

