The Poisson distribution models the number of events within similar-sized time frames.
A dataset on coal mining disasters has grown very popular in the change point literature (available in boot::coal
). It contains a timestamp of each coal mining disaster from 1851 to 1962. By binning the number of events within each year (fixed time frame), we have something very Poisson-friendly:
# Number of disasters by year library(dplyr, warn.conflicts = FALSE) df = round(boot::coal) %>% group_by(date) %>% count() # See it head(df)
The number of events (n
) as a function of year (date
) is typically modeled as a change between two intercepts. This is very simple to do in mcp
:
library(mcp) options(mc.cores = 3) # Speed up sampling set.seed(42) # Make the script deterministic
model = list( n ~ 1, # intercept-only ~ 1 # intercept-only ) fit = mcp(model, data = df, family = poisson(), par_x = "date")
Let us see the two intercepts (lambda in log-units) and the change point (in years):
result = summary(fit)
We can see that the model ran well with good convergence and a large number of effective samples. At a first glance, the change point is estimated to lie between the years 1880 and 1895 (approximately).
Let us take a more direct look, using the default mcp
plot:
plot(fit)
It seems to fit the data well, but we can see that the change point probability "lumps" around particular data points. Years with a very low number of disasters abruptly increase the probability that the change to a lower disaster rate has taken place. The posterior distributions of change points regularly take these "weird" forms, i.e., not well-described by our toolbox of parameterized distributions.
We can see this more clearly if plotting the posteriors. We include a traceplot too, just to check convergence visually.
plot_pars(fit)
poisson()
defaults to link = 'log'
, meaning that we have to exponentiate the estimates to get the "raw" Poisson parameter $\lambda$. $\lambda$ has the nice property of being the mean number of events. So we see that the mean number of events in segment 1 is exp(result$mean[2])
(r exp(result$mean[2])
) and it is exp(result$mean[3])
(r exp(result$mean[3])
) for segment 2.
Default priors were used. They are normals with a standard deviation of 10. I.e. with 68% probability mass between exp(10) = 22026
and exp(-10) = 1 / 22026
:
cbind(fit$prior)
As always, the prior on the change point forces it to occur in the observed range. These priors are very vague, so update with more informed priors for your particular case, e.g.:
prior = list( cp_1 = "dnorm(1900, 30) T(MINX, 1925)" ) fit_with_prior = mcp(model, data = df, prior, poisson(), par_x = "date")
Despite the popularity of this dataset, a question rarely asked is what the evidence is that there is a change point at all. Let us fit two no-changepoint models and use approximate leave-one-out cross-validation to see how the predictive performance of the two models compare.
A flat model and a one-decay model:
# Fit an intercept-only model fit_flat = mcp(list(n ~ 1), data = df, family=poisson(), par_x = "date") fit_decay = mcp(list(n ~ 1 + date), data = df, family = poisson()) plot(fit_flat) + plot(fit_decay)
Not we compute and compare the LOO ELPDs:
fit$loo = loo(fit) fit_flat$loo = loo(fit_flat) fit_decay$loo = loo(fit_decay)
loo::loo_compare(fit$loo, fit_flat$loo, fit_decay$loo)
The change point model seems to be preferred with a ratio of around 1.7 over the decay model and 2.5 over the flat model. Another approach is to look at the model weights:
loo_list = list(fit$loo, fit_flat$loo, fit_decay$loo) loo::loo_model_weights(loo_list, method="pseudobma")
Again, unsurprisingly, the change point model is preferred and they show the same ranking as implied by loo_compare
.
Here is the JAGS code for the full model above.
fit$jags_code
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.