```{R}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{R}
library(foocafeReliability)
library(rstan)
library(ggplot2)   # needed for the plots below
library(magrittr)  # needed for the %>% pipe
```
(Example taken from p. 89 of *Bayesian Reliability* by Hamada et al.)
```{R}
supercomputer_failures %>%
  ggplot(aes(x = failure_count)) +
  geom_histogram(fill = "turquoise", binwidth = 1) +
  labs(
    x = "Number of failures in first month of operation",
    y = "Count"
  )
```
Consider modelling the monthly number of failures of the Los Alamos National Laboratory Blue Mountain supercomputer's components (shared memory processors, or SMPs) with a Poisson distribution.
Let $y_1, \ldots, y_{47}$ denote the monthly number of failures recorded for the SMPs. With $t_i = 1$ month, we model the failure count data by the Poisson distribution as:

$$ Y_i \sim \textit{Poisson}(\lambda t_i) = \textit{Poisson}(\lambda), \quad i = 1, \ldots, 47 $$
where $\lambda$ is the mean monthly number of failures.
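To get a feel for what this model implies, here is a quick sketch (an aside, not part of the original analysis) that simulates 47 monthly failure counts for a hypothetical rate of $\lambda = 5$:

```{R}
# Hypothetical illustration: draw 47 monthly failure counts from a
# Poisson distribution with an assumed rate of 5 failures per month.
set.seed(1)
simulated_counts <- rpois(n = 47, lambda = 5)
summary(simulated_counts)
```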
The supercomputer engineers expect no more than 10 failures per component in the first month of operation. One way to represent this prior information is to assume a gamma prior distribution for $\lambda$ with a mean of five.
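As a quick sanity check on this choice (again an aside, not part of the original analysis), a $\textit{Gamma}(5, 1)$ prior has mean $5 / 1 = 5$ and places almost all of its mass below the engineers' bound of 10:

```{R}
# Prior probability that the failure rate is below 10,
# under a Gamma(shape = 5, rate = 1) prior (roughly 0.97).
pgamma(10, shape = 5, rate = 1)
```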
modelString <- " data { int <lower=0> Nobs; int <lower=0> count[Nobs]; } parameters { real <lower=0> lambda; } model { count ~ poisson(lambda); lambda ~ gamma(5, 1); } generated quantities { //sample predicted values from the model for posterior predictive checks real y_rep[Nobs]; for(n in 1:Nobs) y_rep[n] = poisson_rng(lambda); } " supercomputer_failures_model <- rstan::stan_model(model_code = modelString)
```{R}
supercomp_fail_list <- list(
  Nobs = NROW(supercomputer_failures$failure_count),
  count = supercomputer_failures$failure_count
)

output <- rstan::sampling(
  supercomputer_failures_model,
  data = supercomp_fail_list,
  chains = 4,
  iter = 4000,
  control = list(adapt_delta = 0.9)
)
```
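Because the gamma prior is conjugate to the Poisson likelihood, the posterior is also available in closed form, $\lambda \mid y \sim \textit{Gamma}(5 + \sum_i y_i, 1 + 47)$. This gives a handy sanity check on the MCMC output (a sketch, not part of the original analysis):

```{R}
# Conjugate update: a Gamma(a, b) prior with Poisson data gives a
# Gamma(a + sum(y), b + n) posterior.
y_obs <- supercomputer_failures$failure_count
a_post <- 5 + sum(y_obs)
b_post <- 1 + NROW(y_obs)

# Analytic posterior mean and 95% interval for lambda; these should
# agree closely with the sampled lambda in `output`.
a_post / b_post
qgamma(c(0.025, 0.975), shape = a_post, rate = b_post)
```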
```{R}
params <- data.frame(lambda = rstan::extract(output)["lambda"])

bayesian_chi_squared_test(
  y = supercomputer_failures$failure_count,
  distribution_fun = ppois,
  params = params,
  data_type = "discrete"
)
```
Posterior predictive checks are made by simulating data from the proposed model, then comparing a chosen test statistic (this can be the mean, maximum value, minimum value or whatever you choose) for those simulations to the test statistic for the actual data. If the model fits the data well, we would expect the test statistic for the simulations to be close to that of the real data. Letting $T$ be the chosen test statistic, we can write

$$ p = \Pr\big(T(\text{simulations}) > T(\text{obs})\big) $$

where "simulations" is the set of simulated data, and "obs" is the actual data. This is known as the posterior predictive p-value, and the closer it lies to $0.5$, the more closely the test statistic for our simulations matches that of the real data. For example, considering the Poisson model we fitted above, we can take the mean as our test statistic and calculate the posterior predictive p-value as follows.
```{R}
output_vals <- rstan::extract(output)
y <- supercomputer_failures$failure_count
y_rep <- rstan::extract(output, "y_rep")

# Taking the mean to be the test statistic:
(ppp_mean <- sum(apply(output_vals$y_rep, 1, mean) > mean(y)) / NROW(output_vals$y_rep))

# Taking the maximum to be the test statistic:
(ppp_maximum <- sum(apply(output_vals$y_rep, 1, max) > max(y)) / NROW(output_vals$y_rep))
```
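The same comparison can also be drawn graphically. As one option (not used in the original text), bayesplot's `ppc_stat()` plots the distribution of a test statistic across the simulated datasets against its observed value:

```{R}
# Histogram of mean(y_rep) over the posterior draws, with the
# observed mean overlaid as a vertical line.
bayesplot::ppc_stat(y = y, yrep = output_vals$y_rep, stat = "mean")
```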
The bayesplot package (https://mc-stan.org/bayesplot/index.html) provides many ready-made plots for posterior predictive checking. Below we wrap the observed data and simulations in a small class so that they can be plotted via its `pp_check()` generic.
```{R}
x <- list(
  y = supercomputer_failures$failure_count,
  yrep = rstan::extract(output, "y_rep")$y_rep
)
class(x) <- "foo"

pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) {
  y <- object[["y"]]
  yrep <- object[["yrep"]]
  switch(
    match.arg(type),
    multiple = bayesplot::ppc_hist(y, yrep[1:min(8, nrow(yrep)), , drop = FALSE]),
    overlaid = bayesplot::ppc_dens_overlay(y, yrep[1:100, , drop = FALSE])
  )
}

bayesplot::pp_check(x, type = "overlaid")
```
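Since the method above also accepts `type = "multiple"`, the same object can be reused to compare histograms of a handful of simulated datasets against the observed data:

```{R}
# Histograms of up to eight simulated datasets next to the observed data.
bayesplot::pp_check(x, type = "multiple")
```

Finally, the fitted model can be explored interactively with shinystan.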
```{R, eval = FALSE}
y <- supercomputer_failures$failure_count
shinystan::launch_shinystan(output)
```