```{R}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{R}
library(foocafeReliability)
library(rstan)
library(ggplot2)   # needed for the plots below
library(magrittr)  # needed for the %>% pipe
```
(Example taken from p. 89 of *Bayesian Reliability* by Hamada et al.)
```{R}
supercomputer_failures %>%
  ggplot(aes(x = failure_count)) +
  geom_histogram(fill = "turquoise", binwidth = 1) +
  labs(
    x = "Number of failures in first month of operation",
    y = "Count"
  )
```
Consider modelling the monthly number of failures of the Los Alamos National Laboratory Blue Mountain supercomputer's components (shared memory processors, or SMPs) with a Poisson distribution.
Let $y_1, \ldots, y_{47}$ denote the monthly number of failures recorded for the SMPs. With $t_i = 1$ month, we model the failure count data by the Poisson distribution as:

$$ Y_i \sim \textit{Poisson}(\lambda t_i) = \textit{Poisson}(\lambda), \quad i = 1, \ldots, 47 $$
where $\lambda$ is the mean monthly number of failures.
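To get a feel for what this model implies, here is a quick sketch (an aside, not part of the original analysis) that simulates 47 monthly failure counts for a hypothetical rate of $\lambda = 5$:

```{R}
# Hypothetical illustration: draw 47 monthly failure counts from a
# Poisson distribution with an assumed rate of 5 failures per month.
set.seed(1)
simulated_counts <- rpois(n = 47, lambda = 5)
summary(simulated_counts)
```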
The supercomputer engineers expect no more than 10 failures per component in the first month of operation. One way to represent this prior information is to assume a gamma prior distribution for $\lambda$ with a mean of five.
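As a quick sanity check on this choice (again an aside, not part of the original analysis), a $\textit{Gamma}(5, 1)$ prior has mean $5 / 1 = 5$ and places almost all of its mass below the engineers' bound of 10:

```{R}
# Prior probability that the failure rate is below 10,
# under a Gamma(shape = 5, rate = 1) prior (roughly 0.97).
pgamma(10, shape = 5, rate = 1)
```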
modelString <- " data { int <lower=0> Nobs; int <lower=0> count[Nobs]; } parameters { real <lower=0> lambda; } model { count ~ poisson(lambda); lambda ~ gamma(5, 1); } generated quantities { //sample predicted values from the model for posterior predictive checks real y_rep[Nobs]; for(n in 1:Nobs) y_rep[n] = poisson_rng(lambda); } " supercomputer_failures_model <- rstan::stan_model(model_code = modelString)
```{R}
supercomp_fail_list <- list(
  Nobs = NROW(supercomputer_failures$failure_count),
  count = supercomputer_failures$failure_count
)

output <- rstan::sampling(
  supercomputer_failures_model,
  data = supercomp_fail_list,
  chains = 4,
  iter = 4000,
  control = list(adapt_delta = 0.9)
)
```
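Because the gamma prior is conjugate to the Poisson likelihood, the posterior is also available in closed form, $\lambda \mid y \sim \textit{Gamma}(5 + \sum_i y_i, 1 + 47)$. This gives a handy sanity check on the MCMC output (a sketch, not part of the original analysis):

```{R}
# Conjugate update: a Gamma(a, b) prior with Poisson data gives a
# Gamma(a + sum(y), b + n) posterior.
y_obs <- supercomputer_failures$failure_count
a_post <- 5 + sum(y_obs)
b_post <- 1 + NROW(y_obs)

# Analytic posterior mean and 95% interval for lambda; these should
# agree closely with the sampled lambda in `output`.
a_post / b_post
qgamma(c(0.025, 0.975), shape = a_post, rate = b_post)
```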
```{R}
params <- data.frame(lambda = rstan::extract(output)["lambda"])

bayesian_chi_squared_test(
  y = supercomputer_failures$failure_count,
  distribution_fun = ppois,
  params = params,
  data_type = "discrete"
)
```
Posterior predictive checks are made by simulating data from the proposed model, then comparing a chosen test statistic (this can be the mean, maximum value, minimum value or whatever you choose) for those simulations to the test statistic for the actual data. If the model fits the data well, we would expect the test statistic for the simulations to be close to that of the real data. Letting $T$ be the chosen test statistic, we can write

$$ p = \Pr\big(T(\text{simulations}) > T(\text{obs})\big) $$

where "simulations" is the set of simulated data, and "obs" is the actual data. This is known as the posterior predictive p-value, and the closer it lies to $0.5$, the more closely the test statistic for our simulations matches that of the real data. For example, considering the Poisson model we fitted above, we can take the mean as our test statistic and calculate the posterior predictive p-value as follows.
```{R}
output_vals <- rstan::extract(output)
y <- supercomputer_failures$failure_count
y_rep <- rstan::extract(output, "y_rep")

# Taking the mean to be the test statistic:
(ppp_mean <- sum(apply(output_vals$y_rep, 1, mean) > mean(y)) / NROW(output_vals$y_rep))

# Taking the maximum to be the test statistic:
(ppp_maximum <- sum(apply(output_vals$y_rep, 1, max) > max(y)) / NROW(output_vals$y_rep))
```
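The same comparison can also be drawn graphically. As one option (not used in the original text), bayesplot's `ppc_stat()` plots the distribution of a test statistic across the simulated datasets against its observed value:

```{R}
# Histogram of mean(y_rep) over the posterior draws, with the
# observed mean overlaid as a vertical line.
bayesplot::ppc_stat(y = y, yrep = output_vals$y_rep, stat = "mean")
```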
The bayesplot package (https://mc-stan.org/bayesplot/index.html) provides many ready-made plots for posterior predictive checking. Below we wrap the observed data and simulations in a small class so that they can be plotted via its `pp_check()` generic.
```{R}
x <- list(
  y = supercomputer_failures$failure_count,
  yrep = rstan::extract(output, "y_rep")$y_rep
)
class(x) <- "foo"

pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) {
  y <- object[["y"]]
  yrep <- object[["yrep"]]
  switch(
    match.arg(type),
    multiple = bayesplot::ppc_hist(y, yrep[1:min(8, nrow(yrep)), , drop = FALSE]),
    overlaid = bayesplot::ppc_dens_overlay(y, yrep[1:100, , drop = FALSE])
  )
}

bayesplot::pp_check(x, type = "overlaid")
```
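Since the method above also accepts `type = "multiple"`, the same object can be reused to compare histograms of a handful of simulated datasets against the observed data:

```{R}
# Histograms of up to eight simulated datasets next to the observed data.
bayesplot::pp_check(x, type = "multiple")
```

Finally, the fitted model can be explored interactively with shinystan.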
```{R, eval = FALSE}
y <- supercomputer_failures$failure_count
shinystan::launch_shinystan(output)
```