knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(foocafeReliability) library(rstan)
(based on example 4.3 from Bayesian Reliability)
Failure time data is commonly used to assess component reliability. This is a record of the continuous time until a component failed. In general failure time data record "time to some event", with other examples including "time to death" in survival analysis, and "time to interrupt" in software reliability.
There are several standard failure time models and here we consider the exponential, Weibull, and lognormal failure time models to model LCD projector lamp failure times.
In business and educational settings, computer presentations use liquid crystal display (LCD) projectors. The most common failure mode of these projectors is the failure of the lamp. Many manufacturers include the "expected" lamp life in their technical specification documents, and one manufacturer claims that users can expect 1,500 hours of projection time from each lamp used under "normal operating conditions". To test this claim, a large private university placed identical lamps in three projector models for a total of 31 projectors. The university staff recorded the number of projection hours (as measured by the projector) when each lamp burned out. This dataset are the results from that experiment.
Exponential model
Assuming a constant failure rate, we first consider modelling the lamp failure times by an exponential distribution with rate parameter $\lambda$ and use the conjugate prior distribution for $\lambda$, $$ \lambda \sim Gamma(\alpha, \beta) $$ One way to choose the values for $\alpha$ and $\beta$ is to use the manufacturers best guess for the Mean Time To Failure, or MTTF $(1/\lambda)$ and use a large standard deviation around this specification. We take the manufacturers statement of "about 1500 hours" of lamp life as a prior mean lamp life of 1500 hours, and interpret "about" as not being very certain. Using moment matching (see Hamada et al. p93 for details), we obtain suitable values for alpha and beta; $\alpha \approx 2.5$ and $\beta \approx 2350$.
We now build the model using the rstan package. Lambda is the key parameter in this model. MTTF is simply a transformation of lambda. The variable 'lambda_prior' is defined in order to the posterior for lambda later on.
```{R, class.source='fold-show'}
modelString <- "
data {
int
parameters {
real
real<lower=0> lambda_prior;
}
transformed parameters { real MTTF;
MTTF = 1 / lambda;
}
model { projection_hours ~ exponential(lambda);
lambda ~ gamma(2.5, 2350); lambda_prior ~ gamma(2.5, 2350);
} "
lcd_exponential_model <- rstan::stan_model(model_code = modelString)
Then we fit the model to the LCD failures data. ```{R, class.source='fold-show'} lcd_fail_list <- list(Nobs = NROW(lcd_projector_failures$projection_hours), projection_hours = lcd_projector_failures$projection_hours ) lcd_exponential_output <- rstan::sampling(lcd_exponential_model, data = lcd_fail_list, chains = 4, iter = 4000, control = list(adapt_delta = 0.9))
The posterior distributions can be displayed using stan_hist ```{R, class.source='fold-show'}
stan_hist(lcd_exponential_output, c("lambda", "lambda_prior", "MTTF"))
Comparing the prior (dashed line) with the posterior (solid line) ```{R, class.source='fold-show'} lcd_exponential_output_vals <- extract(lcd_exponential_output) data <- data.frame(lcd_exponential_output_vals["lambda"], lcd_exponential_output_vals["lambda_prior"], lcd_exponential_output_vals["MTTF"]) ggplot(data = data) + geom_density(aes(lambda)) + geom_density(aes(lambda_prior), linetype = "dashed")
```{R, class.source='fold-show'} ggplot(data = data) + geom_density(aes(MTTF))
```{R, class.source='fold-show'} reliabilityList <- list() for (ii in 1:NROW(lcd_exponential_output_vals$lambda)){ thisLambda <- lcd_exponential_output_vals$lambda[ii] get_reliability <- function(thisLambda){ f1 <- function(t){ exp(-thisLambda * t) } return(f1) } reliability <- get_reliability(thisLambda) reliabilityDF <- data.frame("t" = t, "reliability" = numeric(NROW(t))) reliabilityDF$reliability <- reliability(t) reliabilityList[[ii]] <- reliabilityDF } temp <- lapply(reliabilityList, '[', 2) %>% dplyr::bind_cols() data <- data.frame(t = t, meanVal = apply(temp, 1, mean), LQ = apply(temp, 1, function(x) {quantile(x, 0.1)}), UQ = apply(temp, 1, function(x) {quantile(x, 0.90)})) ggplot(data = data, aes(t)) + geom_line(aes(y = meanVal)) + geom_ribbon(aes(ymin = LQ, ymax = UQ), alpha = 0.05)
```{R, class.source='fold-show'} params <- data.frame(rstan::extract(lcd_exponential_output)["lambda"]) colnames(params) <- "rate"
bayesian_chi_squared_test(y = lcd_projector_failures$projection_hours, distribution_fun = pexp, params = params, data_type = "continuous")
# Weibull fit: We try fitting a Weibull model to the same data. Weibull distributions have two parameter; shape and scale. In this case, good priors for shape and scale are; gamma(1, 1) and gamma(2.5, 0.006) respectively. Using the STAN model given above for the exponential model as a guide, and the skeleton below, see if you can construct the model yourself. ```{R, eval = FALSE, class.source = 'fold-show'} modelString <- " data { int<lower=0> Nobs; vector<lower=0>[Nobs] projection_hours; } parameters { ... } model { ... } " lcd_weibull_model <- rstan::stan_model(model_code = modelString) lcd_fail_list <- list(Nobs = NROW(lcd_projector_failures$projection_hours), projection_hours = lcd_projector_failures$projection_hours ) lcd_weibull_output <- rstan::sampling(lcd_weibull_model, data = lcd_fail_list, chains = 4, iter = 4000, control = list(adapt_delta = 0.9))
modelString <- " data { int<lower=0> Nobs; vector<lower=0>[Nobs] projection_hours; } parameters { real<lower=0> shape; real<lower=0> scale; } model { target += weibull_lpdf(projection_hours | shape, scale); shape ~ gamma(1, 1); scale ~ gamma(2.5, 0.006); } " lcd_weibull_model <- rstan::stan_model(model_code = modelString) lcd_fail_list <- list(Nobs = NROW(lcd_projector_failures$projection_hours), projection_hours = lcd_projector_failures$projection_hours ) lcd_weibull_output <- rstan::sampling(lcd_weibull_model, data = lcd_fail_list, chains = 4, iter = 4000, control = list(adapt_delta = 0.9))
```{R, class.source='fold-show'}
params <- data.frame(rstan::extract(lcd_weibull_output)["shape"], rstan::extract(lcd_weibull_output)["scale"]) colnames(params) <- c("shape", "scale")
bayesian_chi_squared_test(y = lcd_projector_failures$projection_hours, distribution_fun = pweibull, params = params, data_type = "continuous")
# lognormal fit: We the try fitting a lognormal model to the same data. # Build model ```{R, class.source='fold-show'} modelString <- " data { int<lower=0> Nobs; vector<lower=0>[Nobs] projection_hours; } parameters { real mu; real sigma_2; } transformed parameters { real<lower=0> sigma; sigma = sqrt(sigma_2); } model { target += lognormal_lpdf(projection_hours | mu, sigma); mu ~ normal(6, 5); sigma_2 ~ inv_gamma(6.5, 23.5); } " lcd_lognormal_model <- rstan::stan_model(model_code = modelString) lcd_fail_list <- list(Nobs = NROW(lcd_projector_failures$projection_hours), projection_hours = lcd_projector_failures$projection_hours ) lcd_lognormal_output <- rstan::sampling(lcd_lognormal_model, data = lcd_fail_list, chains = 4, iter = 4000, control = list(adapt_delta = 0.9))
```{R, class.source='fold-show'}
params <- data.frame(rstan::extract(lcd_lognormal_output)["mu"], rstan::extract(lcd_lognormal_output)["sigma"]) colnames(params) <- c("meanlog", "sdlog")
bayesian_chi_squared_test(y = lcd_projector_failures$projection_hours, distribution_fun = plnorm, params = params, data_type = "continuous")
# Comparison between exponential, weibull, and lognormal fits According to the Chi-squared results, all three models appear to fit the data well, but how should we decide which model to use? We can use the Bayesian Information Criterion for model comparison. The BIC considers how well the model fits the data but also penalises complicated models with many parameters. ```{R, class.source='fold-show'} data <- lcd_projector_failures$projection_hours expo <- rstan::extract(lcd_exponential_output) lambda <- mean(expo$lambda) exponential_BIC <- -2*log(prod(dexp(data, lambda))) + log(NROW(data))*1 (paste("Exponential model BIC = ", exponential_BIC)) weib <- rstan::extract(lcd_weibull_output) shape <- mean(weib$shape) scale <- mean(weib$scale) weibull_BIC <- -2*log(prod(dweibull(data, shape, scale))) + log(NROW(data))*2 (paste("Weibull model BIC = ", weibull_BIC)) logn <- rstan::extract(lcd_lognormal_output) mu <- mean(logn$mu) sigma <- mean(logn$sigma) lnorm_BIC <- -2*log(prod(dlnorm(data, mu, sigma))) + log(NROW(data))*2 (paste("Lognormal model BIC = ", lnorm_BIC))
Here we see that the exponential distribution has the lowest BIC value. The exponential distribution is a special case of the Weibull distribution so the fact that it has only one parameter means it would be a simpler model in this case. The lognormal and Weibull distributions both have two parameters - comparing their respective BIC values we see that the Weibull model has a lower BIC suggesting it is a better model for the data than the lognormal model in this case.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.