# Life Data Analysis Part III - Mixture Models" In weibulltools: Statistical Methods for Life Data Analysis

```knitr::opts_chunk\$set(
collapse = TRUE,
screenshot.force = FALSE,
comment = "#>"
)
library(weibulltools)
```

In this vignette two methods for the separation of mixture models are presented. A mixture model can be assumed, if the points in a probability plot show one or more changes in slope, depict one or several saddle points or follow an S-shape. A mixed distribution often represents the combination of multiple failure modes and thus must be splitted in its components to get reasonable results in further analyses.

Segmented regression aims to detect breakpoints in the sample data from whom a split in subgroups can be made. The EM-Algorithm is a computation-intensive method that iteratively tries to maximize a likelihood function, which is weighted by the posterior probability, the conditional probability that an observation belongs to subgroup k.

In the following we will focus on the application of these methods and their visualizations using functions `mixmod_regression()`, `mixmod_em()`, `plot_prob_mix()` and `plot_mod_mix()`, which are implemented in `weibulltools`.

## Data: Voltage Stress Test

To apply the introduced methods we will use a dataset where units were passed to a high voltage stress test. hours indicates the number of hours until a failure occurs, or the number of hours until a unit was taken out of the test and has not failed. state is a flag variable and describes the condition of a unit. If a unit failed the flag is 1 and 0 otherwise. Data was taken from Reliability Analysis by Failure Mode [^note1].

[^note1]: Doganaksoy, N.; Hahn, G.; Meeker, W. Q.: Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002

## Probability Plot for Voltage Stress Test Data

To get an intuition whether we can assume the presence of a mixture model, we will construct a Weibull probability plot.

```# Data:
hours <- c(2, 28, 67, 119, 179, 236, 282, 317, 348, 387, 3, 31, 69, 135,
191, 241, 284, 318, 348, 392, 5, 31, 76, 144, 203, 257, 286,
320, 350, 412, 8, 52, 78, 157, 211, 261, 298, 327, 360, 446,
13, 53, 104, 160, 221, 264, 303, 328, 369, 21, 64, 113, 168,
226, 278, 314, 328, 377)

state <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,
1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0,
1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 1)

id <- 1:length(hours)

# Estimating failure probabilities:
df_john <- johnson_method(id = id, x = hours, event = state)

# Probability plot:
weibull_plot <- plot_prob(x = df_john\$characteristic, y = df_john\$prob,
event = df_john\$status, id = df_john\$id,
distribution = "weibull",
title_main = "Weibull Probability Plot",
title_x = "Time in Hours",
title_y = "Probability of Failure in %",
title_trace = "Defect Items")
weibull_plot
```

Since there is an obvious slope change in the Weibull probability plot of Figure 1, the appearance of a mixture model is strengthened.

## Segmented Regression with Package `weibulltools`

In package `weibulltools` the method of segmented regression is implemented in function `mixmod_regression()`. If a breakpoint was detected, the failure data is separated by that point. After breakpoint detection the function `rank_regression()` is called inside `mixmod_regression()` and is used to estimate the distribution parameters of the subgroups.
The visualization of the obtained results is done by functions `plot_prob_mix()` and `plot_mod_mix()`.
The produced graph of `plot_prob_mix()` is pretty similar to the graph provided by `plot_prob()`, but the difference is, that the detected subgroups are colored differently.
`plot_mod_mix()` then is used to add the estimated regression line of every sub- distribution.
In the following the described procedure is expressed with code.

```# Applying mixmod_regression():
mixreg_weib <- mixmod_regression(x = df_john\$characteristic, y = df_john\$prob,
event = df_john\$status, distribution = "weibull")

# Using plot_prob_mix().
mix_reg_plot <- plot_prob_mix(x = hours, event = state, id = id,
distribution = "weibull", mix_output = mixreg_weib,
title_main = "Weibull Mixture Regression", title_x = "Time in Hours",
title_y = "Probability of Failure", title_trace = "Subgroup")
mix_reg_plot
```
```# Using plot_mod_mix() to visualize regression lines of subgroups:
mix_reg_lines <- plot_mod_mix(mix_reg_plot, x = hours, event = state,
mix_output = mixreg_weib, distribution = "weibull", title_trace = "Fitted Line")
mix_reg_lines
```

Without specifying the number of mixed components (k) this method has splitted the data in two groups. This can bee seen in Figure 2 and Figure 3.
To sum up, an upside of this function is that one does not have to specify the number of mixing components, since segmentation happens in an automated fashion. Nevertheless the intention of this function is to give a hint for the existence of a mixture model. An in-depth analysis should be done afterwards.

## EM-Algorithm with Package `weibulltools`

The EM-Algorithm can be applied through the usage of the function `mixmod_em()`. In comparison to `mixmod_regression()` one has to specify k, the number of subgroups.
The obtained results can be visualized by functions `plot_prob_mix()` and `plot_mod_mix()`, too.

```# Applying mixmod_regression():
mixem_weib <- mixmod_em(x = hours, event = state, distribution = "weibull",
conf_level = 0.95, k = 2, method = "EM", n_iter = 150)

# Using plot_prob_mix():
mix_em_plot <- plot_prob_mix(x = hours, event = state, id = id,
distribution = "weibull", mix_output = mixem_weib,
title_main = "Weibull Mixture EM", title_x = "Time in Hours",
title_y = "Probability of Failure", title_trace = "Subgroup")
mix_em_plot
```
```# Using plot_mod_mix() to visualize regression lines of subgroups:
mix_em_lines <- plot_mod_mix(mix_em_plot, x = hours, event = state,
mix_output = mixem_weib, distribution = "weibull", title_trace = "Fitted Line")
mix_em_lines
```

In comparison to `mixmod_regression()` the EM-Algorithm can also assign censored items to a specific subgroup. Hence, an individual analysis of the mixing components, depicted in Figure 4 and Figure 5, is possible.
In conclusion an analysis of a mixture model using `mixmod_em()` is statistically founded. A drawback of this function is, that the identification of the number of subgroups can not be determined automatically.

## Try the weibulltools package in your browser

Any scripts or data that you put into this service are public.

weibulltools documentation built on May 2, 2019, 11:01 a.m.