    knitr::opts_chunk$set(
      collapse = TRUE,
      screenshot.force = FALSE,
      comment = "#>"
    )
    library(weibulltools)

In this vignette, two methods for the separation of mixture models are presented. A mixture model can be assumed if the points in a probability plot show one or more changes in slope, depict one or several saddle points, or follow an S-shape. A mixed distribution often represents the combination of multiple failure modes and must therefore be split into its components to get reasonable results in further analyses.

Segmented regression aims to detect breakpoints in the sample data at which the
data can be split into subgroups. The EM-Algorithm is a computationally intensive
method that iteratively maximizes a likelihood function weighted by the posterior
probabilities, i.e. the conditional probabilities that an observation belongs
to subgroup *k*.
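For right-censored data, the posterior probabilities and the weighted likelihood mentioned above can be written as follows. This is the textbook form of the EM algorithm for finite mixtures, given as a sketch; the notation is our own and not necessarily the exact formulation used in `weibulltools`: $\pi_k$ are the mixing weights, $\delta_i$ is the event flag of unit $i$ (1 = failed, 0 = censored), $f_k$ and $S_k$ are the density and survival function of subgroup $k$, and $K$ is the number of subgroups (the argument `k` of `mixmod_em()`).

$$
\tau_{ik} = \frac{\pi_k \, f_k(t_i)^{\delta_i} \, S_k(t_i)^{1-\delta_i}}{\sum_{j=1}^{K} \pi_j \, f_j(t_i)^{\delta_i} \, S_j(t_i)^{1-\delta_i}}
$$

$$
Q(\boldsymbol{\theta}) = \sum_{i=1}^{n} \sum_{k=1}^{K} \tau_{ik} \left[ \log \pi_k + \delta_i \log f_k(t_i) + (1 - \delta_i) \log S_k(t_i) \right],
\qquad
\pi_k^{\text{new}} = \frac{1}{n} \sum_{i=1}^{n} \tau_{ik}
$$

For a Weibull subgroup with shape $\beta_k$ and scale $\eta_k$, $S_k(t) = \exp\{-(t/\eta_k)^{\beta_k}\}$ and $f_k(t) = \frac{\beta_k}{\eta_k} (t/\eta_k)^{\beta_k - 1} S_k(t)$. The E-step computes $\tau_{ik}$ from the current parameters; the M-step maximizes $Q$, which gives the closed-form update for $\pi_k$ and a weighted maximum likelihood problem for $(\beta_k, \eta_k)$.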

In the following, we focus on the application of these methods and their
visualization using the functions `mixmod_regression()`, `mixmod_em()`,
`plot_prob_mix()` and `plot_mod_mix()`, which are implemented in `weibulltools`.

To apply the introduced methods, we use a dataset in which units were subjected
to a high-voltage stress test. *hours* indicates the number of hours until a unit
failed, or the number of hours until a unit was taken out of the test without
having failed. *state* is a flag variable that describes the condition of a unit:
it is 1 if the unit failed and 0 otherwise. The data were taken from
*Reliability Analysis by Failure Mode* [^note1].

[^note1]: Doganaksoy, N.; Hahn, G.; Meeker, W. Q.: *Reliability Analysis by Failure Mode*,
Quality Progress, 35(6), 47-52, 2002

To get an idea of whether a mixture model can be assumed, we construct a Weibull probability plot.

    # Data:
    hours <- c(2, 28, 67, 119, 179, 236, 282, 317, 348, 387,
               3, 31, 69, 135, 191, 241, 284, 318, 348, 392,
               5, 31, 76, 144, 203, 257, 286, 320, 350, 412,
               8, 52, 78, 157, 211, 261, 298, 327, 360, 446,
               13, 53, 104, 160, 221, 264, 303, 328, 369,
               21, 64, 113, 168, 226, 278, 314, 328, 377)
    state <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1,
               1, 1, 1, 0, 1, 0, 1, 1, 0, 1,
               1, 0, 1, 1, 1, 0, 1, 1, 1, 1,
               1, 0, 0, 0, 1, 1, 1, 1, 1, 1,
               0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
               1, 0, 1, 1, 1, 1, 1, 1)
    id <- 1:length(hours)
    # Estimating failure probabilities:
    df_john <- johnson_method(id = id, x = hours, event = state)
    # Probability plot:
    weibull_plot <- plot_prob(x = df_john$characteristic,
                              y = df_john$prob,
                              event = df_john$status,
                              id = df_john$id,
                              distribution = "weibull",
                              title_main = "Weibull Probability Plot",
                              title_x = "Time in Hours",
                              title_y = "Probability of Failure in %",
                              title_trace = "Defect Items")
    weibull_plot

The obvious change in slope in the Weibull probability plot of *Figure 1*
supports the assumption of a mixture model.
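To make the construction of the probability plot more concrete, the following base-R sketch computes Johnson's adjusted ranks with Benard's median-rank approximation and maps the result to Weibull-paper coordinates, where a single Weibull distribution appears as a straight line and a change in slope hints at a mixture. It is an assumption-based sketch: `johnson_ranks()` is a hypothetical helper written for illustration, the tie rule (failures before censored units) is an assumed convention, and the output is not guaranteed to match `johnson_method()` exactly.

```r
# Minimal sketch (assumption): Johnson's adjusted ranks + Benard's approximation.
# Hypothetical helper, not necessarily identical to weibulltools::johnson_method().
johnson_ranks <- function(time, event) {
  # Sort by time; at ties, failures come before censored units (assumed rule).
  ord <- order(time, -event)
  time <- time[ord]
  event <- event[ord]
  n <- length(time)
  reverse_rank <- n - seq_len(n) + 1
  adj_rank <- numeric(n)
  prev <- 0
  for (i in seq_len(n)) {
    if (event[i] == 1) {
      # Increment equals 1 without censoring and grows once censored units
      # precede failure i.
      prev <- prev + (n + 1 - prev) / (1 + reverse_rank[i])
    }
    adj_rank[i] <- prev
  }
  failed <- event == 1
  data.frame(time = time[failed],
             adj_rank = adj_rank[failed],
             prob = (adj_rank[failed] - 0.3) / (n + 0.4))  # Benard's approximation
}

hours <- c(2, 28, 67, 119, 179, 236, 282, 317, 348, 387,
           3, 31, 69, 135, 191, 241, 284, 318, 348, 392,
           5, 31, 76, 144, 203, 257, 286, 320, 350, 412,
           8, 52, 78, 157, 211, 261, 298, 327, 360, 446,
           13, 53, 104, 160, 221, 264, 303, 328, 369,
           21, 64, 113, 168, 226, 278, 314, 328, 377)
state <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1,
           1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1,
           0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1)

wb <- johnson_ranks(hours, state)
# Weibull-paper coordinates: x = log(t), y = log(-log(1 - F)).
wb$x <- log(wb$time)
wb$y <- log(-log(1 - wb$prob))
head(wb)
```

Plotting `wb$y` against `wb$x` gives a Weibull probability plot in which the slope change discussed above should be visible.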

Segmented Regression with package `weibulltools`

In package `weibulltools`, the method of segmented regression is implemented in
the function `mixmod_regression()`. If a breakpoint is detected, the failure data
are separated at that point. After breakpoint detection, `mixmod_regression()`
calls the function `rank_regression()` internally to estimate the distribution
parameters of the subgroups.
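The two steps described above (find a breakpoint, then fit each subgroup by rank regression) can be sketched in base R. This sketch swaps in a plain exhaustive search for a single breakpoint that minimizes the total residual sum of squares of two separate least-squares lines on Weibull-paper coordinates; `mixmod_regression()` may detect breakpoints differently and possibly find more than one. The helpers `single_breakpoint()` and `weibull_rr()` are hypothetical, and the data in the usage lines are synthetic and noise-free.

```r
# Minimal sketch (assumption): single-breakpoint search + per-group rank regression.
# x must be sorted increasingly (e.g. log(time) of the ordered failures).
single_breakpoint <- function(x, y, min_pts = 3) {
  n <- length(x)
  best <- list(rss = Inf, split = NA_integer_)
  for (s in min_pts:(n - min_pts)) {
    left <- seq_len(s)
    right <- (s + 1):n
    # Total residual sum of squares of two independent straight-line fits.
    rss <- sum(resid(lm(y[left] ~ x[left]))^2) +
      sum(resid(lm(y[right] ~ x[right]))^2)
    if (rss < best$rss) best <- list(rss = rss, split = s)
  }
  best
}

# Rank regression on Weibull paper: y = beta * x - beta * log(eta),
# so shape beta = slope and scale eta = exp(-intercept / slope).
weibull_rr <- function(x, y) {
  cf <- coef(lm(y ~ x))
  c(shape = unname(cf[2]), scale = unname(exp(-cf[1] / cf[2])))
}

# Synthetic example: first 15 points from a Weibull line with shape 0.8 and
# scale 500, the remaining 15 from a line with shape 3 and scale 300.
x <- seq(0, 6, length.out = 30)
y <- ifelse(seq_along(x) <= 15, 0.8 * (x - log(500)), 3 * (x - log(300)))
bp <- single_breakpoint(x, y)
fit1 <- weibull_rr(x[1:bp$split], y[1:bp$split])
fit2 <- weibull_rr(x[-(1:bp$split)], y[-(1:bp$split)])
```

With real data, `x` and `y` would come from the Weibull-paper coordinates of the estimated failure probabilities, which contain noise, so the recovered parameters are only approximate.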

The obtained results are visualized with the functions `plot_prob_mix()` and
`plot_mod_mix()`.

The graph produced by `plot_prob_mix()` is very similar to the one provided by
`plot_prob()`, except that the detected subgroups are colored differently.

`plot_mod_mix()` is then used to add the estimated regression line of every
subdistribution.

The described procedure is expressed in code as follows.

    # Applying mixmod_regression():
    mixreg_weib <- mixmod_regression(x = df_john$characteristic,
                                     y = df_john$prob,
                                     event = df_john$status,
                                     distribution = "weibull")
    # Using plot_prob_mix():
    mix_reg_plot <- plot_prob_mix(x = hours, event = state, id = id,
                                  distribution = "weibull",
                                  mix_output = mixreg_weib,
                                  title_main = "Weibull Mixture Regression",
                                  title_x = "Time in Hours",
                                  title_y = "Probability of Failure",
                                  title_trace = "Subgroup")
    mix_reg_plot

    # Using plot_mod_mix() to visualize regression lines of subgroups:
    mix_reg_lines <- plot_mod_mix(mix_reg_plot, x = hours, event = state,
                                  mix_output = mixreg_weib,
                                  distribution = "weibull",
                                  title_trace = "Fitted Line")
    mix_reg_lines

Without specifying the number of mixed components *(k)*, this method has split
the data into two groups, as can be seen in *Figure 2* and *Figure 3*.

To sum up, an advantage of this function is that one does not have to specify the
number of mixing components, since segmentation happens automatically. Nevertheless,
the function is only intended to give a hint that a mixture model exists; an
in-depth analysis should be done afterwards.

EM-Algorithm with package `weibulltools`

The EM-Algorithm can be applied through the function `mixmod_em()`.
In contrast to `mixmod_regression()`, one has to specify *k*, the number of
subgroups.

The obtained results can also be visualized with the functions `plot_prob_mix()`
and `plot_mod_mix()`.
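To show what the EM-Algorithm does with censored failure data, here is a base-R sketch of EM for a Weibull mixture with a fixed number of subgroups `K`. It is a sketch under stated assumptions, not the implementation behind `mixmod_em()`: the starting values (consecutive groups of ordered times, shape 1), the Nelder-Mead M-step via `optim()` and the stopping rule are illustrative choices, and `weibull_mix_em()` and `comp_loglik()` are hypothetical names. Failures contribute the density, censored units the survival function.

```r
# Minimal sketch (assumption): EM for a K-component Weibull mixture with
# right censoring. Not the internals of mixmod_em().
comp_loglik <- function(time, event, shape, scale) {
  # log f(t) for failures (event = 1), log S(t) for censored units (event = 0)
  ifelse(event == 1,
         dweibull(time, shape, scale, log = TRUE),
         pweibull(time, shape, scale, lower.tail = FALSE, log.p = TRUE))
}

weibull_mix_em <- function(time, event, K = 2, n_iter = 150, tol = 1e-8) {
  # Initial hard assignment: K groups of consecutive ordered times (assumed).
  grp <- cut(rank(time, ties.method = "first"), breaks = K, labels = FALSE)
  tau <- outer(grp, seq_len(K), "==") + 0
  # Parameters on the log scale: column 1 = log(shape), column 2 = log(scale).
  theta <- cbind(0, log(as.numeric(tapply(time, grp, mean))))
  loglik <- numeric(0)
  for (iter in seq_len(n_iter)) {
    # M-step: closed-form mixing weights, numerical weighted Weibull MLE.
    pi_k <- colMeans(tau)
    for (k in seq_len(K)) {
      w <- tau[, k]
      nll <- function(p) {
        ll <- comp_loglik(time, event, exp(p[1]), exp(p[2]))
        -sum(w[w > 0] * ll[w > 0])  # skip zero weights to avoid 0 * -Inf
      }
      theta[k, ] <- optim(theta[k, ], nll)$par
    }
    # E-step: posterior probabilities tau_ik, computed with log-sum-exp.
    lw <- sapply(seq_len(K), function(k)
      log(pi_k[k]) + comp_loglik(time, event, exp(theta[k, 1]), exp(theta[k, 2])))
    m <- apply(lw, 1, max)
    ll_i <- m + log(rowSums(exp(lw - m)))  # per-unit observed log-likelihood
    tau <- exp(lw - ll_i)
    loglik <- c(loglik, sum(ll_i))
    if (iter > 1 && abs(loglik[iter] - loglik[iter - 1]) < tol) break
  }
  list(pi = pi_k, shape = exp(theta[, 1]), scale = exp(theta[, 2]),
       posterior = tau, loglik = loglik)
}

hours <- c(2, 28, 67, 119, 179, 236, 282, 317, 348, 387,
           3, 31, 69, 135, 191, 241, 284, 318, 348, 392,
           5, 31, 76, 144, 203, 257, 286, 320, 350, 412,
           8, 52, 78, 157, 211, 261, 298, 327, 360, 446,
           13, 53, 104, 160, 221, 264, 303, 328, 369,
           21, 64, 113, 168, 226, 278, 314, 328, 377)
state <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1,
           1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1,
           0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1)

fit <- weibull_mix_em(hours, state, K = 2)
fit$pi      # estimated mixing weights
fit$shape   # estimated shape parameters
fit$scale   # estimated scale parameters
```

Because each M-step starts from the current parameters and does not worsen the weighted likelihood, the observed log-likelihood stored in `fit$loglik` should not decrease from one iteration to the next; EM can still end in a local maximum, so the result depends on the starting values.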

    # Applying mixmod_em():
    mixem_weib <- mixmod_em(x = hours, event = state,
                            distribution = "weibull",
                            conf_level = 0.95, k = 2,
                            method = "EM", n_iter = 150)
    # Using plot_prob_mix():
    mix_em_plot <- plot_prob_mix(x = hours, event = state, id = id,
                                 distribution = "weibull",
                                 mix_output = mixem_weib,
                                 title_main = "Weibull Mixture EM",
                                 title_x = "Time in Hours",
                                 title_y = "Probability of Failure",
                                 title_trace = "Subgroup")
    mix_em_plot

    # Using plot_mod_mix() to visualize the estimated lines of the subgroups:
    mix_em_lines <- plot_mod_mix(mix_em_plot, x = hours, event = state,
                                 mix_output = mixem_weib,
                                 distribution = "weibull",
                                 title_trace = "Fitted Line")
    mix_em_lines

In contrast to `mixmod_regression()`, the EM-Algorithm can also assign censored
items to a specific subgroup. Hence, an individual analysis of the mixing components,
depicted in *Figure 4* and *Figure 5*, is possible.
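The reason censored items can be allocated is that their posterior probability uses the survival function $S_k(t)$ instead of the density. The following base-R sketch computes these posteriors and a hard assignment (largest posterior) for given mixture parameters. The function `assign_subgroup()` and the parameter values in the usage lines are hypothetical and serve only as an illustration; they are not estimates for the stress-test data and not the output of `mixmod_em()`.

```r
# Minimal sketch (assumption): posterior probabilities and hard assignment
# for a Weibull mixture with given weights, shapes and scales.
assign_subgroup <- function(time, event, pi_k, shape, scale) {
  lw <- sapply(seq_along(pi_k), function(k) {
    # Failures use the density f_k(t), censored units the survival S_k(t).
    log(pi_k[k]) + ifelse(event == 1,
                          dweibull(time, shape[k], scale[k], log = TRUE),
                          pweibull(time, shape[k], scale[k],
                                   lower.tail = FALSE, log.p = TRUE))
  })
  lw <- matrix(lw, ncol = length(pi_k))  # keep matrix shape for a single unit
  post <- exp(lw - apply(lw, 1, max))    # stabilized before normalizing
  post <- post / rowSums(post)
  list(posterior = post, subgroup = max.col(post, ties.method = "first"))
}

# Hypothetical parameters, chosen only for illustration.
res <- assign_subgroup(time = c(20, 20, 300), event = c(1, 0, 0),
                       pi_k = c(0.3, 0.7), shape = c(0.7, 3),
                       scale = c(50, 350))
res$subgroup
```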

In conclusion, an analysis of a mixture model using `mixmod_em()` is statistically
well founded. A drawback of this function is that the number of subgroups cannot
be determined automatically.
