```r
knitr::opts_chunk$set(
  collapse = TRUE,
  screenshot.force = FALSE,
  comment = "#>"
)
library(weibulltools)
```
In this vignette, two methods for the separation of mixture models are presented. A mixture model can be assumed if the points in a probability plot show one or more changes in slope, depict one or several saddle points, or follow an S-shape. A mixed distribution often represents the combination of multiple failure modes and thus must be split into its components to get reasonable results in further analyses.
Segmented regression aims to detect breakpoints in the sample data at which the data can be split into subgroups. The EM-Algorithm is a computation-intensive method that iteratively maximizes a likelihood function weighted by the posterior probability, i.e. the conditional probability that an observation belongs to subgroup k.
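For reference, the weighting can be written out for right-censored data. The following is a standard textbook formulation, assumed here rather than taken from the `weibulltools` source: there are k Weibull subgroups indexed by j with mixing weights $\pi_j$, shape $\beta_j$ and scale $\eta_j$; $\delta_i = 1$ marks a failure and $\delta_i = 0$ a censored unit; $f_j$ is the density and $R_j = 1 - F_j$ the survival function of subgroup j.

$$
F(t) = \sum_{j=1}^{k} \pi_j F_j(t), \qquad F_j(t) = 1 - \exp\left[-\left(\frac{t}{\eta_j}\right)^{\beta_j}\right]
$$

E-step (posterior probability that unit i belongs to subgroup j):

$$
\tau_{ij} = \frac{\pi_j \, f_j(t_i)^{\delta_i} \, R_j(t_i)^{1-\delta_i}}{\sum_{l=1}^{k} \pi_l \, f_l(t_i)^{\delta_i} \, R_l(t_i)^{1-\delta_i}}
$$

M-step (updated weights and posterior-weighted log-likelihood per subgroup):

$$
\pi_j = \frac{1}{n}\sum_{i=1}^{n} \tau_{ij}, \qquad (\hat\beta_j, \hat\eta_j) = \arg\max_{\beta_j,\, \eta_j} \sum_{i=1}^{n} \tau_{ij}\left[\delta_i \log f_j(t_i) + (1-\delta_i) \log R_j(t_i)\right]
$$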
In the following, we will focus on the application of these methods and their visualizations using the functions `mixmod_regression()`, `mixmod_em()`, `plot_prob_mix()` and `plot_mod_mix()`, which are implemented in `weibulltools`.
To apply the introduced methods, we will use a dataset in which units were subjected to a high-voltage stress test. `hours` indicates the number of hours until a failure occurred or, for a unit that did not fail, the number of hours until it was taken out of the test. `state` is a flag variable describing the condition of a unit: it is 1 if the unit failed and 0 otherwise. The data was taken from *Reliability Analysis by Failure Mode* [^note1].
[^note1]: Doganaksoy, N.; Hahn, G.; Meeker, W. Q.: Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
To get an intuition of whether a mixture model can be assumed, we construct a Weibull probability plot.
```r
# Data:
hours <- c(2, 28, 67, 119, 179, 236, 282, 317, 348, 387,
           3, 31, 69, 135, 191, 241, 284, 318, 348, 392,
           5, 31, 76, 144, 203, 257, 286, 320, 350, 412,
           8, 52, 78, 157, 211, 261, 298, 327, 360, 446,
           13, 53, 104, 160, 221, 264, 303, 328, 369,
           21, 64, 113, 168, 226, 278, 314, 328, 377)
state <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1,
           1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1,
           0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1)
id <- 1:length(hours)

# Estimating failure probabilities:
df_john <- johnson_method(id = id, x = hours, event = state)

# Probability plot:
weibull_plot <- plot_prob(x = df_john$characteristic,
                          y = df_john$prob,
                          event = df_john$status,
                          id = df_john$id,
                          distribution = "weibull",
                          title_main = "Weibull Probability Plot",
                          title_x = "Time in Hours",
                          title_y = "Probability of Failure in %",
                          title_trace = "Defect Items")
weibull_plot
```
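`johnson_method()` estimates failure probabilities for data that contain censored units. To illustrate the kind of computation involved, the following base-R sketch implements Johnson's adjusted ranks combined with Benard's median-rank approximation. The function name `johnson_sketch()` is hypothetical, and it is an assumption that `weibulltools` uses exactly these formulas and this tie handling.

```r
# Hypothetical sketch: Johnson's adjusted ranks + Benard's approximation
# for right-censored data (event = 1: failure, event = 0: censored).
johnson_sketch <- function(x, event) {
  n <- length(x)
  ord <- order(x, -event)            # failures before censored units at tied times
  x <- x[ord]
  event <- event[ord]
  adj_rank <- rep(NA_real_, n)       # censored units get no rank
  prev <- 0
  for (i in seq_len(n)) {
    if (event[i] == 1) {
      rev_rank <- n - i + 1          # reverse rank of the i-th smallest time
      prev <- prev + (n + 1 - prev) / (1 + rev_rank)
      adj_rank[i] <- prev
    }
  }
  prob <- (adj_rank - 0.3) / (n + 0.4)  # Benard's median-rank approximation
  data.frame(x = x, event = event, adj_rank = adj_rank, prob = prob)
}

# Small toy example with two censored units
res <- johnson_sketch(x = c(10, 20, 30, 40, 50), event = c(1, 0, 1, 1, 0))
res
```

Without censoring, the adjusted ranks reduce to the ordinary ranks 1, 2, ..., n; each censored unit increases the rank increment of the failures that follow it.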
Since there is an obvious change in slope in the Weibull probability plot of Figure 1, the assumption of a mixture model is supported.
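The reason a slope change points to a mixture is that taking logarithms twice turns the Weibull CDF $F(t) = 1 - \exp[-(t/\eta)^\beta]$ into $\log(-\log(1 - F)) = \beta \log t - \beta \log \eta$. A single Weibull population therefore plots as a straight line with slope $\beta$, and a bend means that no single line fits. The hypothetical helpers below compute these coordinates and fit a least-squares line (y on x) to recover shape and scale; this is an illustrative assumption, not necessarily how `weibulltools` estimates parameters.

```r
# Hypothetical helper: Weibull probability-plot coordinates.
weibull_coords <- function(t, prob) {
  data.frame(x = log(t), y = log(-log(1 - prob)))
}

# Hypothetical helper: least-squares line through the coordinates.
# slope = shape; intercept = -shape * log(scale)
fit_weibull_line <- function(t, prob) {
  d <- weibull_coords(t, prob)
  cf <- coef(lm(y ~ x, data = d))
  shape <- unname(cf["x"])
  scale <- unname(exp(-cf["(Intercept)"] / shape))
  c(shape = shape, scale = scale)
}

# Exact Weibull probabilities (shape 2, scale 100) lie on one straight line
t_demo <- c(20, 50, 80, 120, 200)
p_demo <- pweibull(t_demo, shape = 2, scale = 100)
est <- fit_weibull_line(t_demo, p_demo)
est
```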
In `weibulltools`, the method of segmented regression is implemented in `mixmod_regression()`. If a breakpoint is detected, the failure data is separated at that point. After breakpoint detection, another function is called inside `mixmod_regression()` to estimate the distribution parameters of each subgroup.
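The idea of breakpoint detection can be sketched on the linearized coordinates $x = \log t$ and $y = \log(-\log(1 - F))$. The function `find_breakpoint()` below is hypothetical and deliberately simple: it tries every admissible split of the ordered points, fits a separate straight line to each side and keeps the split with the smallest total residual sum of squares. This discontinuous, single-breakpoint search is a swapped-in simplification, not the algorithm used inside `mixmod_regression()`.

```r
# Hypothetical sketch: exhaustive search for one breakpoint between two
# separately fitted straight lines (requires length(x) >= 2 * min_size).
find_breakpoint <- function(x, y, min_size = 3) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  n <- length(x)
  candidates <- seq(min_size, n - min_size)  # last index of the left segment
  rss <- sapply(candidates, function(b) {
    left <- seq_len(b)
    right <- seq(b + 1, n)
    sum(resid(lm(y[left] ~ x[left]))^2) +
      sum(resid(lm(y[right] ~ x[right]))^2)
  })
  best <- candidates[which.min(rss)]
  list(break_x = x[best], left = seq_len(best),
       right = seq(best + 1, n), rss = min(rss))
}

# Toy data with a change in slope (and a jump) after x = 10
x_demo <- 1:20
y_demo <- ifelse(x_demo <= 10, 0.5 * x_demo, 2 * x_demo - 12)
bp <- find_breakpoint(x_demo, y_demo)
bp$break_x
```

On probability-plot data, the points in `left` and `right` would then be treated as two subgroups whose parameters are estimated separately.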
The visualization of the obtained results is done with the functions `plot_prob_mix()` and `plot_mod_mix()`. The graph produced by `plot_prob_mix()` is very similar to the one provided by `plot_prob()`; the difference is that the detected subgroups are colored differently. `plot_mod_mix()` is then used to add the estimated regression line of every subgroup.

In the following, the described procedure is expressed in code.
```r
# Applying mixmod_regression():
mixreg_weib <- mixmod_regression(x = df_john$characteristic,
                                 y = df_john$prob,
                                 event = df_john$status,
                                 distribution = "weibull")

# Using plot_prob_mix():
mix_reg_plot <- plot_prob_mix(x = hours,
                              event = state,
                              id = id,
                              distribution = "weibull",
                              mix_output = mixreg_weib,
                              title_main = "Weibull Mixture Regression",
                              title_x = "Time in Hours",
                              title_y = "Probability of Failure",
                              title_trace = "Subgroup")
mix_reg_plot
```
```r
# Using plot_mod_mix() to visualize regression lines of subgroups:
mix_reg_lines <- plot_mod_mix(mix_reg_plot,
                              x = hours,
                              event = state,
                              mix_output = mixreg_weib,
                              distribution = "weibull",
                              title_trace = "Fitted Line")
mix_reg_lines
```
Without the number of mixture components (k) being specified, this method has split the data into two groups, as can be seen in Figure 2 and Figure 3.
To sum up, an upside of this function is that the number of mixture components does not have to be specified, since the segmentation happens automatically. Nevertheless, the function is only intended to give a hint of the existence of a mixture model; an in-depth analysis should be done afterwards.
The EM-Algorithm can be applied using the function `mixmod_em()`. In contrast to `mixmod_regression()`, one has to specify k, the number of subgroups. The obtained results can again be visualized with `plot_prob_mix()` and `plot_mod_mix()`.
```r
# Applying mixmod_em():
mixem_weib <- mixmod_em(x = hours,
                        event = state,
                        distribution = "weibull",
                        conf_level = 0.95,
                        k = 2,
                        method = "EM",
                        n_iter = 150)

# Using plot_prob_mix():
mix_em_plot <- plot_prob_mix(x = hours,
                             event = state,
                             id = id,
                             distribution = "weibull",
                             mix_output = mixem_weib,
                             title_main = "Weibull Mixture EM",
                             title_x = "Time in Hours",
                             title_y = "Probability of Failure",
                             title_trace = "Subgroup")
mix_em_plot
```
```r
# Using plot_mod_mix() to visualize the estimated lines of the subgroups:
mix_em_lines <- plot_mod_mix(mix_em_plot,
                             x = hours,
                             event = state,
                             mix_output = mixem_weib,
                             distribution = "weibull",
                             title_trace = "Fitted Line")
mix_em_lines
```
In contrast to `mixmod_regression()`, the EM-Algorithm can also assign censored items to a specific subgroup. Hence, an individual analysis of each mixture component, depicted in Figure 4 and Figure 5, is possible.
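The assignment of censored units follows from the E-step: a failed unit contributes the component density $f_j(t)$, a censored unit the component survival probability $R_j(t) = 1 - F_j(t)$, so both receive posterior probabilities. The base-R sketch below, with the hypothetical names `comp_loglik()` and `em_weibull_mix()`, implements this for right-censored data. It rests on assumptions (crude starting values, a Nelder-Mead M-step via `optim()`, a simple stopping rule) and is not the implementation behind `mixmod_em()`; the simulated data are purely illustrative.

```r
# Hypothetical EM sketch for a k-component Weibull mixture with right censoring.

# Per-unit log-likelihood for one component: failures (event = 1) use the
# log density, censored units (event = 0) the log survival probability.
comp_loglik <- function(x, event, shape, scale) {
  ifelse(event == 1,
         dweibull(x, shape = shape, scale = scale, log = TRUE),
         pweibull(x, shape = shape, scale = scale,
                  lower.tail = FALSE, log.p = TRUE))
}

em_weibull_mix <- function(x, event, k = 2, n_iter = 100, tol = 1e-6) {
  # Crude starting values: split the ordered times into k equal-sized groups
  grp <- cut(rank(x, ties.method = "first"), breaks = k, labels = FALSE)
  pars <- t(sapply(seq_len(k), function(j)
    c(shape = 1.5, scale = mean(x[grp == j]))))
  weights <- rep(1 / k, k)
  loglik <- numeric(0)
  for (it in seq_len(n_iter)) {
    # E-step: log(pi_j) + component log-likelihood, normalized per unit
    logw <- sapply(seq_len(k), function(j)
      log(weights[j]) + comp_loglik(x, event, pars[j, "shape"], pars[j, "scale"]))
    m <- apply(logw, 1, max)                    # log-sum-exp for stability
    loglik[it] <- sum(m + log(rowSums(exp(logw - m))))
    tau <- exp(logw - m)
    tau <- tau / rowSums(tau)                   # posterior probabilities tau_ij
    # M-step: mixing weights, then weighted ML for each component
    weights <- colMeans(tau)
    for (j in seq_len(k)) {
      neg_q <- function(p) -sum(tau[, j] * comp_loglik(x, event, exp(p[1]), exp(p[2])))
      fit <- optim(log(pars[j, ]), neg_q)       # Nelder-Mead on log-parameters
      pars[j, ] <- exp(fit$par)
    }
    if (it > 1 && abs(loglik[it] - loglik[it - 1]) < tol) break
  }
  # posterior and group refer to the last E-step
  list(shape = pars[, "shape"], scale = pars[, "scale"], weights = weights,
       posterior = tau, group = max.col(tau, ties.method = "first"),
       loglik = loglik)
}

# Illustrative simulated data: two Weibull populations, random censoring
set.seed(42)
t_true <- c(rweibull(60, shape = 0.8, scale = 50),
            rweibull(60, shape = 4, scale = 400))
t_cens <- runif(120, min = 100, max = 800)
x_obs <- pmin(t_true, t_cens)
event <- as.integer(t_true <= t_cens)
fit <- em_weibull_mix(x_obs, event, k = 2, n_iter = 150)
fit$shape
fit$scale
fit$weights
```

Because each M-step starts `optim()` at the current parameters, the values stored in `loglik` should not decrease from one iteration to the next, up to numerical noise. `group` assigns every unit, censored or not, to its most probable subgroup. With the vignette's data, `em_weibull_mix(hours, state, k = 2)` could be compared with the `mixmod_em()` results, but exact agreement is not to be expected.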
In conclusion, an analysis of a mixture model using `mixmod_em()` is statistically well founded. A drawback of this function is that the number of subgroups cannot be determined automatically.
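One common workaround, assumed here and not attributed to `weibulltools`, is to fit the mixture for several values of k and keep the one with the smallest Bayesian information criterion (BIC). The hypothetical helper below only performs the BIC arithmetic for a k-component Weibull mixture; the maximized log-likelihood has to come from whatever fit is used.

```r
# Hypothetical helper: BIC of a k-component Weibull mixture fitted to n units.
# Free parameters: shape and scale per component plus k - 1 mixing weights.
bic_weibull_mix <- function(loglik, k, n) {
  n_par <- 2 * k + (k - 1)
  -2 * loglik + n_par * log(n)
}

# Pure arithmetic check: log-likelihood -100, k = 2, n = 50
bic_weibull_mix(loglik = -100, k = 2, n = 50)
```

When BIC values for, say, k = 1, 2 and 3 are compared, the smallest one is preferred; the penalty term grows by 3 log(n) for each additional component.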