# mixmod_em: Mixture Model Estimation using EM-Algorithm In weibulltools: Statistical Methods for Life Data Analysis

## Description

This method uses the EM-Algorithm to estimate the parameters of a univariate mixture model. Until now, the mixture model can consist of k two-parametric Weibull distributions. If no mixture of k components can be estimated, the function is forced to stop and a message with instructions is given.

## Usage

 ```1 2 3``` ```mixmod_em(x, event, post = NULL, distribution = "weibull", conf_level = 0.95, k = 2, method = "EM", n_iter = 100L, conv_limit = 1e-06, diff_loglik = 0.5) ```

## Arguments

 `x` a numeric vector which consists of lifetime data. Lifetime data could be every characteristic influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles), load cycles. `event` a vector of binary data (0 or 1) indicating whether unit i is a right censored observation (= 0) or a failure (= 1). `post` a numeric matrix specifiying initial a-posteriori probabilities. If post is `NULL` (default) a-posteriori probabilities are assigned randomly using the Dirichlet distribution (`rdirichlet` from LearnBayes Package), which is the conjugate prior of a Multinomial distribution. This idea was taken from the blog post of Mr. Gelissen (linked under references). `distribution` supposed mixture model. Only `"weibull"` can be used. Other distributions have not been implemented yet. `conf_level` confidence level for the confidence intervals of the parameters of every component `k`. The default value is `conf_level = 0.95`. `k` integer of mixture components, default is 2. `method` default method is `"EM"`. Other methods have not been implemented yet. `n_iter` integer defining the maximum number of iterations. `conv_limit` numeric value defining the convergence limit. `diff_loglik` numeric value defining the maximum difference between log-likelihood values, which seems permissible. The default value is `0.5`. See Details for the usage of this argument.

## Details

In `mixmod_em` the function `mixture_em_cpp` is called. The computed posterior probabilities are then used as weights inside function `ml_estimation` to model a weighted log-likelihood. This strategy enables the computation of confidence intervals for the parameters of the separated sub-distributions, since `ml_estimation` provides a variance-covariance matrix. Using this strategy, a potential problem that can occur is, that the value of the complete log-likelihood, computed by `mixture_em_cpp`, differs considerably from the complete log-likelihood after re-estimating parameters with `ml_estimation`. If so, the estimated quantities like prior and posterior probabilities, as well as the model parameters are not reliable anymore and the function is forced to stop with the message: "Parameter estimation was not successful!" But if the log-likelihood values are close to each other, the presence of the mixture is strengthened and a reasonable fit is provided. Thus, a check of the absolute differences in the log-likelihood values is made and the critical difference has to be specified in argument `diff_loglik`.

## Value

Returns a list where the length of the list depends on the number of k subgroups. The first `k` lists have the same information as provided by ml_estimation, but the values `logL`, `aic` and `bic` are the results of a log-likelihood function, which is weighted by a-posteriori probabilities. The last list summarizes further results of the EM-Algorithm and is therefore called `em_results`. It contains the following elements:

• `a_priori` : A vector with estimated a-priori probabilities.

• `a_posteriori` : A matrix with estimated a-posteriori probabilities.

• `groups` : Numeric vector specifying the group membership of every observation.

• `logL` : The value of the complete log-likelihood.

• `aic` : Akaike Information Criterion.

• `bic` : Bayesian Information Criterion.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18``` ```# Data is taken from given reference of Doganaksoy, Hahn and Meeker: hours <- c(2, 28, 67, 119, 179, 236, 282, 317, 348, 387, 3, 31, 69, 135, 191, 241, 284, 318, 348, 392, 5, 31, 76, 144, 203, 257, 286, 320, 350, 412, 8, 52, 78, 157, 211, 261, 298, 327, 360, 446, 13, 53, 104, 160, 221, 264, 303, 328, 369, 21, 64, 113, 168, 226, 278, 314, 328, 377) state <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1) mix_mod_em <- mixmod_em(x = hours, event = state, distribution = "weibull", conf_level = 0.95, k = 2, method = "EM", n_iter = 150) ```

weibulltools documentation built on May 2, 2019, 11:01 a.m.