knitr::opts_chunk$set(
  collapse = TRUE,
  echo = TRUE,
  comment = "#>")

library(algebraic.mle)
library(md.tools)
library(tidyverse)
library(devtools)
options(digits=4)

Introduction

The R package ? is a framework for estimating the parameters of latent component lifetimes from masked data in a series system. The masked data comes from two sources:

(1) The system might be right-censored. (2) The cause of failure of the failed component might be masked.

Statistical model

In this study, the system is a series system with $m$ components. The true DGP for the system lifetime is in the Weibull series system family, i.e., the component lifetimes are Weibull and independently distributed and we denote the true parameter value by $\theta$.

The principle object of study is $\theta$, which in the case of the Weibull series system family consists of $2 m$ parameters: $$ \theta = (k_1, \lambda_1, \ldots, k_m, \lambda_m)'. $$

We are interested in estimating the $\theta$ from masked data. The masking comes in two independent forms:

The candidate set $\mathcal{C}_i$ is a random variable that is a subset of ${1,\ldots,m}$. The true DGP for the candidate set model has a general form that may be denoted by $$ \Pr{\mathcal{C}_i=c_i|T_1=j,\ldots,T_m,\theta,\text{other factors}}. $$

This is a pretty complicated looking model, and we are not even interested in the DGP for candidate sets, except to the extent that it affects the sampling distribution of the MLE for $\theta$.

In theory, given some candidate set model, we could construct a joint likelihood function for the full model and jointly estimate the parameters of both the candidate set model and $\theta$. In practice, however, this could be a very challenging task unless we make some simplifying assumptions about the DGP for candidate sets.

Candidate set models

In every model we consider, we assume that the candidate set $\mathcal{C}i$ is only a function of the component lifetimes $T{i 1},\ldots,T_{i m}$, $\theta$, and the right-censoring time $\tau_i$. That is, the candidate set $\mathcal{C}_i$ is independent of any other factors (or held constant for the duration of the experiment), like ambient temperature, and these factors also have a neglible effect on the series system lifetime and thus we can ignore them.

Reduced likelihood model

In the Bernoulli candidate set model, we make the following assumptions about how candidate sets are generated:

Using these simplifying assumptions, we can arrive at a reduced likelihood function that only depends on $\theta$ and the observed data and as long as our candidate set satisfies conditions $C_1$, $C_2$, and $C_3$, our reduced likelihood function obtains the same MLEs as the full likelihood function.

We see that $$ \Pr{\mathcal{C}i=c_i,|K_i=j,T_i=t_i} = g(c_i,t_i), $$ since the probability cannot depend on $j$ by condition $C_2$ and cannot depend on $\theta$ by condition $C_3$. Thus, we can write the likelihood function as $$ L(\theta) = \prod{i=1}^n f_{T_i}(t_i|\theta) g(c_i,t_i). $$

We show that $g(c_i,t_i)$ is proportional to $$ g(c_i,t_i) \propto \sum_{j \in c_i} f_j(t_i|\theta_j) \prod_{l=j,l \neq j}^m R_l(t_i|\theta_l), $$ and thus

Note, however, that different ways in which the conditions are met will yield MLEs with different sampling distributions, e.g., more or less efficient estimators.

Bernoulli candidate set model #1

This is a special case of the reduced likelihood model. In this model, we satisfy conditions $C_1$, $C_2$, and $C_3$, but we include each of the non-failed components with a fixed probability $p$, $0 < p < 1$.

In the simplest case, $p = 0.5$, and candidate set $c_i$ has a probability given by $$ \Pr{\mathcal{C}_i=c_i|K_i=j,T_i=t_i} = \begin{cases} (1/2)^{m-1} & \text{if $j \in c_i$ and $c_i \subseteq {1,\ldots,m}$} \ 0 & \text{if $j \notin c_i$}. \end{cases} $$ \begin{proof} Since there are $m-1$ non-failed components (the failed component $j$ is in $c_i$ with probability $1$), there are $2^(m-1)$ possible candidate sets (the size of the power set of the non-failed component indexes). Each of these candidate sets has equal probability of occurring, and thus the probability of any particular candidate set is $1/2^(m-1)$. \end{proof}

Bernoulli candidate set model #2

Now, we remove condition $C_2$. We still assume conditions $C_1$ and $C_3$, but we allow $C_i$ to depend on the failed component $K_i$, i.e., $$ \Pr{\mathcal{C}_i=c_i|K_i=j,T_i=t_i,\theta} \neq \Pr{C_i=c_i|K_i=j',T_i=t_i} $$ for $j,j' \in c_i$.

In this case, we can write the likelihood function as $$ L(\theta) = \prod_{i=1}^n f_{T_i}(t_i|\theta) \prod_{j=1}^m \Pr{K_i=j|T_i=t_i} \prod_{c_i \in \mathcal{C}_i} g(c_i,t_i,j). $$

Weibull series system

Suppose a Weibull series system with $m=5$ components is parameterized by the following R code:

theta <- c(1,    1, # component 1's shape and scale parameters
           1.1,  1, # component 2's shape and scale parameters
           0.95, 1, # component 3's shape and scale parameters
           1.15, 1, # component 4's shape and scale parameters
           1.1,  1) # component 5's shape and scale parameters
m <- 5

So, in our study, $\theta = (r theta)'$. The component assigned to index $2$ has a Weibull distributed lifetime parameterized by $\theta_2 = (r theta[3] r theta[4])$.

Let's simulate generating the lifetimes of the $m = r m$ components for this series system:

set.seed(7231) # set seed for reproducibility
n <- 75
comp_times <- matrix(nrow=n,ncol=m)
for (j in 1:m)
    comp_times[,j] <- rweibull(n,theta[2*j-1],theta[2*j])
comp_times <- md_encode_matrix(comp_times,"t")
print(comp_times,n=4)

Next, we use the function md_series_lifetime_right_censoring to decorate the masked data with the right-censor-censoring time $\tau_1 = \tau_2 = \cdots = tau_n = 3$:

tau <- 3
data <- comp_times %>% md_series_lifetime_right_censoring(tau)
print(data,n=4,drop_latent=TRUE)

Masked component cause of failure (candidate sets)

We simulate candidate sets using the Bernoulli candidate model with an appropriate set of parameters to satisfy conditions $C_1$, $C_2$, and $C_3$:

p <- .3
data <- data %>% md_bernoulli_cand_C1_C2_C3(p)
print(data[,paste0("q",1:m)],n=4)

Now, to generate candidate sets, we sample from these probabilities:

data <- data %>% md_cand_sampler()
print(md_boolean_matrix_to_charsets(data,drop_set=TRUE),drop_latent=TRUE,n=6)

We see that after dropping latent (unobserved) columns, we only have the right censoring time, right censoring indicator, and the candidate sets. (Note that this time we showed the candidate sets in a more friendly way using md_boolean_matrix_to_charsets.)

Log-likelihood of $\theta$ given masked data



queelius/series_system_estimation_masked_data documentation built on Jan. 29, 2025, 11:08 a.m.