In queelius/series_system_estimation_masked_data: Estimating Reliability of Components in Series from Masked Data with Relaxed Candidate Set Models

knitr::opts_chunk$set(
  collapse = TRUE,
  echo = TRUE,
  comment = "#>")

library(algebraic.mle)
library(md.tools)
library(tidyverse)
library(devtools)
options(digits=4)

Introduction

The R package ? is a framework for estimating the parameters of latent component lifetimes from masked data in a series system. The masked data comes from two sources:

(1) The system might be right-censored. (2) The cause of failure of the failed component might be masked.

Statistical model

In this study, the system is a series system with $m$ components. The true DGP for the system lifetime is in the Weibull series system family, i.e., the component lifetimes are Weibull and independently distributed and we denote the true parameter value by $\theta$.

The principle object of study is $\theta$, which in the case of the Weibull series system family consists of $2 m$ parameters: $$ \theta = (k_1, \lambda_1, \ldots, k_m, \lambda_m)'. $$

We are interested in estimating the $\theta$ from masked data. The masking comes in two independent forms:

Censored system failure times, e.g., right-censoring. The system failure time is the minimum of the component lifetimes, and it is right-censored if the system failure time does not occur during the observation period, $$ T_i = \min{\tau_i, T_{i 1}, \ldots, T_{i m}}, $$ where $\tau_i$ is the right-censoring time for the $i$\textsuperscript{th} observation and $T_{i 1},\ldots,T_{i m}$ are the component lifetimes for the $i$th system.
The cause of failure, the failed component, is masked. This masking comes in the form of a candidate set $\mathcal{C}_i$ that, on average, conveys information about the component cause of failure.

The candidate set $\mathcal{C}_i$ is a random variable that is a subset of ${1,\ldots,m}$. The true DGP for the candidate set model has a general form that may be denoted by $$ \Pr{\mathcal{C}_i=c_i|T_1=j,\ldots,T_m,\theta,\text{other factors}}. $$

This is a pretty complicated looking model, and we are not even interested in the DGP for candidate sets, except to the extent that it affects the sampling distribution of the MLE for $\theta$.

In theory, given some candidate set model, we could construct a joint likelihood function for the full model and jointly estimate the parameters of both the candidate set model and $\theta$. In practice, however, this could be a very challenging task unless we make some simplifying assumptions about the DGP for candidate sets.

Candidate set models

In every model we consider, we assume that the candidate set $\mathcal{C}i$ is only a function of the component lifetimes $T{i 1},\ldots,T_{i m}$, $\theta$, and the right-censoring time $\tau_i$. That is, the candidate set $\mathcal{C}_i$ is independent of any other factors (or held constant for the duration of the experiment), like ambient temperature, and these factors also have a neglible effect on the series system lifetime and thus we can ignore them.

Reduced likelihood model

In the Bernoulli candidate set model, we make the following assumptions about how candidate sets are generated:

$C_1$: The index of the failed component is in the candidate set, i.e., $\Pr{K_i \in \mathcal{C}i} = 1$, where $K_i = \arg\min_j { T{i j} : j = 1,\ldots,m}$.
$C_2$: The probability of $C_i$ given $K_i$ and $T_i$ is equally probable when the failed component varies over the components in the candidate set, i.e., $\Pr{\mathcal{C}_i=c_i|K_i=j,T_i=t_i,\theta} = \Pr{C_i=c_i|K_i=j',T_i=t_i}$ for any $j,j' \in c_i$.
$C_3$: The masking probabilities are conditionally independent of $\theta$ given $K_i$ and $T_i$, i.e., $\Pr{\mathcal{C}_i=c_i|K_i=j,T_i=t_i}$ is not a function of $\theta$.

Using these simplifying assumptions, we can arrive at a reduced likelihood function that only depends on $\theta$ and the observed data and as long as our candidate set satisfies conditions $C_1$, $C_2$, and $C_3$, our reduced likelihood function obtains the same MLEs as the full likelihood function.

We see that $$ \Pr{\mathcal{C}i=c_i,|K_i=j,T_i=t_i} = g(c_i,t_i), $$ since the probability cannot depend on $j$ by condition $C_2$ and cannot depend on $\theta$ by condition $C_3$. Thus, we can write the likelihood function as $$ L(\theta) = \prod{i=1}^n f_{T_i}(t_i|\theta) g(c_i,t_i). $$

We show that $g(c_i,t_i)$ is proportional to $$ g(c_i,t_i) \propto \sum_{j \in c_i} f_j(t_i|\theta_j) \prod_{l=j,l \neq j}^m R_l(t_i|\theta_l), $$ and thus

Note, however, that different ways in which the conditions are met will yield MLEs with different sampling distributions, e.g., more or less efficient estimators.

Bernoulli candidate set model #1

This is a special case of the reduced likelihood model. In this model, we satisfy conditions $C_1$, $C_2$, and $C_3$, but we include each of the non-failed components with a fixed probability $p$, $0 < p < 1$.

In the simplest case, $p = 0.5$, and candidate set $c_i$ has a probability given by $$ \Pr{\mathcal{C}_i=c_i|K_i=j,T_i=t_i} = \begin{cases} (1/2)^{m-1} & \text{if $j \in c_i$ and $c_i \subseteq {1,\ldots,m}$} \ 0 & \text{if $j \notin c_i$}. \end{cases} $$ \begin{proof} Since there are $m-1$ non-failed components (the failed component $j$ is in $c_i$ with probability $1$), there are $2^(m-1)$ possible candidate sets (the size of the power set of the non-failed component indexes). Each of these candidate sets has equal probability of occurring, and thus the probability of any particular candidate set is $1/2^(m-1)$. \end{proof}

Bernoulli candidate set model #2

Now, we remove condition $C_2$. We still assume conditions $C_1$ and $C_3$, but we allow $C_i$ to depend on the failed component $K_i$, i.e., $$ \Pr{\mathcal{C}_i=c_i|K_i=j,T_i=t_i,\theta} \neq \Pr{C_i=c_i|K_i=j',T_i=t_i} $$ for $j,j' \in c_i$.

In this case, we can write the likelihood function as $$ L(\theta) = \prod_{i=1}^n f_{T_i}(t_i|\theta) \prod_{j=1}^m \Pr{K_i=j|T_i=t_i} \prod_{c_i \in \mathcal{C}_i} g(c_i,t_i,j). $$

Weibull series system

Suppose a Weibull series system with $m=5$ components is parameterized by the following R code:

theta <- c(1,    1, # component 1's shape and scale parameters
           1.1,  1, # component 2's shape and scale parameters
           0.95, 1, # component 3's shape and scale parameters
           1.15, 1, # component 4's shape and scale parameters
           1.1,  1) # component 5's shape and scale parameters
m <- 5

So, in our study, $\theta = (r theta)'$. The component assigned to index $2$ has a Weibull distributed lifetime parameterized by $\theta_2 = (r theta[3] r theta[4])$.

Let's simulate generating the lifetimes of the $m = r m$ components for this series system:

set.seed(7231) # set seed for reproducibility
n <- 75
comp_times <- matrix(nrow=n,ncol=m)
for (j in 1:m)
    comp_times[,j] <- rweibull(n,theta[2*j-1],theta[2*j])
comp_times <- md_encode_matrix(comp_times,"t")
print(comp_times,n=4)

Next, we use the function md_series_lifetime_right_censoring to decorate the masked data with the right-censor-censoring time $\tau_1 = \tau_2 = \cdots = tau_n = 3$:

tau <- 3
data <- comp_times %>% md_series_lifetime_right_censoring(tau)
print(data,n=4,drop_latent=TRUE)

Masked component cause of failure (candidate sets)

We simulate candidate sets using the Bernoulli candidate model with an appropriate set of parameters to satisfy conditions $C_1$, $C_2$, and $C_3$:

p <- .3
data <- data %>% md_bernoulli_cand_C1_C2_C3(p)
print(data[,paste0("q",1:m)],n=4)

Now, to generate candidate sets, we sample from these probabilities:

data <- data %>% md_cand_sampler()
print(md_boolean_matrix_to_charsets(data,drop_set=TRUE),drop_latent=TRUE,n=6)

We see that after dropping latent (unobserved) columns, we only have the right censoring time, right censoring indicator, and the candidate sets. (Note that this time we showed the candidate sets in a more friendly way using md_boolean_matrix_to_charsets.)

Log-likelihood of $\theta$ given masked data

queelius/series_system_estimation_masked_data documentation built on Jan. 29, 2025, 11:08 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

queelius/series_system_estimation_masked_data
Estimating Reliability of Components in Series from Masked Data with Relaxed Candidate Set Models

In queelius/series_system_estimation_masked_data: Estimating Reliability of Components in Series from Masked Data with Relaxed Candidate Set Models

Introduction

Statistical model

Candidate set models

Reduced likelihood model

Bernoulli candidate set model #1

Bernoulli candidate set model #2

Weibull series system

Masked component cause of failure (candidate sets)

Log-likelihood of $\theta$ given masked data

R Package Documentation

Browse R Packages

We want your feedback!

queelius/series_system_estimation_masked_data Estimating Reliability of Components in Series from Masked Data with Relaxed Candidate Set Models

In queelius/series_system_estimation_masked_data: Estimating Reliability of Components in Series from Masked Data with Relaxed Candidate Set Models

Introduction

Statistical model

Candidate set models

Reduced likelihood model

Bernoulli candidate set model #1

Bernoulli candidate set model #2

Weibull series system

Masked component cause of failure (candidate sets)

Log-likelihood of $\theta$ given masked data

R Package Documentation

Browse R Packages

We want your feedback!

queelius/series_system_estimation_masked_data
Estimating Reliability of Components in Series from Masked Data with Relaxed Candidate Set Models