knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>")

library(series.system.estimation.masked.data)
library(algebraic.mle)
library(md.tools)
library(tidyverse)
library(devtools)
library(printr)

options(digits=3)

\renewcommand{\v}[1]{\boldsymbol{#1}}

Introduction

The R package series.system.estimation.masked.data is a framework for estimating the parameters of latent component lifetimes from masked data in a series system.

General series system

In a general series system, each component lifetime comes from some parametric family of distributions, e.g., exponential, Weibull, lognormal, or any other distribution for which survival and hazard functions are defined.
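A series system fails as soon as any one of its components fails, so writing $T_{ij}$ for the lifetime of the $j$-th component of the $i$-th system (notation introduced here for exposition), the $i$-th system lifetime is $$ T_i = \min(T_{i1}, \ldots, T_{im}). $$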

We consider a general series system of $m=3$ components. Components $1$ and $2$ have lifetimes in the Weibull family and component $3$ has a lifetime in the Exponential family.

Let's simulate masked data for this series system with a right-censoring time of $\tau = 1.975$ (the same for every system) and with the Bernoulli candidate model satisfying conditions $C_1$, $C_2$, and $C_3$ with probability $p=1/3$. First, we specify the true parameter values:

n <- 75                # number of systems (sample size)
m <- 3                 # number of components per system
exp.lam <- log(2)/2    # exponential rate chosen so the median lifetime is 2
wei1.shape <- 100
wei1.scale <- 2/(log(2)^(1/wei1.shape))  # Weibull scale chosen so the median lifetime is 2
wei2.shape <- 200
wei2.scale <- 2/(log(2)^(1/wei2.shape))  # Weibull scale chosen so the median lifetime is 2

theta <- c(wei1.shape,wei1.scale,
           wei2.shape,wei2.scale,
           exp.lam)
print(theta)

So, in our study, $\v\theta = (\theta_1, \ldots, \theta_5)'$, where $(\theta_1, \theta_2)$ are the shape and scale of component $1$'s Weibull lifetime, $(\theta_3, \theta_4)$ are the shape and scale of component $2$'s Weibull lifetime, and $\theta_5$ is the failure rate of component $3$'s exponential lifetime.
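To keep the mapping between the elements of $\v\theta$ and the components explicit, we can attach names to the parameter vector (the names below are purely illustrative and not required by the package):

names(theta) <- c("wei1.shape","wei1.scale",
                  "wei2.shape","wei2.scale",
                  "exp.rate")
print(theta)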

Let's simulate the lifetimes of the $m = 3$ components for this series system:

set.seed(7231) # set seed for reproducibility
n <- 75
comp_times <- matrix(nrow=n,ncol=m)
comp_times[,1] <- rweibull(n,wei1.shape,wei1.scale)  # component 1: Weibull lifetimes
comp_times[,2] <- rweibull(n,wei2.shape,wei2.scale)  # component 2: Weibull lifetimes
comp_times[,3] <- rexp(n,exp.lam)                    # component 3: exponential lifetimes
comp_times <- md_encode_matrix(comp_times,"t")       # encode as a data frame with columns prefixed by "t"
print(comp_times,n=7)

Next, we use the function md_series_lifetime_right_censoring to decorate the masked data with the right-censoring time $\tau = 1.975$:

tau <- 1.975
data <- comp_times %>% md_series_lifetime_right_censoring(tau)
print(data,n=7,drop_latent=TRUE)
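In other words, writing $T_i$ for the $i$-th system lifetime as above, the observed data consist of the right-censored lifetime and event indicator $$ S_i = \min(T_i, \tau), \qquad \delta_i = 1_{\{T_i \leq \tau\}}, $$ so $\delta_i = 0$ means the $i$-th system was still operating at time $\tau$ and its lifetime is only known to exceed $\tau$. (This notation is for exposition; the column names produced by md_series_lifetime_right_censoring may differ.)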

Masked component cause of failure (candidate sets)

We generate candidate sets using the Bernoulli candidate model, with parameters chosen to satisfy conditions $C_1$, $C_2$, and $C_3$. First, we compute the candidate-set inclusion probabilities $q_1, \ldots, q_m$:

p <- .333
data <- data %>% md_bernoulli_cand_C1_C2_C3(p)
print(data[,paste0("q",1:m)],n=7)

Now, to generate candidate sets, we sample from these probabilities:

data <- data %>% md_cand_sampler()
print(md_boolean_matrix_to_charsets(data,drop_set=TRUE),drop_latent=TRUE,n=6)
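Conceptually, the sampler draws each component's candidate-set indicator from the corresponding probability column. Here is a minimal illustrative sketch of that idea (not the package's actual implementation), assuming each indicator is drawn independently with probability $q_j$:

# Illustrative only: draw candidate-set indicators from the q-probabilities.
q <- as.matrix(data[, paste0("q", 1:m)])
x <- runif(length(q)) < q   # TRUE with probability q, elementwise
colnames(x) <- paste0("x", 1:m)
head(x)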

Log-likelihood of $\theta$ given masked data

The reduced log-likelihood function (the log of the kernel of the likelihood function) is given by $$ \ell(\v\theta) = \sum_{i=1}^n \sum_{j=1}^m \log R_j(t_i) + \sum_{i=1}^n \log \Bigl( \sum_{j \in c_i} h_j(t_i) \Bigr), $$ where $t_i$ is the $i$-th observed (possibly right-censored) system lifetime, $c_i$ is the $i$-th candidate set, and $R_j$ and $h_j$ are the reliability and hazard functions of the $j$-th component. (For a right-censored observation the candidate set is empty and that observation contributes only to the first sum.)
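These two terms reflect the structure of a series system: the system reliability is the product of the component reliabilities and the system hazard is the sum of the component hazards, $$ R_T(t) = \prod_{j=1}^m R_j(t), \qquad h_T(t) = \sum_{j=1}^m h_j(t), $$ so the first sum is the log of the system reliability evaluated at each observed lifetime, while masking of the cause of failure restricts the hazard sum in the second term to the candidate set $c_i$.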

The following log-likelihood constructor, md_loglike_general_series_C1_C2_C3, implements the log-likelihood $\ell$ in a straightforward way (e.g., no minimally sufficient set of statistics is derived). It accepts four arguments:

  1. The masked data (here, data).
  2. A vector specifying the number of parameters for each component.
  3. A list of the reliability functions for the components.
  4. A list of the hazard functions for the components.

We specify the number of parameters for each component along with its hazard and reliability functions:

nparams <- c(2,2,1)  # components 1 and 2 have two parameters (shape, scale); component 3 has one (rate)
hs <- list()
Rs <- list()

# Weibull hazard and reliability, parameterized as (shape, scale) to match theta
hs[[1]] <- function(t,theta) (theta[1]/theta[2])*(t/theta[2])^(theta[1]-1)
Rs[[1]] <- function(t,theta) exp(-(t/theta[2])^theta[1])

hs[[2]] <- function(t,theta) (theta[1]/theta[2])*(t/theta[2])^(theta[1]-1)
Rs[[2]] <- function(t,theta) exp(-(t/theta[2])^theta[1])

# Exponential hazard and reliability, parameterized by the failure rate
hs[[3]] <- function(t,rate) rate
Rs[[3]] <- function(t,rate) exp(-t*rate)

print(hs[[1]](2,theta[1:2]))
print(hs[[2]](2,theta[3:4]))
print(hs[[3]](2,theta[5]))

print(Rs[[1]](2,theta[1:2]))
print(Rs[[2]](2,theta[3:4]))
print(Rs[[3]](2,theta[5]))

loglike.general <- md_loglike_general_series_C1_C2_C3(data,nparams,hs,Rs)
print(loglike.general)
print(loglike.general(theta))

The log-likelihood function contains the maximum amount of information about the parameter $\v\theta$ available in the observed sample of masked data satisfying conditions $C_1$, $C_2$, and $C_3$.

With the log-likelihood, we may estimate $\v\theta$ with $\hat{\v\theta}$ by solving $$ \hat{\v\theta} = \operatorname{argmax}_{\v\theta \in \Omega} \ell(\v\theta), $$ i.e., finding the point that maximizes the log-likelihood on the observed sample. This is known as maximum likelihood estimation (MLE). We typically find the MLE by solving $$ \nabla \ell \big|_{\v\theta = \hat{\v\theta}} = \v{0}. $$ We use the iterative Newton-Raphson method to solve this, which has the updating equation $$ \v\theta^{(n+1)} = \v\theta^{(n)} - \alpha_n \bigl[\nabla^2 \ell(\v\theta^{(n)})\bigr]^{-1} \nabla \ell(\v\theta^{(n)}), $$ where $\nabla^2 \ell$ denotes the Hessian of the log-likelihood and the step size $\alpha_n$ is chosen to approximately maximize $\ell(\v\theta^{(n+1)})$ using backtracking line search.
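For intuition, here is a minimal sketch of the updating equation above with backtracking line search, using numerical derivatives from the numDeriv package (an assumption made for this sketch); it is only an illustration, not the implementation used by algebraic.mle:

library(numDeriv) # numerical gradients and Hessians

# Illustrative sketch of Newton-Raphson with backtracking line search.
newton_raphson_sketch <- function(loglike, theta0, max_iter = 100, tol = 1e-6) {
  theta <- theta0
  for (iter in 1:max_iter) {
    g <- grad(loglike, theta)     # gradient of the log-likelihood at theta
    H <- hessian(loglike, theta)  # Hessian of the log-likelihood at theta
    d <- -solve(H, g)             # Newton direction
    ll.old <- loglike(theta)
    alpha <- 1
    # backtracking: halve the step until the log-likelihood does not decrease
    while (alpha > 1e-10) {
      ll.new <- loglike(theta + alpha * d)
      if (is.finite(ll.new) && ll.new >= ll.old) break
      alpha <- alpha / 2
    }
    theta.new <- theta + alpha * d
    converged <- max(abs(theta.new - theta)) < tol
    theta <- theta.new
    if (converged) break
  }
  theta
}

A call like newton_raphson_sketch(loglike.general, theta) would mimic the fit below, though in practice we rely on algebraic.mle.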

We use the function mle_newton_raphson provided by the R package algebraic.mle with the appropriate arguments. We find $\hat{\v\theta}$ by running the following R code:

mle <- mle_newton_raphson(l=loglike.general,theta0=theta)

The function mle_newton_raphson returns an mle object, which has various methods implemented for it, e.g., confint (which computes confidence intervals for the estimates). We use the summary method, which takes an mle object and prints a summary of its statistics:

summary(mle)
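For instance, the confint method mentioned above can be used to obtain confidence intervals for the components of $\hat{\v\theta}$:

confint(mle)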

We obtain the point estimate $\hat{\v\theta}$ with the point method:

point(mle)

