```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(series.system.estimation.masked.data)
library(algebraic.mle)
library(md.tools)
library(tidyverse)
library(devtools)
library(printr)
options(digits = 3)
```
\renewcommand{\v}[1]{\boldsymbol{#1}}
The R package `series.system.estimation.masked.data` is a framework for estimating the parameters of latent component lifetimes from masked data in a series system.
In a general series system, each component lifetime comes from some parametric family of distributions, e.g., exponential, Weibull, lognormal, or any other distribution for which survival and hazard functions are defined.
We consider a general series system of $m=3$ components. Components $1$ and $2$ have lifetimes in the Weibull family, and component $3$ has a lifetime in the exponential family.
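In a series system, the system fails as soon as any component fails, so the system lifetime is the minimum of the component lifetimes and the system survival function is the product of the component survival functions. Here is a minimal sketch (base R only, with illustrative parameters rather than the study's parameters) verifying the survival identity by simulation:

```r
# Minimal sketch (illustrative parameters, not the study's): in a series
# system, R_sys(t) = R_1(t) * R_2(t) since every component must survive.
t0 <- 1.5
R1 <- function(t) exp(-(t/2)^1.5)  # Weibull survival, shape 1.5, scale 2
R2 <- function(t) exp(-0.3*t)      # exponential survival, rate 0.3
R_sys <- R1(t0) * R2(t0)

# Monte Carlo check: P(min(T1, T2) > t0) should be close to R_sys.
set.seed(1)
T1 <- rweibull(1e5, shape = 1.5, scale = 2)
T2 <- rexp(1e5, rate = 0.3)
c(exact = R_sys, monte_carlo = mean(pmin(T1, T2) > t0))
```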
Let's simulate a series system with masked data with a right-censoring time of $\tau_i = 1.975$ for $i=1,\ldots,n$ and with the Bernoulli candidate model satisfying conditions $C_1$, $C_2$, and $C_3$ with a probability $p=1/3$. Here is the R code to set up the true parameters:
```r
n <- 75
m <- 3
exp.lam <- log(2)/2
wei1.shape <- 100
wei1.scale <- 2/(log(2)^(1/wei1.shape))
wei2.shape <- 200
wei2.scale <- 2/(log(2)^(1/wei2.shape))
theta <- c(wei1.shape, wei1.scale,
           wei2.shape, wei2.scale,
           exp.lam)
print(theta)
```
So, in our study, $\v\theta = (`r theta`)'$. Parameters $\theta_1$ and $\theta_2$ are the shape and scale of component $1$'s Weibull lifetime, $\theta_3$ and $\theta_4$ are the shape and scale of component $2$'s Weibull lifetime, and $\theta_5 = `r theta[5]`$ is the failure rate of component $3$'s exponentially distributed lifetime.
Let's simulate generating the lifetimes of the $m = `r m`$ components for this series system:
```r
set.seed(7231)  # set seed for reproducibility
n <- 75
comp_times <- matrix(nrow = n, ncol = m)
comp_times[,1] <- rweibull(n, wei1.shape, wei1.scale)
comp_times[,2] <- rweibull(n, wei2.shape, wei2.scale)
comp_times[,3] <- rexp(n, exp.lam)
comp_times <- md_encode_matrix(comp_times, "t")
print(comp_times, n = 7)
```
Next, we use the function `md_series_lifetime_right_censoring` to decorate the masked data with the right-censoring time $\tau = 1.975$:
```r
tau <- 1.975
data <- comp_times %>% md_series_lifetime_right_censoring(tau)
print(data, n = 7, drop_latent = TRUE)
```
We simulate candidate sets using the Bernoulli candidate model with an appropriate set of parameters to satisfy conditions $C_1$, $C_2$, and $C_3$:
```r
p <- .333
data <- data %>% md_bernoulli_cand_C1_C2_C3(p)
print(data[, paste0("q", 1:m)], n = 7)
```
Now, to generate candidate sets, we sample from these probabilities:
```r
data <- data %>% md_cand_sampler()
print(md_boolean_matrix_to_charsets(data, drop_set = TRUE), drop_latent = TRUE, n = 6)
```
The reduced log-likelihood function (the log of the kernel of the likelihood function) is given by
$$
\ell(\v\theta) = \sum_{i=1}^n \sum_{j=1}^m \log R_j(t_i) + \sum_{i=1}^n \log \Bigl\{ \sum_{j \in c_i} h_j(t_i) \Bigr\},
$$
where $c_i$ denotes the candidate set of the $i$-th observation.
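To make the formula concrete, here is a minimal sketch of how $\ell$ could be evaluated directly from a vector of system lifetimes `t`, a list of candidate sets `cands`, and per-component hazard and survival functions. This only illustrates the formula; it is not the package's implementation, and the names `loglike_sketch`, `t`, and `cands` are hypothetical:

```r
# Sketch of the reduced log-likelihood above (not the package's code):
# the first term sums log R_j(t_i) over all components, the second sums
# the log of the total hazard over each observation's candidate set c_i.
loglike_sketch <- function(t, cands, hs, Rs, nparams) {
  function(theta) {
    # Split theta into per-component parameter vectors.
    idx <- split(seq_along(theta), rep(seq_along(nparams), nparams))
    ll <- 0
    for (i in seq_along(t)) {
      for (j in seq_along(hs))
        ll <- ll + log(Rs[[j]](t[i], theta[idx[[j]]]))
      ll <- ll + log(sum(sapply(cands[[i]], function(j)
        hs[[j]](t[i], theta[idx[[j]]]))))
    }
    ll
  }
}
```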
The following log-likelihood constructor, `md_loglike_general_series_C1_C2_C3`, implements the log-likelihood $\ell$ in a straightforward way, e.g., no minimally sufficient set of statistics is derived. It accepts four arguments: the masked data, the number of parameters per component, a list of component hazard functions, and a list of component survival functions. First, we specify the hazard and survival functions and sanity-check them at $t = 2$:
```r
nparams <- c(2, 2, 1)  # number of parameters per component

# Component hazard and survival functions. Components 1 and 2 are
# Weibull with theta = (shape, scale); component 3 is exponential
# with a single rate parameter.
hs <- list()
Rs <- list()
hs[[1]] <- function(t, theta) theta[1]/theta[2]*(t/theta[2])^(theta[1]-1)
Rs[[1]] <- function(t, theta) exp(-(t/theta[2])^theta[1])
hs[[2]] <- function(t, theta) theta[1]/theta[2]*(t/theta[2])^(theta[1]-1)
Rs[[2]] <- function(t, theta) exp(-(t/theta[2])^theta[1])
hs[[3]] <- function(t, rate) rate
Rs[[3]] <- function(t, rate) exp(-t*rate)

print(hs[[1]](2, theta[1:2]))
print(hs[[2]](2, theta[3:4]))
print(hs[[3]](2, theta[5]))
print(Rs[[1]](2, theta[1:2]))
print(Rs[[2]](2, theta[3:4]))
print(Rs[[3]](2, theta[5]))
```
Now we construct the log-likelihood function and evaluate it at $\v\theta$:

```r
loglike.general <- md_loglike_general_series_C1_C2_C3(data, nparams, hs, Rs)
print(loglike.general)
print(loglike.general(theta))
```
The log-likelihood function contains the maximum amount of information about the parameter $\v\theta$ given the sample of masked data `data` satisfying conditions $C_1$, $C_2$, and $C_3$.
With the log-likelihood, we may estimate $\v\theta$ with $\hat{\v\theta}$ by solving
$$
\hat{\v\theta} = \operatorname{argmax}_{\v\theta \in \Omega} \ell(\v\theta),
$$
i.e., finding the point that maximizes the log-likelihood on the observed sample `data`.
This is known as maximum likelihood estimation (MLE).
We typically solve for the MLE by solving
$$
\nabla \ell \big|_{\v\theta=\hat{\v\theta}} = \v{0}.
$$
We solve this with the iterative Newton-Raphson method, which has the updating equation
$$
\v\theta^{(n+1)} = \v\theta^{(n)} - \alpha_n \bigl[\nabla^2 \ell(\v\theta^{(n)})\bigr]^{-1} \nabla \ell(\v\theta^{(n)}),
$$
where the step size $\alpha_n$ is chosen to approximately maximize $\ell(\v\theta^{(n+1)})$ using backtracking line search.
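To sketch the idea (assuming the `numDeriv` package for numerical derivatives; this is an illustration, not `algebraic.mle`'s implementation):

```r
# Sketch of one Newton-Raphson ascent step with backtracking line search
# (assumes numDeriv; not algebraic.mle's implementation).
newton_step <- function(l, theta, alpha = 1, beta = 0.5) {
  g <- numDeriv::grad(l, theta)     # gradient of the log-likelihood
  H <- numDeriv::hessian(l, theta)  # Hessian of the log-likelihood
  d <- solve(H, -g)                 # Newton ascent direction
  # Backtracking: shrink the step until the log-likelihood improves.
  while (l(theta + alpha * d) < l(theta) && alpha > 1e-10)
    alpha <- beta * alpha
  theta + alpha * d
}
```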
We use the function `mle_newton_raphson` provided by the R package `algebraic.mle` with the appropriate arguments. We find $\hat{\v\theta}$ by running the following R code:
```r
mle <- mle_newton_raphson(l = loglike.general, theta0 = theta)
```
The function `mle_newton_raphson` returns an `mle` object, which has various methods implemented for it, e.g., `confint` (computes the estimator's confidence intervals).
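For example:

```r
# Asymptotic confidence intervals for the components of theta.
confint(mle)
```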
We use the `summary` method, which takes an `mle` object and prints out a summary of its statistics:
```r
summary(mle)
```
We let $\hat{\v\theta}$ be given by the `point` method:
```r
point(mle)
```