Fitting dynamic frailty models with dynfrail
In dynfrail: Fitting Dynamic Frailty Models with the EM Algorithm

knitr::opts_chunk$set(message = FALSE)

Package info

This is an R package for fitting semiparametric dynamic frailty models with the EM algorithm. The hazard for individual $j$ from cluster $i$ is specified as: $$ \lambda_{ij}(t | Z_i(t)) = Z_i(t) \exp(\beta^\top x_{ij}(t)) \lambda_0(t). $$ The model used here is described in detail in Putter & van Houwelingen (2015). The distribution of $Z_i(t)$ is described by two parameters: $\theta$, that is an inverse-variability parameter of $Z_i(t)$ for a fixed $t$, and $\lambda$, that describes the autocorrelation of the process, so that for $t_1 \leq t_2$ $$ \mathrm{cor}(Z_i(t_1), Z_i(t_2)) = \exp(\lambda (t_2 - t_1)). $$

The estimation process is that for fixed $(\theta, \lambda)$ the maximized profile likelihood is calculated, i.e. maximized with respect to $(\beta, \lambda_0)$. This profile likelihood is finally maximized itself.

Installation

The development version from GitHub:

devtools::install_github("tbalan/dynfrail")

The following packages are needed to build dynfrail:

install.packages(c("RcppArmadillo", "tibble", "magrittr", "dplyr", "tidyr"))

The functioning of the package is described in the documentation of the main fitting function, dynfrail().

Features

gamma, PVF, compount Poisson, inverse Gaussian distributions
flexible adjustment of estimation parameters
semiparametric $Z(t)$ that changes values at every $t$ or piecewise constant $Z(t)$
clustered survival data & recurrent events (calendar time or gaptime) ar supported

Functions

dynfrail() has a friendly syntax very similar to the frailtyEM package: next to a formula and data argument, the distribution argument is used to specify the distribution parameters and the control parameter is used for controling the precision of the estimation.
dynfrail_prep() and dynfrail_fit() are used internally by dynfrail() but are made user-available. The first one prepares the input of dynfrail() to make it suitable for the actual EM algorithm. The second one performs one EM algorithm for fixed $(\theta, \lambda)$ to estimate the maximum ($\beta$, $\lambda_0$).

Limitations

slow even for medium sized data sets. It is recommended to start with a small number of piecewise constant intervals and/or a subset of the data
no direct standard errors for $(\theta, \lambda)$.

Analyzing the output

We take the asthma data set in the parfm package. I take a subset where each individual is censored after 3 events.

library(parfm)
library(dplyr)
data(asthma)

small_asthma <- 
  asthma %>% 
  group_by(Patid) %>% 
  mutate(linenr = 1:n()) %>% 
  filter(linenr <= 3)

This is a recurrent events data set in Andersen-Gill format with one covariate (Drug).

head(small_asthma)

This is a fit with a gamma piecewise constant frailty with 2 intervals:

library(dynfrail)
m2 <- dynfrail(Surv(Begin, End, Status) ~ Drug + cluster(Patid), data = small_asthma,
         distribution = dynfrail_dist(n_ints = 1))
m2

From the output we see that the estimated frailty variance is around 0.48, and the estimated auto correlation between 10 days is $\exp(-0.628 \times 10) \approx 0.002$.

An inverse Gaussian fit with 3 intervals for the frailty would works like this:

m3_ig <- dynfrail(Surv(Begin, End, Status) ~ Drug + cluster(Patid), data = small_asthma,
         distribution = dynfrail_dist(dist = "pvf", n_ints = 2))
m3_ig

The interpretation goes similar with that from the previous model.

The baseline hazard can be plotted like this:

with(m3_ig, plot(tev, cumsum(hazard), main = "Baseline cumulative hazard", ylab = "H0", xlab = "time"))

The empirical Bayes frailty estimates are stored in here:

m3_ig$frail_id %>% 
  select(id, tstart, tstop, frail) %>% 
  head()

Let's plot the frailty estimates of individual 1:

library(ggplot2)
m3_ig$frail_id %>% 
  filter(id == 1) %>% 
  ggplot() + geom_segment(aes(x = tstart, xend = tstop, y = frail, yend = frail)) + 
  ylim(c(0, 3)) + 
  theme_classic()

For the 3 indiviudals the estimated frailty looks like:

m3_ig$frail_id %>% 
  filter(id <= 3) %>% 
  mutate(id = as.factor(id)) %>% 
  ggplot() + geom_segment(aes(x = tstart, xend = tstop, y = frail, yend = frail, colour = id)) + 
  ylim(c(0, 3)) + 
  theme_classic()

Now for 5 piecewise-constant intervals, again inverse Gaussian frailty. This takes a while if you try to run it:

m5_ig <- dynfrail(Surv(Begin, End, Status) ~ Drug + cluster(Patid), data = small_asthma,
         distribution = dynfrail_dist(dist = "pvf", n_ints = 4))
m5_ig

Now the first 3 individuals look like this:

m5_ig$frail_id %>% 
  filter(id <= 3) %>% 
  mutate(id = as.factor(id)) %>% 
  ggplot() + geom_segment(aes(x = tstart, xend = tstop, y = frail, yend = frail, colour = id)) + 
  ylim(c(0, 3)) + 
  theme_classic()

Look at the log-likelihoods:

c(m3_ig$loglik[2], m5_ig$loglik[2])

Seems that the model with 5 piecewise constant intervals has a higher log-likelihood than the one with 3 piecewise constant intervals.

Calculating the likelihood at specific frailty distributions

We saw where the maximum likelihood in model m5 lies:

m5_ig

Say we want the likelihood for $\theta = 3$ (variance of the frailty 0.33). This can be done in two steps:

args_5 <- dynfrail_prep(formula = Surv(Begin, End, Status) ~ Drug + cluster(Patid), 
    data = small_asthma, distribution = dynfrail_dist(dist = "pvf", 
        n_ints = 4))

lik_5 <- do.call(dynfrail_fit, c(logfrailtypar = list(c(log(3), m5_ig$loglambda)), args_5))
lik_5

Of course that the likelihood is smaller than the maximum likelihood. This is useful though; for example, in this way it can be seen how the likelihood varies in $\lambda$. Furthermore, dynfrail_fit may be plugges into any maximizer in this way.

1 piecewise-constant interval

Then the result is the same as the regular shared frailty model:

m1 <- dynfrail(Surv(Begin, End, Status) ~ Drug + cluster(Patid), data = small_asthma,
         distribution = dynfrail_dist(n_ints = 0))
m1

Compare with the (gamma) shared frailty model:

library(frailtyEM)
m1_sf <- emfrail(Surv(Begin, End, Status) ~ Drug + cluster(Patid), data = small_asthma)
summary(m1_sf)

Note that in this case the likelihood is completely flat in $\lambda$! This can be checked as was shown in the previous part.

The information matrix

The information matrix is calculated with Louis' formula, for a fixed $\theta, \lambda$. Denoting all the other parameters by $\gamma$, this means: $$ I = \mathrm{E} \left[ \frac{d^2}{d \gamma d \gamma^\top} l \right] - \mathrm{E}\left[ \frac{d}{d \gamma} \frac{d}{d \gamma^\top} \right] $$ The first part is a matrix with the expectation of the second derivatives of the complete-data likelihood. That is: $$ l = \sum_k \left[ \sum_i \delta_{ki} ( \log h_{0ki} + \beta^\top x_{ki}) - \sum_l z_{ki} e^{\beta^\top x_{ki}} \Lambda_{0ki} \right] $$ where $k$ is for individual/cluster and $i$ is for a certain line in the data set. The lines are not in the original data set, but rather in a data set that was splitted at the time points of where the frailty changes. Then $z_{ki}$ is the frailty on that line and $\Lambda_{0ki}$ is the cumulative baseline hazard for that line.

The derivatives go as follows: $$ \frac{\partial l}{\partial \beta} = \sum_k \left[\sum_i x_{ki} \delta_{ki} - \sum_i x_{ki} z_{ki} e^{\beta^\top x_{ki}}\Lambda_{0ki} \right] $$ $$ \frac{\partial l}{\partial h_t} = \sum_k \left[\sum_i \frac{\delta_{ki}}{h_{0ki}} 1(R_{ki} = t) - x_{ki} \delta_{ki} - \sum_i z_{ki} e^{\beta^\top x_{ki}} 1(t \in (L_{ki}, R_{ki}]) \right] $$ where $R_{ki}$ is the tstop of the line and $L_{ki}$ is the tstart of the line.

The first matrix that of second derivatives is easy to calculate and it is all written only in R, so the code is there and pretty easy to descipher.

For the second matrix it gets a bit tricky. Take the part within brackets of $\partial l / \partial \beta$ as $B_k$. Then we can write: $$ \mathrm{E} \left[\frac{\partial l}{\partial \beta} \frac{\partial l}{\partial \beta^\top} \right] = \mathrm{E} \left[\sum_k B_k \sum_l B_l \right] = \sum_k \sum_{l \neq k} \mathrm{E} B_k \mathrm{E} B_l^\top + \sum_k \mathrm{E} B_k B_k^\top $$ Now we use the fact that the score functions are 0 at the maximum likelihood, so this can be written as $$ \mathrm{E} \left[\frac{\partial l}{\partial \beta} \frac{\partial l}{\partial \beta^\top} \right] = = \sum_k \left[ \mathrm{E} B_k B_k^\top - \mathrm{E} B_k \mathrm{E}B_k^\top \right] $$ Now further we can split $B_k$ into $B_{k1}$ and $B_{k2}$, with the first part does not depend on the frailty. So all expectations that involve that one are actually the same in both terms from the equation here, so they cancel out. Now about the frailty we know that the estimates are independent if they are from different clusters. Either way, we can write this thing as $$ \sum_k \sum_{i,j} \left( \mathrm{E} [Z_{ki} Z_{kj}] - \mathrm{E} Z_{ki} \mathrm{E} Z_{kj} \right) \,xelpH_{ki} \,\, xelpH_{kj}^\top $$ where $xelpH_{ki} = x_{ki} e^{\beta^\top x_{ki}} \Lambda_{0ki}$.

The rest of the combinations are done in a similar fashion. Here is an outline of how the algorithm actually things about stuff:

define the matrices and vectors corresponding to $\mathrm{E}\left[ \frac{d}{d \gamma} \frac{d}{d \gamma^\top} \right]$
Loop over clusters
Within each cluster, we loop over the different frailty estimates (interval) that exist within each of them ($i,j$)
Within each combination of $i,j$ we loop over the rows from the data that exist in those intervals.
for each combination of rows, we loop over the time points that are contained in those rows
only at this point we add the relevant contributions to the total information matrix