emfrail: Fitting semi-parametric shared frailty models with the EM...
In frailtyEM: Fitting Frailty Models with the EM Algorithm

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Fitting semi-parametric shared frailty models with the EM algorithm

1
2
3

emfrail(formula, data, distribution = emfrail_dist(),
  control = emfrail_control(), model = FALSE, model.matrix = FALSE,
  ...)

`formula`	A formula that contains on the left hand side an object of the type `Surv` and on the right hand side a `+cluster(id)` statement. Two special statments may also be used: `+strata()` for specifying a grouping column that will represent different strata and `+terminal()`
`data`	A `data.frame` in which the formula argument can be evaluated
`distribution`	An object as created by `emfrail_dist`
`control`	An object as created by `emfrail_control`
`model`	Logical. Should the model frame be returned?
`model.matrix`	Logical. Should the model matrix be returned?
`...`	Other arguments, currently used to warn about deprecated argument names

The emfrail function fits shared frailty models for processes which have intensity

λ(t) = z λ_0(t) \exp(β' \mathbf{x})

with a non-parametric (Breslow) baseline intensity λ_0(t). The outcome (left hand side of the formula) must be a Surv object.

If the object is Surv(tstop, status) then the usual failure time data is represented. Gap-times between recurrent events are represented in the same way. If the left hand side of the formula is created as Surv(tstart, tstop, status), this may represent a number of things: (a) recurrent events episodes in calendar time where a recurrent event episode starts at tstart and ends at tstop (b) failure time data with time-dependent covariates where tstop is the time of a change in covariates or censoring (status = 0) or an event time (status = 1) or (c) clustered failure time with left truncation, where tstart is the individual's left truncation time. Unlike regular Cox models, a major distinction is that in case (c) the distribution of the frailty must be considered conditional on survival up to the left truncation time.

The +cluster() statement specified the column that determines the grouping (the observations that share the same frailty). The +strata() statement specifies a column that determines different strata, for which different baseline hazards are calculated. The +terminal specifies a column that contains an indicator for dependent censoring, and then performs a score test

The distribution argument must be generated by a call to emfrail_dist. This determines the frailty distribution, which may be one of gamma, positive stable or PVF (power-variance-function), and the starting value for the maximum likelihood estimation. The PVF family also includes a tuning parameter that differentiates between inverse Gaussian and compound Poisson distributions. Note that, with univariate data (at most one event per individual, no clusters), only distributions with finite expectation are identifiable. This means that the positive stable distribution should have a maximum likelihood on the edge of the parameter space (theta = +\inf, corresponding to a Cox model for independent observations).

The control argument must be generated by a call to emfrail_control. Several parameters may be adjusted that control the precision of the convergenge criteria or supress the calculation of different quantities.

An object of class emfrail that contains the following fields:

`coefficients`	A named vector of the estimated regression coefficients
`hazard`	The breslow estimate of the baseline hazard at each event time point, in chronological order
`var`	The variance-covariance matrix corresponding to the coefficients and hazard, assuming θ constant
`var_adj`	The variance-covariance matrx corresponding to the coefficients and hazard, adjusted for the estimation of theta
`logtheta`	The logarithm of the point estimate of θ. For the gamma and PVF family of distributions, this is the inverse of the estimated frailty variance.
`var_logtheta`	The variance of the estimated logarithm of θ
`ci_logtheta`	The likelihood-based 95% confidence interval for the logarithm of θ
`frail`	The posterior (empirical Bayes) estimates of the frailty for each cluster
`residuals`	A list with two elements, cluster which is a vector that the sum of the cumulative hazards from each cluster for a frailty value of 1, and individual, which is a vector that contains the cumulative hazard corresponding to each row of the data, multiplied by the corresponding frailty estimate
`tev`	The time points of the events in the data set, this is the same length as hazard
`nevents_id`	The number of events for each cluster
`loglik`	A vector of length two with the log-likelihood of the starting Cox model and the maximized log-likelihood
`ca_test`	The results of the Commenges-Andersen test for heterogeneity
`cens_test`	The results of the test for dependence between a recurrent event and a terminal event, if the `+terminal()` statement is specified and the frailty distribution is gamma
`zph`	The result of `cox.zph` called on a model with the estimated log-frailties as offset
`formula, distribution, control`	The original arguments
`nobs, fitted`	Number of observations and fitted values (i.e. z \exp(β^T x))
`mf`	The `model.frame`, if `model = TRUE`
`mm`	The `model.matrix`, if `model.matrix = TRUE`

Several options in the control arguemnt shorten the running time for emfrail significantly. These are disabling the adjustemnt of the standard errors (se_adj = FALSE), disabling the likelihood-based confidence intervals (lik_ci = FALSE) or disabling the score test for heterogeneity (ca_test = FALSE).

The algorithm is detailed in the package vignette. For the gamma frailty, the results should be identical with those from coxph with ties = "breslow".

Theodor Balan hello@tbalan.com

Balan TA, Putter H (2019) "frailtyEM: An R Package for Estimating Semiparametric Shared Frailty Models", Journal of Statistical Software 90(7) 1-29. doi:10.18637/jss.v090.i07

plot.emfrail and autoplot.emfrail for plot functions directly available, emfrail_pll for calculating \widehat{L}(θ) at specific values of θ, summary.emfrail for transforming the emfrail object into a more human-readable format and for visualizing the frailty (empirical Bayes) estimates, predict.emfrail for calculating and visalizing conditional and marginal survival and cumulative hazard curves. residuals.emfrail for extracting martingale residuals and logLik.emfrail for extracting the log-likelihood of the fitted model.

m_gamma <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                   data =  rats)

# Inverse Gaussian distribution
m_ig <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                data =  rats,
                distribution = emfrail_dist(dist = "pvf"))

# for the PVF distribution with m = 0.75
m_pvf <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                 data =  rats,
                 distribution = emfrail_dist(dist = "pvf", pvfm = 0.75))

# for the positive stable distribution
m_ps <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                data =  rats,
                distribution = emfrail_dist(dist = "stable"))
## Not run: 
# Compare marginal log-likelihoods
models <- list(m_gamma, m_ig, m_pvf, m_ps)

models
logliks <- lapply(models, logLik)

names(logliks) <- lapply(models,
                         function(x) with(x$distribution,
                                          ifelse(dist == "pvf",
                                                 paste(dist, "/", pvfm),
                                                 dist))
)

logliks

## End(Not run)

# Stratified analysis
## Not run: 
  m_strat <- emfrail(formula = Surv(time, status) ~  rx + strata(sex) + cluster(litter),
                     data =  rats)

## End(Not run)


# Test for conditional proportional hazards (log-frailty as offset)
## Not run: 
m_gamma <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
  data =  rats, control = emfrail_control(zph = TRUE))
par(mfrow = c(1,2))
plot(m_gamma$zph)

## End(Not run)

# Draw the profile log-likelihood
## Not run: 
  fr_var <- seq(from = 0.01, to = 1.4, length.out = 20)

  # For gamma the variance is 1/theta (see parametrizations)
  pll_gamma <- emfrail_pll(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                           data =  rats,
                           values = 1/fr_var )
  plot(fr_var, pll_gamma,
       type = "l",
       xlab = "Frailty variance",
       ylab = "Profile log-likelihood")


  # Recurrent events
  mod_rec <- emfrail(Surv(start, stop, status) ~ treatment + cluster(id), bladder1)
  # The warnings appear from the Surv object, they also appear in coxph.

  plot(mod_rec, type = "hist")

## End(Not run)

# Left truncation
## Not run: 
  # We simulate some data with truncation times
  set.seed(2018)
  nclus <- 300
  nind <- 5
  x <- sample(c(0,1), nind * nclus, TRUE)
  u <- rep(rgamma(nclus,1,1), each = 3)

  stime <- rexp(nind * nclus, rate = u * exp(0.5 * x))

  status <- ifelse(stime > 5, 0, 1)
  stime[status == 0] <- 5

  # truncate uniform between 0 and 2
  ltime <- runif(nind * nclus, min = 0, max = 2)

  d <- data.frame(id = rep(1:nclus, each = nind),
                  x = x,
                  stime = stime,
                  u = u,
                  ltime = ltime,
                  status = status)
  d_left <- d[d$stime > d$ltime,]

  mod <- emfrail(Surv(stime, status)~ x + cluster(id), d)
  # This model ignores the left truncation, 0.378 frailty variance:
  mod_1 <- emfrail(Surv(stime, status)~ x + cluster(id), d_left)

  # This model takes left truncation into account,
 # but it considers the distribution of the frailty unconditional on the truncation
 mod_2 <- emfrail(Surv(ltime, stime, status)~ x + cluster(id), d_left)

  # This is identical with:
  mod_cox <- coxph(Surv(ltime, stime, status)~ x + frailty(id), data = d_left)


  # The correct thing is to consider the distribution of the frailty given the truncation
  mod_3 <- emfrail(Surv(ltime, stime, status)~ x + cluster(id), d_left,
                   distribution = emfrail_dist(left_truncation = TRUE))

  summary(mod_1)
  summary(mod_2)
  summary(mod_3)

## End(Not run)