emfrail: Fitting semi-parametric shared frailty models with the EM...

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/emfrail.R

Description

Fitting semi-parametric shared frailty models with the EM algorithm

Usage

1
2
3

Arguments

formula

A formula that contains on the left hand side an object of the type Surv and on the right hand side a +cluster(id) statement. Two special statments may also be used: +strata() for specifying a grouping column that will represent different strata and +terminal()

data

A data.frame in which the formula argument can be evaluated

distribution

An object as created by emfrail_dist

control

An object as created by emfrail_control

model

Logical. Should the model frame be returned?

model.matrix

Logical. Should the model matrix be returned?

...

Other arguments, currently used to warn about deprecated argument names

Details

The emfrail function fits shared frailty models for processes which have intensity

λ(t) = z λ_0(t) \exp(β' \mathbf{x})

with a non-parametric (Breslow) baseline intensity λ_0(t). The outcome (left hand side of the formula) must be a Surv object.

If the object is Surv(tstop, status) then the usual failure time data is represented. Gap-times between recurrent events are represented in the same way. If the left hand side of the formula is created as Surv(tstart, tstop, status), this may represent a number of things: (a) recurrent events episodes in calendar time where a recurrent event episode starts at tstart and ends at tstop (b) failure time data with time-dependent covariates where tstop is the time of a change in covariates or censoring (status = 0) or an event time (status = 1) or (c) clustered failure time with left truncation, where tstart is the individual's left truncation time. Unlike regular Cox models, a major distinction is that in case (c) the distribution of the frailty must be considered conditional on survival up to the left truncation time.

The +cluster() statement specified the column that determines the grouping (the observations that share the same frailty). The +strata() statement specifies a column that determines different strata, for which different baseline hazards are calculated. The +terminal specifies a column that contains an indicator for dependent censoring, and then performs a score test

The distribution argument must be generated by a call to emfrail_dist. This determines the frailty distribution, which may be one of gamma, positive stable or PVF (power-variance-function), and the starting value for the maximum likelihood estimation. The PVF family also includes a tuning parameter that differentiates between inverse Gaussian and compound Poisson distributions. Note that, with univariate data (at most one event per individual, no clusters), only distributions with finite expectation are identifiable. This means that the positive stable distribution should have a maximum likelihood on the edge of the parameter space (theta = +\inf, corresponding to a Cox model for independent observations).

The control argument must be generated by a call to emfrail_control. Several parameters may be adjusted that control the precision of the convergenge criteria or supress the calculation of different quantities.

Value

An object of class emfrail that contains the following fields:

coefficients

A named vector of the estimated regression coefficients

hazard

The breslow estimate of the baseline hazard at each event time point, in chronological order

var

The variance-covariance matrix corresponding to the coefficients and hazard, assuming θ constant

var_adj

The variance-covariance matrx corresponding to the coefficients and hazard, adjusted for the estimation of theta

logtheta

The logarithm of the point estimate of θ. For the gamma and PVF family of distributions, this is the inverse of the estimated frailty variance.

var_logtheta

The variance of the estimated logarithm of θ

ci_logtheta

The likelihood-based 95% confidence interval for the logarithm of θ

frail

The posterior (empirical Bayes) estimates of the frailty for each cluster

residuals

A list with two elements, cluster which is a vector that the sum of the cumulative hazards from each cluster for a frailty value of 1, and individual, which is a vector that contains the cumulative hazard corresponding to each row of the data, multiplied by the corresponding frailty estimate

tev

The time points of the events in the data set, this is the same length as hazard

nevents_id

The number of events for each cluster

loglik

A vector of length two with the log-likelihood of the starting Cox model and the maximized log-likelihood

ca_test

The results of the Commenges-Andersen test for heterogeneity

cens_test

The results of the test for dependence between a recurrent event and a terminal event, if the +terminal() statement is specified and the frailty distribution is gamma

zph

The result of cox.zph called on a model with the estimated log-frailties as offset

formula, distribution, control

The original arguments

nobs, fitted

Number of observations and fitted values (i.e. z \exp(β^T x))

mf

The model.frame, if model = TRUE

mm

The model.matrix, if model.matrix = TRUE

Note

Several options in the control arguemnt shorten the running time for emfrail significantly. These are disabling the adjustemnt of the standard errors (se_adj = FALSE), disabling the likelihood-based confidence intervals (lik_ci = FALSE) or disabling the score test for heterogeneity (ca_test = FALSE).

The algorithm is detailed in the package vignette. For the gamma frailty, the results should be identical with those from coxph with ties = "breslow".

Author(s)

Theodor Balan hello@tbalan.com

References

Balan TA, Putter H (2019) "frailtyEM: An R Package for Estimating Semiparametric Shared Frailty Models", Journal of Statistical Software 90(7) 1-29. doi:10.18637/jss.v090.i07

See Also

plot.emfrail and autoplot.emfrail for plot functions directly available, emfrail_pll for calculating \widehat{L}(θ) at specific values of θ, summary.emfrail for transforming the emfrail object into a more human-readable format and for visualizing the frailty (empirical Bayes) estimates, predict.emfrail for calculating and visalizing conditional and marginal survival and cumulative hazard curves. residuals.emfrail for extracting martingale residuals and logLik.emfrail for extracting the log-likelihood of the fitted model.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
m_gamma <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                   data =  rats)

# Inverse Gaussian distribution
m_ig <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                data =  rats,
                distribution = emfrail_dist(dist = "pvf"))

# for the PVF distribution with m = 0.75
m_pvf <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                 data =  rats,
                 distribution = emfrail_dist(dist = "pvf", pvfm = 0.75))

# for the positive stable distribution
m_ps <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                data =  rats,
                distribution = emfrail_dist(dist = "stable"))
## Not run: 
# Compare marginal log-likelihoods
models <- list(m_gamma, m_ig, m_pvf, m_ps)

models
logliks <- lapply(models, logLik)

names(logliks) <- lapply(models,
                         function(x) with(x$distribution,
                                          ifelse(dist == "pvf",
                                                 paste(dist, "/", pvfm),
                                                 dist))
)

logliks

## End(Not run)

# Stratified analysis
## Not run: 
  m_strat <- emfrail(formula = Surv(time, status) ~  rx + strata(sex) + cluster(litter),
                     data =  rats)

## End(Not run)


# Test for conditional proportional hazards (log-frailty as offset)
## Not run: 
m_gamma <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
  data =  rats, control = emfrail_control(zph = TRUE))
par(mfrow = c(1,2))
plot(m_gamma$zph)

## End(Not run)

# Draw the profile log-likelihood
## Not run: 
  fr_var <- seq(from = 0.01, to = 1.4, length.out = 20)

  # For gamma the variance is 1/theta (see parametrizations)
  pll_gamma <- emfrail_pll(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                           data =  rats,
                           values = 1/fr_var )
  plot(fr_var, pll_gamma,
       type = "l",
       xlab = "Frailty variance",
       ylab = "Profile log-likelihood")


  # Recurrent events
  mod_rec <- emfrail(Surv(start, stop, status) ~ treatment + cluster(id), bladder1)
  # The warnings appear from the Surv object, they also appear in coxph.

  plot(mod_rec, type = "hist")

## End(Not run)

# Left truncation
## Not run: 
  # We simulate some data with truncation times
  set.seed(2018)
  nclus <- 300
  nind <- 5
  x <- sample(c(0,1), nind * nclus, TRUE)
  u <- rep(rgamma(nclus,1,1), each = 3)

  stime <- rexp(nind * nclus, rate = u * exp(0.5 * x))

  status <- ifelse(stime > 5, 0, 1)
  stime[status == 0] <- 5

  # truncate uniform between 0 and 2
  ltime <- runif(nind * nclus, min = 0, max = 2)

  d <- data.frame(id = rep(1:nclus, each = nind),
                  x = x,
                  stime = stime,
                  u = u,
                  ltime = ltime,
                  status = status)
  d_left <- d[d$stime > d$ltime,]

  mod <- emfrail(Surv(stime, status)~ x + cluster(id), d)
  # This model ignores the left truncation, 0.378 frailty variance:
  mod_1 <- emfrail(Surv(stime, status)~ x + cluster(id), d_left)

  # This model takes left truncation into account,
 # but it considers the distribution of the frailty unconditional on the truncation
 mod_2 <- emfrail(Surv(ltime, stime, status)~ x + cluster(id), d_left)

  # This is identical with:
  mod_cox <- coxph(Surv(ltime, stime, status)~ x + frailty(id), data = d_left)


  # The correct thing is to consider the distribution of the frailty given the truncation
  mod_3 <- emfrail(Surv(ltime, stime, status)~ x + cluster(id), d_left,
                   distribution = emfrail_dist(left_truncation = TRUE))

  summary(mod_1)
  summary(mod_2)
  summary(mod_3)

## End(Not run)

teddybalan/frailtoys documentation built on Sept. 21, 2019, 6:22 p.m.