predictCox: Survival probabilities, hazards and cumulative hazards from...

View source: R/predictCox.R

predictCoxR Documentation

Survival probabilities, hazards and cumulative hazards from Cox regression models

Description

Routine to get baseline hazards and subject specific hazards as well as survival probabilities from a survival::coxph or rms::cph object.

Usage

predictCox(
  object,
  times,
  newdata = NULL,
  centered = TRUE,
  type = c("cumhazard", "survival"),
  keep.strata = TRUE,
  keep.times = TRUE,
  keep.newdata = FALSE,
  keep.infoVar = FALSE,
  se = FALSE,
  band = FALSE,
  iid = FALSE,
  confint = (se + band) > 0,
  diag = FALSE,
  average.iid = FALSE,
  product.limit = FALSE,
  store = NULL
)

Arguments

object

The fitted Cox regression model object either obtained with coxph (survival package) or cph (rms package).

times

[numeric vector] Time points at which to return the estimated hazard/cumulative hazard/survival.

newdata

[data.frame or data.table] Contain the values of the predictor variables defining subject specific predictions. Should have the same structure as the data set used to fit the object.

centered

[logical] If TRUE return prediction at the mean values of the covariates fit$mean, if FALSE return a prediction for all covariates equal to zero. in the linear predictor. Will be ignored if argument newdata is used. For internal use.

type

[character vector] the type of predicted value. Choices are

  • "hazard" the baseline hazard function when argument newdata is not used and the hazard function when argument newdata is used.

  • "cumhazard" the cumulative baseline hazard function when argument newdata is not used and the cumulative hazard function when argument newdata is used.

  • "survival" the survival baseline hazard function when argument newdata is not used and the cumulative hazard function when argument newdata is used.

Several choices can be combined in a vector of strings that match (no matter the case) strings "hazard","cumhazard", "survival".

keep.strata

[logical] If TRUE add the (newdata) strata to the output. Only if there any.

keep.times

[logical] If TRUE add the evaluation times to the output.

keep.newdata

[logical] If TRUE add the value of the covariates used to make the prediction in the output list.

keep.infoVar

[logical] For internal use.

se

[logical] If TRUE compute and add the standard errors to the output.

band

[logical] If TRUE compute and add the quantiles for the confidence bands to the output.

iid

[logical] If TRUE compute and add the influence function to the output.

confint

[logical] If TRUE compute and add the confidence intervals/bands to the output. They are computed applying the confint function to the output.

diag

[logical] when FALSE the hazard/cumlative hazard/survival for all observations at all times is computed, otherwise it is only computed for the i-th observation at the i-th time.

average.iid

[logical] If TRUE add the average of the influence function over newdata to the output.

product.limit

[logical]. If TRUE the survival is computed using the product limit estimator. Otherwise the exponential approximation is used (i.e. exp(-cumulative hazard)).

store

[vector of length 2] Whether prediction should only be computed for unique covariate sets and mapped back to the original dataset (data="minimal") and whether the influence function should be stored in a memory efficient way (iid="minimal"). Otherwise use data="full" and/or iid="full".

Details

When the argument newdata is not specified, the function computes the baseline hazard estimate. See (Ozenne et al., 2017) section "Handling of tied event times".

Otherwise the function computes survival probabilities with confidence intervals/bands. See (Ozenne et al., 2017) section "Confidence intervals and confidence bands for survival probabilities". The survival is computed using the exponential approximation (equation 3).

A detailed explanation about the meaning of the argument store = c(iid="full") can be found in (Ozenne et al., 2017) Appendix B "Saving the influence functions".

The function is not compatible with time varying predictor variables.

The centered argument enables us to reproduce the results obtained with the basehaz function from the survival package but should not be modified by the user.

Value

The iid decomposition is output in an attribute called "iid", using an array containing the value of the influence of each subject used to fit the object (dim 1), for each subject in newdata (dim 3), and each time (dim 2).

Author(s)

Brice Ozenne broz@sund.ku.dk, Thomas A. Gerds tag@biostat.ku.dk

References

Brice Ozenne, Anne Lyngholm Sorensen, Thomas Scheike, Christian Torp-Pedersen and Thomas Alexander Gerds. riskRegression: Predicting the Risk of an Event using Cox Regression Models. The R Journal (2017) 9:2, pages 440-460.

See Also

confint.predictCox to compute confidence intervals/bands. autoplot.predictCox to display the predictions.

Examples


library(survival)
library(data.table)

#######################
#### generate data ####
#######################

set.seed(10)
d <- sampleData(50,outcome="survival") ## training dataset
nd <- sampleData(5,outcome="survival") ## validation dataset

## add ties
d$time.round <- round(d$time,1)
any(duplicated(d$time.round))

## add categorical variables
d$U <- sample(letters[1:3],replace=TRUE,size=NROW(d))
d$V <- sample(letters[5:8],replace=TRUE,size=NROW(d))
nd <- cbind(nd,d[sample(NROW(d),replace=TRUE,size=NROW(nd)),c("U","V")])

######################
#### Kaplan Meier ####
######################

#### no ties
fit.KM <- coxph(Surv(time,event)~ 1, data=d, x = TRUE, y = TRUE)
predictCox(fit.KM) ## exponential approximation
ePL.KM <- predictCox(fit.KM, product.limit = TRUE) ## product-limit
as.data.table(ePL.KM)

range(survfit(Surv(time,event)~1, data = d)$surv - ePL.KM$survival) ## same

#### with ties (exponential approximation, Efron estimator for the baseline hazard)
fit.KM.ties <- coxph(Surv(time.round,event)~ 1, data=d, x = TRUE, y = TRUE)
predictCox(fit.KM)

## retrieving survfit results with ties
fit.KM.ties <- coxph(Surv(time.round,event)~ 1, data=d,
                     x = TRUE, y = TRUE, ties = "breslow")
ePL.KM.ties <- predictCox(fit.KM, product.limit = TRUE)
range(survfit(Surv(time,event)~1, data = d)$surv - ePL.KM.ties$survival) ## same

#################################
#### Stratified Kaplan Meier ####
#################################

fit.SKM <- coxph(Surv(time,event)~strata(X2), data=d, x = TRUE, y = TRUE)
ePL.SKM <- predictCox(fit.SKM, product.limit = TRUE)
ePL.SKM

range(survfit(Surv(time,event)~X2, data = d)$surv - ePL.SKM$survival) ## same

###################
#### Cox model ####
###################

fit.Cox <- coxph(Surv(time,event)~X1 + X2 + X6, data=d, x = TRUE, y = TRUE)

#### compute the baseline cumulative hazard
Cox.haz0 <- predictCox(fit.Cox)
Cox.haz0

range(survival::basehaz(fit.Cox)$hazard-Cox.haz0$cumhazard) ## same

#### compute individual specific cumulative hazard and survival probabilities
## exponential approximation
fit.predCox <- predictCox(fit.Cox, newdata=nd, times=c(3,8),
                          se = TRUE, band = TRUE)
fit.predCox

## product-limit
fitPL.predCox <- predictCox(fit.Cox, newdata=nd, times=c(3,8),
                            product.limit = TRUE, se = TRUE)
fitPL.predCox

## Note: product limit vs. exponential does not affect uncertainty quantification
range(fitPL.predCox$cumhazard.se - fit.predCox$cumhazard.se) ## same
range(fitPL.predCox$survival.se - fit.predCox$survival.se) ## different
## except through a multiplicative factor (multiply by survival in the delta method)
fitPL.predCox$survival.se - fitPL.predCox$cumhazard.se * fitPL.predCox$survival


#### left truncation
test2 <- list(start=c(1,2,5,2,1,7,3,4,8,8), 
              stop=c(2,3,6,7,8,9,9,9,14,17), 
              event=c(1,1,1,1,1,1,1,0,0,0), 
              x=c(1,0,0,1,0,1,1,1,0,0)) 
m.cph <- coxph(Surv(start, stop, event) ~ 1, test2, x = TRUE)
as.data.table(predictCox(m.cph))

basehaz(m.cph)

##############################
#### Stratified Cox model ####
##############################

#### one strata variable
fit.SCox <- coxph(Surv(time,event)~X1+strata(X2)+X6, data=d, x = TRUE, y = TRUE)
predictCox(fit.SCox, newdata=nd, times = c(3,12))
predictCox(fit.SCox, newdata=nd, times = c(3,12), product.limit = TRUE)
## NA: timepoint is beyond the last observation in the strata and censoring
tapply(d$time,d$X2,max)

#### two strata variables
fit.S2Cox <- coxph(Surv(time,event)~X1+strata(U)+strata(V)+X2,
                   data=d, x = TRUE, y = TRUE)

base.S2Cox <- predictCox(fit.S2Cox)
range(survival::basehaz(fit.S2Cox)$hazard - base.S2Cox$cumhazard) ## same
predictCox(fit.S2Cox, newdata=nd, times = 3)

tagteam/riskRegression documentation built on Oct. 19, 2024, 7:43 p.m.