CFsurvival: Estimate counterfactual survival functions

View source: R/fit_survival.R

CFsurvivalR Documentation

Estimate counterfactual survival functions

Description

This function estimates counterfactual survival functions and contrasts from right-censored data subject to potential confounding.

Usage

CFsurvival(
  time,
  event,
  treat,
  confounders,
  fit.times = sort(unique(time[time > 0 & time < max(time[event == 1])])),
  fit.treat = c(0, 1),
  nuisance.options = list(),
  conf.band = TRUE,
  conf.level = 0.95,
  contrasts = c("surv.diff", "surv.ratio"),
  verbose = FALSE
)

Arguments

time

n x 1 numeric vector of observed right-censored follow-up times; i.e. the minimum of the event and censoring times.

event

n x 1 numeric vector of status indicators of whether an event was observed.

treat

n x 1 numeric vector of observed binary treatment/exposure group; either 0 or 1.

confounders

n x p numeric matrix of potential confounders to use when estimating the conditional survival probabilities. Missing values are not allowed

fit.times

k x 1 numeric vector of time grid at which the counterfactual survival function should be computed. Only values > 0 are allowed. Note that even if the survival is only desired at one timepoint (e.g. a final study time), fit.times should be a relatively fine grid between 0 and this final time in order for the computation to work. Defauls to all unique positive values in time.

fit.treat

Optional subset of c(0,1) for which the counterfactual survival curves are desired. For example, if fit.treat=c(0,1) (default behavior) then both curves will be estimated, and if fit.treat=0 then only the curve under non-treatment will be estimated.

nuisance.options

List of options for nuisance parameter specification. See CFsurvival.nuisance.options for details. Defaults to SuperLearners with rich libraries.

conf.band

Logical indicating whether to compute simultaneous confidence bands.

conf.level

Desired coverage of confidence intervals/bands.

contrasts

Character vector indicating which (if any) contrasts of the survival functions are desired. Can include "surv.diff" (difference of survival functions), "surv.ratio" (ratio of survival functions), "risk.ratio" (ratio of the risk functions), or "nnt" (number needed to treat; i.e. recipricol of survival difference). If not NULL, then fit.treat must be c(0,1). Defaults to c("surv.diff", "surv.ratio").

verbose

Logical indicating whether progress should be printed to the command line.

Details

For a binary treatment measured at baseline, the counterfactual survival through time t for treatment level a is defined as the probability of survival up to time t were all units in the population assigned to treatment a. This quantity is estimated using an augmented inverse probability of treatment/censoring weighted (AIPTW) estimator. Our estimator of the counterfactual survival function requires further estimation of three nuisance parameters: the conditional probability of receiving treatment given a set of potential confounders (i.e. the treatment propensity), the conditional survival function of the event given confounders, and the conditional survival function of censoring given confounders. At the moment, we provide two ways of estimating each of these nuisance parameters. The treatment propensity can be estimated either using a logistic regression or SuperLearner (van der Laan and Polley, 2006). The conditional survival functions of event and censoring can be estimated using Cox proportional hazard models or survival random forests (Ishwaran et al.). Additionally, the estimates of any of these parameters can be passed in directly if another method is desired.

CFsurvival by default returns 95% pointwise confidence intervals and 95% uniform confidence bands, both of which are based on the influence function of the estimator. Contrasts of the counterfactual survival under treatment and control can also be obtained, including the survival difference, survival ratio, risk ratio, and number needed to treat. Pointwise confidence intervals and uniform confidence bands for these contrasts will also be computed.

surv.df$ptwise.lower and surv.df$ptwise.upper are lower and upper endpoints of back-transformed pointwise confidence intervals on the logit scale. Pointwise confidence intervals are only provided for values of fit.times that are at least as large as the smallest observed event time in the corresponding treatment cohort. surv.df$unif.ew.lower and surv.df$unif.ew.upper are lower and upper endpoints of an equal-width confidence band. surv.df$unif.logit.lower and surv.df$unif.logit.upper are lower and upper endpoints of an equi-precision confidence band on the logit scale, and trasformed back to the survival scale. This confidence band only covers values of fit.times such that the corresponding estimated survival function is <= .99 and >= .01.

Value

CFsurvival returns a named list with the following elements:

fit.times

The time points at which the counterfactual survival curves (and contrasts) were fit.

fit.treat

The treatment values for which the counterfactual survival curves were fit.

surv.df

A data frame with the estimated counterfactual survival functions, estimated standard errors, confidence interval (ptwise.lower/ptwise.upper is constructed on the survival scale, and ptwise.logit.lower/ptwise.logit.upper is constructed on the logit scale), confidence bands (unif.ew.lower/unif.ew.upper is an equal-width confidence band on the survival scale, and unif.logit.lower/unif.logit.upper is an equi-precision confidence band on the logit scale).

IF.vals.0, IF.vals.1

n x k matrices with the influence values for the counterfactual survival functions. Rows index observations, columns index time points.

surv.diff.df

A data frame with the estimated counterfactual survival difference (treatment survival minus control survival), as well as confidence intervals and tests of the null hypothesis that the difference equals zero. Only returned if surv.diffs=TRUE.

surv.ratio.df

A data frame with the estimated counterfactual survival ratio (treatment survival divided by control survival), as well as confidence intervals and tests of the null hypothesis that the ratio equals one. Note that standard errors and confidence intervals are first computed on the log scale, then exponentiated. Only returned if surv.ratios=TRUE.

risk.ratio.df

A data frame with the estimated counterfactual risk ratio (treatment risk divided by control risk), as well as confidence intervals and tests of the null hypothesis that the ratio equals one. Note that standard errors and confidence intervals are first computed on the log scale, then exponentiated. Only returned if risk.ratios=TRUE.

nnt.df

A data frame with the estimated counterfactual number needed to treat (one divided by the survival difference), as well as confidence intervals and tests of the null hypothesis that the ratio equals one. Note that standard errors and confidence intervals are first computed on the log scale, then exponentiated. Only returned if nnt=TRUE.

surv.0.unif.ew.sim.maxes, surv.0.logit.sim.maxes, surv.1.unif.ew.sim.maxes, surv.1.logit.sim.maxes

Samples from the approximate distribution of the maximum of the limiting Gaussian processes of the counterfactual survival functions.

surv.0.unif.ew.quant, surv.0.logit.quant, surv.1.unif.ew.quant, surv.1.logit.quant, surv.diff.unif.quant, log.risk.ratio.unif.quant, log.surv.ratio.unif.quant, nnt.unif.quant

Estimated quantile of the approximate distribution of the maximum of the limiting Gaussian processes of the counterfactual survival functions or contrast function.

prop.fits

The estimated treatment propensities (if surv.nuis.fits = TRUE).

surv.fits

The estimated conditional survival functions (if surv.nuis.fits = TRUE).

nuisance

Predicted nuisance function values and details of the nuisance function fits.

data

The original time, event, and treatment data supplied to the function.

Examples

# Define parameters
n <- 300
expit <- function(x) 1/(1 + exp(-x))
betaT <- 2; lambdaT <- 20; betaC <- 2; lambdaC <- 15

# Define the true survival functions of the control (S0) and treatment (S1) groups
S0 <- function(t) sapply(t, function(t0) integrate(function(w) .5 * pweibull(t0, shape=betaT, scale=lambdaT * exp(-w-1), lower.tail = FALSE), lower=-1, upper=1)$value)
S1 <- function(t) sapply(t, function(t0) integrate(function(w) .5 * pweibull(t0, shape=betaT, scale=lambdaT * exp(-w), lower.tail = FALSE), lower=-1, upper=1)$value)

# Simulate data
covar <- runif(n, min=-1, max=1)
g0s <- expit(.2 - covar)
rx <- rbinom(n, size=1, prob=g0s)
event.time <- rweibull(n, shape = betaT, scale = lambdaT * exp(-covar - 1 + rx))
cens.time <- rweibull(n, shape = betaC, scale = lambdaC * exp(-covar/5 - rx/5))
cens.time[cens.time > 15] <- 15
obs.time <- pmin(event.time, cens.time)
obs.event <- as.numeric(event.time <= cens.time)

# Estimate the CF survivals
fit <- CFsurvival(time = obs.time, event = obs.event, treat = rx, confounders =  data.frame(covar), contrasts = c("surv.diff","surv.ratio", "risk.ratio","nnt"), verbose = TRUE, fit.times = seq(0, 14.5, by=.1), fit.treat = c(0,1),
nuisance.options = list(prop.SL.library = c("SL.mean", "SL.bayesglm"),
                       event.SL.library = c("survSL.km", "survSL.coxph", "survSL.weibreg", "survSL.expreg"),
                       cens.SL.library = c("survSL.km", "survSL.coxph", "survSL.weibreg", "survSL.expreg"),
                       cross.fit =TRUE, V = 5, save.nuis.fits = TRUE), conf.band = TRUE, conf.level = .95)
# It is a good idea to check the min/max of the propensity estimates and the min of the censoring estimates
# If they are very small, there may be positivity violations, which may result in invalid inference.
min(fit$nuisance$prop.pred)
max(fit$nuisance$prop.pred)
min(fit$nuisance$cens.pred.0)
min(fit$nuisance$cens.pred.1)

# Plot the output
## Not run: 
library(ggplot2)

# First plot the survival curves + conf intervals + conf bands
fit$surv.df$true.surv <- c(S1(c(0, fit$fit.times)), S0(c(0, fit$fit.times)))
ggplot(fit$surv.df) +
    geom_line(aes(time, true.surv, group=trt), color='black') +
    geom_step(aes(time, surv, color=as.factor(trt), group=trt)) +
    geom_step(aes(time, ptwise.lower, color=as.factor(trt), group=trt), linetype=2) +
    geom_step(aes(time, ptwise.upper, color=as.factor(trt), group=trt), linetype=2) +
    geom_step(aes(time, unif.logit.lower, color=as.factor(trt), group=trt), linetype=3) +
    geom_step(aes(time, unif.logit.upper, color=as.factor(trt), group=trt), linetype=3) +
    scale_color_discrete("Treatment") +
    xlab("Time") +
    ylab("Survival") +
    coord_cartesian(xlim=c(0,15), ylim=c(0,1))

# Next plot the survival difference
fit$surv.diff.df$true.surv.diff <- c(S1(fit$surv.diff.df$time) - S0(fit$surv.diff.df$time))
ggplot(fit$surv.diff.df) +
    geom_line(aes(time, true.surv.diff), color='red') +
    geom_line(aes(time, surv.diff)) +
    geom_line(aes(time, ptwise.lower), linetype=2) +
    geom_line(aes(time, ptwise.upper), linetype=2) +
    geom_line(aes(time, unif.lower), linetype=3) +
    geom_line(aes(time, unif.upper), linetype=3) +
    xlab("Time") +
    ylab("Survival difference (treatment - control)") +
    coord_cartesian(xlim=c(0,15))

fit$surv.ratio.df$true.surv.ratio <- c(S1(fit$surv.ratio.df$time) / S0(fit$surv.ratio.df$time))
ggplot(fit$surv.ratio.df) +
    geom_line(aes(time, true.surv.ratio), color='red') +
    geom_line(aes(time, surv.ratio)) +
    geom_line(aes(time, ptwise.lower), linetype=2) +
    geom_line(aes(time, ptwise.upper), linetype=2) +
    geom_line(aes(time, unif.lower), linetype=3) +
    geom_line(aes(time, unif.upper), linetype=3) +
    xlab("Time") +
    ylab("Survival ratio (treatment / control)") +
    geom_hline(yintercept=0, color='blue', linetype=4) +
    coord_cartesian(xlim=c(0,15), ylim=c(1,10))

# Risk ratio
fit$risk.ratio.df$true.risk.ratio <- (1 - S1(fit$risk.ratio.df$time)) / (1 - S0(fit$risk.ratio.df$time))
ggplot(fit$risk.ratio.df) +
    geom_line(aes(time, true.risk.ratio), color='red') +
    geom_line(aes(time, risk.ratio)) +
    geom_line(aes(time, ptwise.lower), linetype=2) +
    geom_line(aes(time, ptwise.upper), linetype=2) +
    geom_line(aes(time, unif.lower), linetype=3) +
    geom_line(aes(time, unif.upper), linetype=3) +
    coord_cartesian(xlim=c(0,15), ylim=c(0,1)) +
    xlab("Time") +
    ylab("Risk ratio (treatment / control)")

# Number needed to treat
fit$nnt.df$true.nnt <- 1/(S1(fit$nnt.df$time) - S0(fit$nnt.df$time))
ggplot(fit$nnt.df) +
    geom_line(aes(time, true.nnt), color='red') +
    geom_line(aes(time, nnt)) +
    geom_line(aes(time, ptwise.lower), linetype=2) +
    geom_line(aes(time, ptwise.upper), linetype=2) +
    geom_line(aes(time, unif.lower), linetype=3) +
    geom_line(aes(time, unif.upper), linetype=3) +
    coord_cartesian(xlim=c(0,15), ylim=c(0, 10)) +
    xlab("Time") +
    ylab("Number needed to treat (NNT)")
## End(Not run)

tedwestling/CFsurvival documentation built on July 27, 2023, 11:35 a.m.