ate: Average Treatment Effects Computation

View source: R/ate.R

ateR Documentation

Average Treatment Effects Computation

Description

Use the g-formula or the IPW or the double robust estimator to estimate the average treatment effect (absolute risk difference or ratio) based on Cox regression with or without competing risks.

Usage

ate(
  event,
  treatment,
  censor = NULL,
  data,
  data.index = NULL,
  formula = NULL,
  estimator = NULL,
  strata = NULL,
  contrasts = NULL,
  allContrasts = NULL,
  times,
  cause = NA,
  landmark = NULL,
  se = TRUE,
  iid = (B == 0) && (se || band),
  known.nuisance = FALSE,
  band = FALSE,
  B = 0,
  seed,
  handler = "foreach",
  mc.cores = 1,
  cl = NULL,
  verbose = TRUE,
  ...
)

Arguments

event

Outcome model which describes how the probability of experiencing a terminal event depends on treatment and covariates. The object carry its own call and have a predictRisk method. See examples.

treatment

Treatment model which describes how the probability of being allocated to a treatment group depends on covariates. The object must be a glm object (logistic regression) or the name of the treatment variable. See examples.

censor

Censoring model which describes how the probability of being censored depends on treatment and covariates. The object must be a coxph or cph object. See examples.

data

[data.frame or data.table] Data set in which to evaluate risk predictions based on the outcome model

data.index

[numeric vector] Position of the observation in argument data relative to the dataset used to obtain the argument event, treatment, censor. Only necessary for the standard errors when computing the Average Treatment Effects on a subset of the data set.

formula

For analyses with time-dependent covariates, the response formula. See examples.

estimator

[character] The type of estimator used to compute the average treatment effect. Can be "G-formula", "IPTW", or "AIPTW". When using estimator="G-formula", a model for the outcome should be provided (argument event). When using estimator="IPTW", a model for the treatment should be provided (argument treatment), as well as for the censoring (if any, argument censor). When using estimator="AIPTW" (double robust estimator), a model for the outcome and the treatment should be provided (argument event and treatment), as well as for the censoring (if any, argument censor).

strata

[character] Strata variable on which to compute the average risk. Incompatible with treatment. Experimental.

contrasts

[character vector] levels of the treatment variable for which the risks should be assessed and compared. Default is to consider all levels.

allContrasts

[2-row character matrix] levels of the treatment variable to be compared. Default is to consider all pairwise comparisons.

times

[numeric vector] Time points at which to evaluate average treatment effects.

cause

[integer/character] the cause of interest.

landmark

for models with time-dependent covariates the landmark time(s) of evaluation. In this case, argument time may only be one value and for the prediction of risks it is assumed that that the covariates do not change between landmark and landmark+time.

se

[logical] If TRUE compute and add the standard errors to the output.

iid

[logical] If TRUE compute and add the influence function to the output.

known.nuisance

[logical] If FALSE the uncertainty related to the estimation of the nuisance parameters is ignored. This greatly simplifies computations but requires to use a double robust estimator. The resulting standard error is known to be consistent when all event, treatment, and censoring models are valid.

band

[logical] If TRUE compute and add the quantiles for the confidence bands to the output.

B

[integer, >0] the number of bootstrap replications used to compute the confidence intervals. If it equals 0, then the influence function is used to compute Wald-type confidence intervals/bands.

seed

[integer, >0] sed number used to generate seeds for bootstrap and to achieve reproducible results.

handler

[character] Parallel handler for bootstrap. "foreach" is the default and the only option on Windows. It uses parallel to create a cluster. Other operating systems can use "mclapply". This argument is ignored when mc.cores=1 and cl=NULL.

mc.cores

[integer, >0] The number of cores to use, i.e., the upper limit for the number of child processes that run simultaneously. Passed to parallel::mclapply or parallel::makeCluster. The option is initialized from environment variable mc_cores if set.

cl

A parallel socket cluster used to perform cluster calculation in parallel (output by parallel::makeCluster). The packages necessary to run the computations (e.g. riskRegression) must already be loaded on each worker. Only used when handler="foreach".

verbose

[logical] If TRUE inform about estimated run time.

...

passed to predictRisk

Author(s)

Brice Ozenne broz@sund.ku.dk and Thomas Alexander Gerds tag@biostat.ku.dk

References

Brice Maxime Hugues Ozenne, Thomas Harder Scheike, Laila Staerk, and Thomas Alexander Gerds. On the estimation of average treatment effects with right- censored time to event outcome and competing risks. Biometrical Journal, 62 (3):751–763, 2020.

See Also

autoplot.ate for a graphical representation the standardized risks.
coef.ate to output estimates for the the average risk, or difference in average risks, or ratio between average risks.
confint.ate to output a list containing all estimates (average & difference & ratio) with their confidence intervals and p-values.
model.tables.ate to output a data.frame containing one type of estimates (average or difference or ratio) with its confidence intervals and p-values.
summary.ate for displaying in the console a summary of the results.

Examples

library(survival)
library(rms)
library(prodlim)
library(data.table)

############################
#### Survival settings  ####
#### ATE with Cox model ####
############################

#### generate data ####
n <- 100
set.seed(10)
dtS <- sampleData(n, outcome="survival")
dtS$time <- round(dtS$time,1)
dtS$X1 <- factor(rbinom(n, prob = c(0.3,0.4) , size = 2), labels = paste0("T",0:2))

##### estimate the Cox model ####
fit <- cph(formula = Surv(time,event)~ X1+X2,data=dtS,y=TRUE,x=TRUE)

##### compute the ATE ####
## at times 5, 6, 7, and 8 using X1 as the treatment variable
## standard error computed using the influence function
## confidence intervals / p-values based on asymptotic results
ateFit1a <- ate(fit, treatment = "X1", times = 5:8, data = dtS)
summary(ateFit1a)
summary(ateFit1a, short = TRUE, type = "meanRisk")
summary(ateFit1a, short = TRUE, type = "diffRisk")
summary(ateFit1a, short = TRUE, type = "ratioRisk")

## Not run: 
#### ATE with confidence bands ####
## same as before with in addition the confidence bands / adjusted p-values
## (argument band = TRUE)
ateFit1b <- ate(fit, treatment = "X1", times = 5:8, data = dtS, band = TRUE)
summary(ateFit1b)

## by default bands/adjuste p-values computed separately for each treatment modality
summary(ateFit1b, band = 1,
        se = FALSE, type = "diffRisk", short = TRUE, quantile = TRUE)
## adjustment over treatment and time using the band argument of confint
summary(ateFit1b, band = 2,
       se = FALSE, type = "diffRisk", short = TRUE, quantile = TRUE)

## End(Not run)

## Not run: 
#### ATE with non-parametric bootstrap ####
## confidence intervals / p-values computed using 1000 bootstrap samples
## (argument se = TRUE and B = 1000) 
ateFit1c <- ate(fit, treatment = "X1", times = 5:8, data = dtS,
                se = TRUE, B = 50, handler = "mclapply")
## NOTE: for real applications 50 bootstrap samples is not enough 

## same but using 2 cpus for generating and analyzing the bootstrap samples
## (parallel computation, argument mc.cores = 2) 
ateFit1d <- ate(fit, treatment = "X1", times = 5:8, data = dtS, 
                se = TRUE, B = 50, mc.cores = 2)

## manually defining the cluster to be used
## useful when specific packages need to be loaded in each cluster
fit <- cph(formula = Surv(time,event)~ X1+X2+rcs(X6),data=dtS,y=TRUE,x=TRUE)

cl <- parallel::makeCluster(2)
parallel::clusterEvalQ(cl, library(rms))

ateFit1e <- ate(fit, treatment = "X1", times = 5:8, data = dtS, 
                se = TRUE, B = 50, handler = "foreach", cl = cl)
parallel::stopCluster(cl)

## End(Not run)


################################################
#### Competing risks settings               ####
#### ATE with cause specific Cox regression ####
################################################

#### generate data ####
n <- 500
set.seed(10)
dt <- sampleData(n, outcome="competing.risks")
dt$X1 <- factor(rbinom(n, prob = c(0.2,0.3) , size = 2), labels = paste0("T",0:2))

#### estimate cause specific Cox model ####
fitCR <-  CSC(Hist(time,event)~ X1+X8,data=dt,cause=1)

#### compute the ATE ####
## at times 1, 5, 10
## using X1 as the treatment variable
ateFit2a <- ate(fitCR, treatment = "X1", times = c(1,5,10), data = dt, 
                cause = 1, se = TRUE, band = TRUE)
summary(ateFit2a)
as.data.table(ateFit2a)

#############################################
#### Survival settings without censoring ####
#### ATE with glm                        ####
#############################################

#### generate data ####
n <- 100
dtB <- sampleData(n, outcome="binary")
dtB[, X2 := as.numeric(X2)]

##### estimate a logistic regression model ####
fit <- glm(formula = Y ~ X1+X2, data=dtB, family = "binomial")

#### compute the ATE ####
## using X1 as the treatment variable
## only point estimate (argument se = FALSE)
ateFit1a <- ate(fit, treatment = "X1", data = dtB, se = FALSE)
ateFit1a

## Not run: 
## with confidence intervals
ateFit1b <- ate(fit, data = dtB, treatment = "X1",
               times = 5) ## just for having a nice output not used in computations
summary(ateFit1b, short = TRUE)

## using the lava package
library(lava)
ateLava <- estimate(fit, function(p, data){
a <- p["(Intercept)"] ; b <- p["X11"] ; c <- p["X2"] ;
R.X11 <- expit(a + b + c * data[["X2"]])
R.X10 <- expit(a + c * data[["X2"]])
list(risk0=R.X10,risk1=R.X11,riskdiff=R.X11-R.X10)},
average=TRUE)
ateLava

## End(Not run)

############################
#### Survival settings  ####
#### ATE with glm       ####
############################

## see wglm for handling right-censoring with glm

#################################
#### Double robust estimator ####
#################################

## Not run: 
## generate data
n <- 500
set.seed(10)
dt <- sampleData(n, outcome="competing.risks")
dt$X1 <- factor(rbinom(n, prob = c(0.4) , size = 1), labels = paste0("T",0:1))

## working models
m.event <-  CSC(Hist(time,event)~ X1+X2+X3+X5+X8,data=dt)
m.censor <-  coxph(Surv(time,event==0)~ X1+X2+X3+X5+X8,data=dt, x = TRUE, y = TRUE)
m.treatment <-  glm(X1~X2+X3+X5+X8,data=dt,family=binomial(link="logit"))

## prediction + average
ateRobust <- ate(event = m.event,
                 treatment = m.treatment,
                 censor = m.censor,
                 data = dt, times = 5:10, 
                 cause = 1, band = TRUE)
summary(ateRobust)

## compare various estimators
system.time( ## about 1.5s
ateRobust2 <- ate(event = m.event,
                 treatment = m.treatment,
                 censor = m.censor,
                 estimator = c("GFORMULA","IPTW","AIPTW"),
                 data = dt, times = c(5:10), 
                 cause = 1, se = TRUE)
)
as.data.table(ateRobust2, type = "meanRisk")
as.data.table(ateRobust2, type = "diffRisk")

## approximation to speed up calculations
dt$time.round <- round(dt$time/0.5)*0.5 ## round to the nearest half
dt$time.round[dt$time.round==0] <- 0.1 ## ensure strictly positive event times
mRound.event <-  CSC(Hist(time.round,event)~ X1+X2+X3+X5+X8,data=dt)
mRound.censor <-  coxph(Surv(time.round,event==0)~ X1+X2+X3+X5+X8,data=dt, x = TRUE, y = TRUE)
system.time( ## about 0.6s
ateRobust3 <- ate(event = mRound.event,
                 treatment = m.treatment,
                 censor = mRound.censor,
                 estimator = c("GFORMULA","IPTW","AIPTW"),
                 data = dt, times = c(5:10), 
                 cause = 1, se = TRUE)
)
ateRobust2$meanRisk$estimate - ateRobust3$meanRisk$estimate ## inaccuracy

## End(Not run)

###################################
#### time-dependent covariates ####
###################################

## Not run: 
#### example 1 (survival)
## data management: split trajectories
vet2 <- survSplit(Surv(time, status) ~., data = veteran,
                  cut=c(60, 120), episode ="timegroup")

## fit Cox model
fitTD <- coxph(Surv(tstart, time, status) ~ celltype + karno + age + trt,
               data= vet2, x = TRUE)

## run ATE with bootstrap for uncertainty quantification
set.seed(16)
resVet <- ate(fitTD, treatment = "celltype", times = 5, data = vet2,
              formula=Hist(entry=tstart,time=time,event=status)~1,
              landmark = c(0,30,60,90), B = 50)
summary(resVet)

## for reference
fitRef <- coxph(Surv(time, status) ~ celltype + karno + age + trt,
               data= veteran, x = TRUE)
ateRef <- ate(fitRef, treatment = "celltype", times = c(5,35), data = veteran)
## exactly the same when landmark = 0
confint(ateRef)$meanRisk[1] 
confint(resVet)$meanRisk[1]
## very different with landmark != 0
## since one condition on being alive at the landmark time
confint(ateRef)$meanRisk[2] 
suppressWarnings(confint(resVet)$meanRisk[2])

## End(Not run)

## Not run: 
#### example 2 (competing risks)
## generate data 
set.seed(137)
d <- sampleDataTD(127)
d[,status:=1*(event==1)]
d[,X3:=as.factor(X3)]
## fit cause specific Cox model
cscTD <- CSC(Hist(time=time, event=event,entry=start) ~ X3+X5+X6+X8, data=d)
## run ATE
set.seed(16)
resTD <- ate(cscTD,formula=Hist(entry=start,time=time,event=event)~1,
        data = d, treatment = "X3", contrasts = NULL,
        times=.5,verbose=1,
        landmark = c(0,0.5,1), cause = 1, B = 20, se = 1,
        band = FALSE, mc.cores=1)
resTD

## End(Not run)

tagteam/riskRegression documentation built on Oct. 19, 2024, 7:43 p.m.