sleete: Super Learner for Efficient Estimation of Treatment Effects...


View source: R/methods.R

Description

The sleete function uses a super learner to minimize the variance of an augmented estimator of a specified treatment effect measure in a randomized clinical trial. It returns a matrix of point estimates and standard errors for the super learner as well as individual algorithms in the super learner library, with or without sample splitting.

Usage

sleete(
  y,
  t,
  X,
  pi = mean(t),
  bounds = c(-Inf, Inf),
  method = mean.diff,
  ...,
  SL.library = c("SL.glm", "SL.gam", "SL.rpart", "SL.randomForest"),
  cv = 5,
  cf = 5
)

Arguments

y

Outcome data represented as a vector (for a univariate outcome) or a matrix (for a right-censored survival outcome or multiple outcomes to be analyzed together). For a right-censored survival outcome, y is a matrix with two columns: the observed time followed by the event type (1 = failure, 0 = censoring).

t

A vector of 1s and 0s representing treatment assignment. The values 1 and 0 represent the experimental and control treatments, respectively. The length of t should be equal to the number of subjects.

X

A matrix of baseline covariates that may be related to y in one or both treatment groups. The number of rows in X should be equal to the number of subjects. The number of columns in X is the number of covariates. There is no need to have a column of 1s in X.

pi

The probability of receiving the experimental treatment, which is usually known in a randomized clinical trial. If missing, it is replaced by mean(t), the proportion of study subjects who were assigned to the experimental treatment.

bounds

Known lower and upper bounds, if any, for the treatment effect measure to be estimated. For example, if the effect measure is a difference between two probabilities, the natural bounds are c(-1,1).

method

A list of two mandatory components and one optional component specifying the (unadjusted) method for estimating the treatment effect of interest. The two mandatory components are pt.est, a function for obtaining a point estimate, and inf.fct.avail, an indicator for the availability of a function to compute the influence function of the point estimator analytically. If the value of inf.fct.avail is TRUE, one has to also supply a function named inf.fct to compute the influence function of the point estimator analytically. If the value of inf.fct.avail is FALSE, the function inf.fct is not needed and the empirical influence function (Zhang et al., 2020) will be computed. See Details for information about the built-in methods.

...

If specified, such optional arguments will be fed into the specified method. For instance, the wmw and wmw.cens methods involve a kernel function h. The default for h (named h0) and an illustrative alternative h1 are provided below as examples. The wmw.cens, surv.diff and mrst.diff methods require specifying a time point tau, which has no default value and must be supplied by the user.

SL.library

A character vector of SuperLearner wrapper functions for the prediction algorithms that comprise the super learner library. A full list of wrapper functions included in the SuperLearner package can be found with listWrappers().

cv

The number of folds in the cross-validation for the super learner.

cf

The number of folds in the sample splitting or cross-fitting procedure.
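
As a rough illustration of the argument shapes described above, the following sketch simulates inputs for a univariate outcome and defines a user-supplied method without an analytical influence function. The simulated data, the seed, and the names pt.est.med and med.diff are hypothetical and not part of the package; the (y, t) signature of pt.est is assumed from the built-in definitions shown in the Examples. Whether the empirical influence function behaves well for a given estimator is not addressed here.

```r
# Hypothetical simulated trial; none of these numbers come from the package
set.seed(1)
n = 200
t = rbinom(n, 1, 0.5)                              # 1 = experimental, 0 = control
X = cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.3))   # no column of 1s needed
y = 1 + 0.5*t + X[, "x1"] + rnorm(n)               # univariate outcome of length n

# Hypothetical user-defined method: difference in medians.
# No analytical influence function is supplied, so inf.fct.avail is FALSE
pt.est.med = function(y, t) median(y[t>0.5]) - median(y[t<0.5])
med.diff = list(pt.est = pt.est.med, inf.fct.avail = FALSE)

# Basic consistency checks on the inputs
stopifnot(length(y) == n, length(t) == n, nrow(X) == n, all(t %in% c(0, 1)))
med.diff$pt.est(y, t)

# With the sleete package installed, a call might then look like this
# (commented out because it needs the package and its dependencies):
# sleete(y, t, X, pi = 0.5, method = med.diff)
```

The list carries only the two mandatory components, which is the layout the method argument describes for the case inf.fct.avail = FALSE.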

Details

Currently, there are eight built-in methods available for method. Four of them are for fully observed univariate outcomes: mean.diff for the difference between two means or proportions, log.ratio for the log-ratio of two means or proportions, log.odds.ratio for the log-odds-ratio of two proportions, and wmw for the Wilcoxon-Mann-Whitney (WMW) effect (Zhang et al., 2019), the default version of which is also known as the win-lose probability difference. There are four other methods for right-censored survival outcomes: wmw.cens for the WMW effect for restricted survival times, surv.diff for the difference between two survival probabilities, mrst.diff (or rmst.diff) for the difference in mean restricted survival time, and log.haz.ratio for the log-hazard-ratio. The methods for right-censored survival outcomes are implemented without an analytical influence function (i.e., inf.fct.avail=FALSE). Users can define their own methods under the same guidelines. For illustration, the current definitions of the log.odds.ratio and wmw methods are provided below as examples.
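
To make the guidelines for user-defined methods more concrete, here is a sketch of how a difference-in-means method could be written in the same form as the log.odds.ratio definition in the Examples. It is not the package's actual mean.diff source; the names pt.est.md, inf.fct.md and my.mean.diff and the toy data are hypothetical. The influence function used is the standard one for a difference between two sample means, and the last step shows one common way to turn an influence function into a standard error.

```r
# Point estimate: difference between the two group means
pt.est.md = function(y, t) mean(y[t>0.5]) - mean(y[t<0.5])
# Influence function estimated from subjects in set I,
# then applied to subjects in set J
inf.fct.md = function(y, t, I=1:length(t), J=I, pi=NULL) {
  if (is.null(pi)) pi = mean(t[I])
  m1 = mean(y[I][t[I]>0.5]); m0 = mean(y[I][t[I]<0.5])
  (t[J]*(y[J]-m1)/pi) - ((1-t[J])*(y[J]-m0)/(1-pi))
}
my.mean.diff = list(pt.est=pt.est.md, inf.fct.avail=TRUE, inf.fct=inf.fct.md)

# Toy data (hypothetical): 50 control subjects followed by 50 treated subjects
set.seed(2)
t = rep(c(0, 1), each = 50)
y = rnorm(100, mean = 0.3*t)
est = my.mean.diff$pt.est(y, t)
inf = my.mean.diff$inf.fct(y, t)
# Influence-function-based standard error: sqrt(mean(inf^2)/n)
se = sqrt(mean(inf^2)/length(y))
c(estimate = est, se = se, mean.inf = mean(inf))
```

With pi equal to the observed treatment proportion, the influence function values average to zero and se^2 reduces to the familiar sum of the two within-group variances (with divisor n rather than n-1), each divided by its group size.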

Value

A matrix with two columns: point estimates of the treatment effect of interest and their standard errors. The number of rows is 2K+3, where K is the length of SL.library. The first row is for the unadjusted estimate as specified in the method argument. The next K+1 rows are for augmented estimates based on the individual algorithms in the super learner library (in the original order) followed by the super learner itself, all without sample splitting. The next K+1 rows are for augmented estimates based on the same set of algorithms (in the same order) with sample splitting. The standard error for the unadjusted estimate is based on the (analytical or empirical) influence function. The standard errors for the augmented estimates are cross-validated in the sample splitting procedure. Thus, the two sub-sets of augmented estimates (with and without sample splitting) have the same set of cross-validated standard errors.
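
The row layout described above can be sketched as follows for the default SL.library. The labels are hypothetical and serve only to show the ordering; the row names actually returned by sleete may differ.

```r
# Default library from the Usage section
SL.library = c("SL.glm", "SL.gam", "SL.rpart", "SL.randomForest")
K = length(SL.library)
# Hypothetical labels, in the order stated in the Value section
row.labels = c("unadjusted",
               paste0(c(SL.library, "SL"), " (no splitting)"),
               paste0(c(SL.library, "SL"), " (sample splitting)"))
stopifnot(length(row.labels) == 2*K + 3)
# Row positions of the super learner estimate in each sub-set
c(SL.no.split = K + 2, SL.split = 2*K + 3)
```

Under this ordering, the super learner estimate without sample splitting sits in row K+2 and the one with sample splitting in the last row, 2K+3.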

References

Zhang Z, Ma S (2019). Machine learning methods for leveraging baseline covariate information to improve the efficiency of clinical trials. Statistics in Medicine, 38(10), 1703-1714.

Zhang Z, Ma S, Shen C, Liu C (2019). Estimating Mann-Whitney-type causal effects. International Statistical Review, 87(3), 514-530.

Zhang Z, Li W, Zhang H (2020). Efficient estimation of Mann-Whitney-type effect measures for right-censored survival outcomes in randomized clinical trials. Statistics in Biosciences, 12(2), 246-262.

See Also

See SuperLearner for details on SL.library and family.

Examples

# analysis of colon cancer data in the survival package
library(survival)
library(sleete)
data(colon)
dim(colon); names(colon)
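# keep death records (etype==2) for the Obs and Lev+5FU arms, complete cases only
colon.data <- na.omit(subset(colon, subset=((etype==2)&(rx!="Lev")),
select = c(rx, time, status, sex, age, obstruct, perfor,
  adhere, nodes, node4, surg, differ, extent)))
dim(colon.data)
# build the inputs with with() rather than attach()/detach(),
# which can mask objects on the search path
t = as.numeric(colon.data$rx=="Lev+5FU")
y = with(colon.data, cbind(time, status))
X = with(colon.data, cbind(sex, age, obstruct, perfor, adhere, nodes, node4, surg, differ, extent))
# randomization probability and a 5-year time point (time is recorded in days)
pi = 0.5; tau = 5*365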
sleete(y, t, X, pi=pi, method=surv.diff, bounds=c(-1,1), tau=tau)
sleete(y, t, X, pi=pi, method=mrst.diff, tau=tau)
sleete(y, t, X, pi=pi, method=wmw.cens, bounds=c(-1,1), tau=tau)

# the log-odds-ratio method
# logit = log-odds
logit = function(p) log(p/(1-p))
# point estimate
pt.est.log.or = function(y, t) logit(mean(y[t>0.5]))-logit(mean(y[t<0.5]))
# influence function estimated from subjects in set I
# then applied to subjects in set J
inf.fct.log.or = function(y, t, I=1:length(t), J=I, pi=NULL) {
  if (is.null(pi)) pi = mean(t[I])
  p1 = mean(y[I][t[I]>0.5]); p0 = mean(y[I][t[I]<0.5])
  (t[J]*(y[J]-p1)/(pi*p1*(1-p1)))-((1-t[J])*(y[J]-p0)/((1-pi)*p0*(1-p0)))
}
log.odds.ratio = list(pt.est=pt.est.log.or, inf.fct.avail=TRUE, inf.fct=inf.fct.log.or)

# the wmw method with an arbitrary h (default = h0)
# Agresti definition of h
h0 = function(y1, y0) as.numeric(y1>y0)-as.numeric(y1<y0)
# Mann-Whitney definition of h
h1 = function(y1, y0) as.numeric(y1>y0)+0.5*as.numeric(y1==y0)
# point estimate
pt.est.wmw = function(y, t, h=h0) mean(outer(y[t>0.5], y[t<0.5], FUN=h))
# influence function estimated from subjects in set I
# then applied to subjects in set J
inf.fct.wmw = function(y, t, I=1:length(t), J=I, pi=NULL, h=h0) {
  if (is.null(pi)) pi = mean(t[I])
  theta = pt.est.wmw(y[I],t[I],h=h)
  m = length(J); inf = numeric(m)
  for (k in 1:m) {
    if (t[J[k]]>0.5) {
      inf[k] = (mean(h(y[J[k]],y[I]))-theta)/pi
    } else {
      inf[k] = (mean(h(y[I],y[J[k]]))-theta)/(1-pi)
    }
  }
  inf
}
wmw = list(pt.est=pt.est.wmw, inf.fct.avail=TRUE, inf.fct=inf.fct.wmw)
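
As a quick check of how the two kernels relate, the sketch below re-states h0, h1 and pt.est.wmw from above so that it runs on its own, and applies them to a small hypothetical data set with ties. Because h0 equals 2*h1 - 1 for every pair, the two point estimates are linked by the same linear map.

```r
# Definitions repeated from the example above
h0 = function(y1, y0) as.numeric(y1>y0)-as.numeric(y1<y0)
h1 = function(y1, y0) as.numeric(y1>y0)+0.5*as.numeric(y1==y0)
pt.est.wmw = function(y, t, h=h0) mean(outer(y[t>0.5], y[t<0.5], FUN=h))

# Toy ordinal outcome with ties (hypothetical data)
y = c(1, 2, 2, 3, 3, 1, 2, 4)
t = c(0, 0, 0, 0, 1, 1, 1, 1)
# Of the 16 treated-control pairs, 8 are wins, 4 are losses and 4 are ties
est0 = pt.est.wmw(y, t)          # win-lose probability difference: 0.25
est1 = pt.est.wmw(y, t, h=h1)    # P(win) + 0.5*P(tie): 0.625
c(h0 = est0, h1 = est1)
```

When calling sleete with method=wmw, an alternative kernel such as h1 would be passed through the ... argument, as described under Arguments.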

czhang2718/sleete documentation built on July 1, 2020, 12:10 a.m.