ctmleGeneral: General Template for Collaborative Targeted Maximum...

Description Usage Arguments Value Examples

View source: R/ctmle_general.R

Description

This function computes the Collaborative Targeted Maximum Likelihood Estimator.

Usage

1
2
3
4
5
ctmleGeneral(Y, A, W, Wg = W, Q, ctmletype, gn_candidates,
  gn_candidates_cv = NULL, alpha = 0.995, family = "gaussian",
  gbound = 0.025, like_type = "RSS", fluctuation = "logistic",
  verbose = FALSE, detailed = FALSE, PEN = FALSE, g1W = NULL,
  g1WPrev = NULL, V = 5, folds = NULL, stopFactor = 10^6)

Arguments

Y

continuous or binary outcome variable

A

binary treatment indicator, 1 for treatment, 0 for control

W

vector, matrix, or dataframe containing baseline covariates for Q bar

Wg

vector, matrix, or dataframe containing baseline covariates for propensity score model (defaults to W if not supplied by user)

Q

n by 2 matrix of initial values for Q0W, Q1W in columns 1 and 2, respectively. Current version does not support SL for automatic initial estimation of Q bar

ctmletype

1 or 3. Type of general C-TMLE. Type 1 uses cross-validation to select best gn, while Type 3 directly solves extra clever covariates.

gn_candidates

matrix or dataframe, each column stand for a estimate of propensity score. Estimate in the column with larger index should have smaller empirical loss

gn_candidates_cv

matrix or dataframe, each column stand for a the cross-validated estimate. For example, the (i,j)-th element is the predicted propensity score by j-th estimator, for i-th observation, when it is in the validation set

alpha

used to keep predicted initial values bounded away from (0,1) for logistic fluctuation, 0.995 (default)

family

family specification for working regression models, generally 'gaussian' for continuous outcomes (default), 'binomial' for binary outcomes

gbound

bound on P(A=1|W), defaults to 0.025

like_type

'RSS' or 'loglike'. The metric to use for forward selection and cross-validation

fluctuation

'logistic' (default) or 'linear', for targeting step

verbose

print status messages if TRUE

detailed

boolean number. If it is TRUE, return more detailed results

PEN

boolean. If true, penalized loss is used in cross-validation step

g1W

Only used when type is 3. a user-supplied propensity score estimate.

g1WPrev

Only used when type is 3. a user-supplied propensity score estimate, with small fluctuation compared to g1W.

V

Number of folds. Only used if folds is not specified

folds

The list of indices for cross-validation step. We recommend the cv-splits in C-TMLE matchs that in gn_candidate_cv

stopFactor

Numerical value with default 1e6. If the current empirical likelihood is stopFactor times larger than the best previous one, the construction would stop

Value

best_k the index of estimate that selected by cross-validation

est estimate of psi_0

CI IC-based 95

pvalue pvalue for the null hypothesis that Psi = 0

likelihood sum of squared residuals, based on selected estimator evaluated on all obs or, logistic loglikelihood if like_type != "RSS"

varIC empirical variance of the influence curve adjusted for estimation of g

varDstar empirical variance of the influence curve

var.psi variance of the estimate

varIC.cv cross-validated variance of the influence curve

penlikelihood.cv penalized cross-validatedlikelihood

cv.res all cross-validation results for each fold

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
N <- 1000
p = 100
V = 5
Wmat <- matrix(rnorm(N * p), ncol = p)
gcoef <- matrix(c(-1,-1,rep(-(3/((p)-2)),(p)-2)),ncol=1)

W <- as.data.frame(Wmat)
g <- 1/(1+exp(Wmat%*%gcoef / 3))
A <- rbinom(N, 1, prob = g)

# Build potential outcome pairs, and the observed outcome Y
beta1 <- 4+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
beta0 <- 2+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]

tau = 2
sigma <- 1
epsilon <-rnorm(N,0,sigma)
Y  <- beta0 + tau * A + epsilon
# Initial estimate of Q
Q <- cbind(rep(mean(Y[A == 1]), N), rep(mean(Y[A == 0]), N))

folds <-by(sample(1:N,N), rep(1:V, length=N), list)

lasso_fit <- cv.glmnet(x = as.matrix(W), y = A, alpha = 1, nlambda = 100, nfolds = 10)
lasso_lambdas <- lasso_fit$lambda[lasso_fit$lambda <= lasso_fit$lambda.min][1:5]
# Build template for glmnet
SL.glmnet_new <- function (Y, X, newX, family, obsWeights, id, alpha = 1,
                          nlambda = 100, lambda = 0,...)
{
    # browser()
    if (!is.matrix(X)) {
          X <- model.matrix(~-1 + ., X)
         newX <- model.matrix(~-1 + ., newX)
   }
   fit <- glmnet::glmnet(x = X, y = Y,
                         lambda = lambda,
                         family = family$family, alpha = alpha)
   pred <- predict(fit, newx = newX, type = "response")
     fit <- list(object = fit)
   class(fit) <- "SL.glmnet"
   out <- list(pred = pred, fit = fit)
   return(out)
}

# Use a sequence of LASSO estimators to build gn sequence:
SL.cv1lasso <- function (... , alpha = 1, lambda = lasso_lambdas[1]){
   SL.glmnet_new(... , alpha = alpha, lambda = lambda)
}

SL.cv2lasso <- function (... , alpha = 1, lambda = lasso_lambdas[2]){
    SL.glmnet_new(... , alpha = alpha, lambda = lambda)
}

SL.cv3lasso <- function (... , alpha = 1, lambda = lasso_lambdas[3]){
    SL.glmnet_new(... , alpha = alpha, lambda = lambda)
}

SL.cv4lasso <- function (... , alpha = 1, lambda = lasso_lambdas[4]){
     SL.glmnet_new(... , alpha = alpha, lambda = lambda)
}

SL.library = c('SL.cv1lasso', 'SL.cv2lasso', 'SL.cv3lasso', 'SL.cv4lasso', 'SL.glm')

# Build the sequence. See more details in the function build_gn_seq, and package SuperLearner
gn_seq <- build_gn_seq(A = A, W = W, SL.library = SL.library, folds = folds)


# Use the output of build_gn_seq for ctmle general templates
ctmle_fit <- ctmleGeneral(Y = Y, A = A, W = W, Q = Q, ctmletype = 1,
                     gn_candidates = gn_seq$gn_candidates,
                     gn_candidates_cv = gn_seq$gn_candidates_cv,
                     folds = gn_seq$folds, V = length(folds))

ctmle documentation built on Dec. 16, 2019, 1:19 a.m.