View source: R/ctmle_discrete.R
This function computes the discrete Collaborative Targeted Minimum-loss based Estimator (C-TMLE) for variable selection. It includes the greedy C-TMLE algorithm (Gruber and van der Laan 2010) and the scalable C-TMLE algorithm (Ju, Gruber, Lendle, et al. 2016) with a user-specified order.
ctmleDiscrete(Y, A, W, Wg = W, Q = NULL, preOrder = FALSE, order = NULL,
  patience = FALSE, Qbounds = NULL, cvQinit = FALSE, Qform = NULL,
  SL.library = NULL, alpha = 0.995, family = "gaussian", gbound = 0.025,
  like_type = "RSS", fluctuation = "logistic", verbose = FALSE,
  detailed = FALSE, PEN = FALSE, V = 5, folds = NULL,
  stopFactor = 10^6)
| Y | continuous or binary outcome variable | 
| A | binary treatment indicator, 1 for treatment, 0 for control | 
| W | vector, matrix, or dataframe containing baseline covariates for Q bar | 
| Wg | vector, matrix, or dataframe containing baseline covariates for propensity score model (defaults to W if not supplied by user) | 
| Q | n by 2 matrix of initial values for Q0W, Q1W in columns 1 and 2, respectively. Current version does not support SL for automatic initial estimation of Q bar | 
| preOrder | boolean. If TRUE, use the scalable C-TMLE algorithm; if FALSE (default), use the greedy C-TMLE algorithm | 
| order | the user-specified order of covariates. Only used when preOrder = TRUE. If not supplied by the user, covariates are ordered automatically from W_1 to W_p | 
| patience | number of covariates that may be added without an improvement in the cross-validated score before stopping early. Only used when preOrder = TRUE | 
| Qbounds | bound on initial Y and predicted values for Q. | 
| cvQinit | if TRUE, cross-validate initial values for Q to avoid overfitting | 
| Qform | optional regression formula for estimating initial Q | 
| SL.library | optional vector of prediction algorithms for data-adaptive estimation of Q, defaults to glm and glmnet | 
| alpha | used to keep predicted initial values bounded away from 0 and 1 for logistic fluctuation, 0.995 (default) | 
| family | family specification for working regression models, generally 'gaussian' for continuous outcomes (default), 'binomial' for binary outcomes | 
| gbound | bound on P(A=1|W), defaults to 0.025 | 
| like_type | 'RSS' or 'loglike'. The metric to use for forward selection and cross-validation | 
| fluctuation | 'logistic' (default) or 'linear', for targeting step | 
| verbose | print status messages if TRUE | 
| detailed | boolean. If TRUE, return more detailed results | 
| PEN | boolean. If TRUE, penalized loss is used in the cross-validation step | 
| V | Number of folds. Only used if folds is not specified | 
| folds | The list of indices for the cross-validation step. We recommend that the CV splits in C-TMLE match those in gn_candidate_cv | 
| stopFactor | Numerical value with default 1e6. If the current empirical likelihood is stopFactor times larger than the best previous one, the construction stops | 
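The folds and gbound arguments above can be illustrated with a small base-R sketch. It is not taken from the package source: the sample size N and the propensity values g_raw are made up, and it assumes that each element of folds holds the validation indices of one fold and that gbound is applied symmetrically, as [gbound, 1 - gbound]. Both assumptions should be checked against the package before use.

```r
set.seed(1)
N <- 100   # hypothetical sample size
V <- 5     # number of folds, matching the default V = 5

# Randomly assign each observation to one of V equally sized folds,
# then collect the observation indices belonging to each fold.
# ASSUMPTION: each list element holds the validation indices of one fold.
fold_id <- sample(rep(seq_len(V), length.out = N))
folds <- split(seq_len(N), fold_id)

# Truncate hypothetical propensity scores to [gbound, 1 - gbound].
# ASSUMPTION: the bound is applied symmetrically at both ends.
gbound <- 0.025
g_raw <- c(0.001, 0.30, 0.999)
g_bounded <- pmin(pmax(g_raw, gbound), 1 - gbound)
```

A list built this way could then be passed as folds = folds, so that the same splits are reused wherever matching cross-validation splits are recommended.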
best_k the index of the estimate selected by cross-validation
est estimate of psi_0
CI IC-based 95% confidence interval
pvalue pvalue for the null hypothesis that Psi = 0
likelihood sum of squared residuals based on the selected estimator evaluated on all observations, or the logistic log-likelihood if like_type != 'RSS'
varIC empirical variance of the influence curve adjusted for estimation of g
varDstar empirical variance of the influence curve
var.psi variance of the estimate
varIC.cv cross-validated variance of the influence curve
penlikelihood.cv penalized cross-validated likelihood
cv.res all cross-validation results for each fold
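As a rough illustration of how est, var.psi, CI, and pvalue relate, the sketch below computes a Wald-type interval and two-sided p-value in base R. The numbers are hypothetical stand-ins for fit$est and fit$var.psi, and the normal approximation is an assumption about how the returned CI and pvalue are derived, not a statement of the package's internals.

```r
# Hypothetical values standing in for fit$est and fit$var.psi.
est <- 2.1
var_psi <- 0.04

# ASSUMPTION: var.psi is the variance of the estimator itself,
# so the standard error is simply its square root.
se <- sqrt(var_psi)

# Wald-type 95% confidence interval under a normal approximation.
z <- qnorm(0.975)
ci <- c(lower = est - z * se, upper = est + z * se)

# Two-sided p-value for the null hypothesis that Psi = 0.
pvalue <- 2 * pnorm(-abs(est / se))
```

Comparing ci and pvalue with the CI and pvalue components of a fitted object is a quick way to check whether this assumption holds for a given fit.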
## Not run: 
N <- 1000
p <- 10
Wmat <- matrix(rnorm(N * p), ncol = p)
beta1 <- 4+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
beta0 <- 2+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
tauW <- 2
tau <- 2
gcoef <- matrix(c(-1,-1,rep(-(3/((p)-2)),(p)-2)),ncol=1)
Wm <- as.matrix(Wmat)
g <- 1/(1+exp(Wm%*%gcoef))
A <- rbinom(N, 1, prob = g)
sigma <- 1
epsilon <- rnorm(N,0,sigma)
Y  <- beta0 + tauW*A + epsilon
# Initial estimate of Q
Q <- cbind(rep(mean(Y[A == 0]), N), rep(mean(Y[A == 1]), N))
# User-supplied initial estimate
time_greedy <- system.time(
ctmle_discrete_fit1 <- ctmleDiscrete(Y = Y, A = A, W = data.frame(Wmat), Q = Q,
                                    preOrder = FALSE)
)
# If there is no input Q, then the initial Q is estimated by SL with SL.library
ctmle_discrete_fit2 <- ctmleDiscrete(Y = Y, A = A, W = data.frame(Wmat),
                                    preOrder = FALSE, detailed = TRUE)
# Scalable C-TMLE with the pre-order option; the order is user-specified.
# If 'order' is not specified, the order from W1 to Wp is used.
time_preorder <- system.time(
ctmle_discrete_fit3 <- ctmleDiscrete(Y = Y, A = A, W = data.frame(Wmat), Q = Q,
                                    preOrder = TRUE,
                                    order = rev(1:p), detailed = TRUE)
)
# Compare the running time
time_greedy
time_preorder
## End(Not run)