tmleCommunity: Estimate Marginal Treatment Effects For Arbitrary...
In chizhangucb/tmleCommunity: Targeted Maximum Likelihood Estimation for Hierarchical Data

Description Usage Arguments Details Value IPTW estimator TMLE estimator GCOMP estimator Modeling P(A | W, E) for covariates (A, W, E) Three methods for choosing bin (interval) locations for a univariate and continuous variable A See Also Examples

View source: R/tmleCommunity.R

Estimate the marginal treatment effect among independent communities (or i.i.d units if no hierarchical structure) using TMLE (targeted maximum likelihood estimation). It supports two different TMLEs that are based on community-level and individual-level analysis, respectively. The individual-level TMLE cooperates with additional working assumptions and has potential efficiency gain. It also provide corresponding IPTW (the inverse-probability-of-treatment or Horvitz-Thompson) and GCOMP (parametric G-computation).

tmleCommunity(data, Ynode, Anodes, WEnodes, YnodeDet = NULL,
  obs.wts = c("equal.within.pop", "equal.within.community"),
  community.step = c("NoCommunity", "community_level",
  "individual_level", "perCommunity"), communityID = NULL,
  community.wts = c("size.community", "equal.community"),
  pooled.Q = FALSE, f_g0 = NULL, f_gstar1, f_gstar2 = NULL,
  Qform = NULL, Qbounds = NULL, alpha = 0.995,
  fluctuation = "logistic", hform.g0 = NULL, hform.gstar = NULL,
  lbound = 0.005, h.g0_GenericModel = NULL,
  h.gstar_GenericModel = NULL, TMLE.targetStep = c("tmle.intercept",
  "tmle.covariate"), n_MCsims = 1, CI_alpha = 0.05, rndseed = NULL,
  verbose = getOption("tmleCommunity.verbose"))

`data`	Observed data, `data.frame` with named columns, containing `WEnodes`, `Anode`, `Ynode` and possibly `communityID`, `YnodeDet`. See "Details".
`Ynode`	Column name or index in `data` of outcome variable. Outcome can be either binary or continuous (could be beyond 0 and 1). If Ynode undefined, the left-side of the regression formula in argument `Qform` will be treated as `Ynode`.
`Anodes`	Column names or indices in `data` of exposure (treatment) variables
`WEnodes`	Column names or indices in `data` of individual-level (and possibly community-level) baseline covariates. Factors are not allowed.
`YnodeDet`	Optional column name or index in `data` of indicators of deterministic values of outcome `Ynode`, coded as (TRUE / FALSE) or (1 / 0). If TRUE or 1, value of `Ynode` is given deterministically / constant.
`obs.wts`	Optional choice to provide/ construct a vector of individual-level observation (sampling) weights (of length `nrow(data)`). Currently supports a non-negative numeric vector, "equal.within.pop" (Default) and equal.within.community. If "equal.within.pop", weigh individuals in the entire dataset equally (weigh to be all 1); If "equal.within.community", weigh individuals within the same community equally (i.e., 1 / (number of individuals in each community)).
`community.step`	Methods to deal with hierarchical data, one of `"NoCommunity"` (Default), `"community_level"`, `"individual_level"` and `"PerCommunity"`. If `"NoCommunity"`, claim that no hirerachical structure in data; If `"community_level"`, use community-level TMLE; If `"individual_level"`, use individual-level TMLE cooperating with the assumption of no covariate interference. If `"perCommunity"`, use stratified TMLE. If `communityID` = `NULL`, then automatically pool over all communities (i.e., treated it as `"NoCommunity"`). See "Details".
`communityID`	Optional column name or index in `data` of community identifier variable. If known, it can support the three options within `community.step`: `"community_level"`, `"individual_level"` and `"PerCommunity"` (See details for `community.step`).
`community.wts`	Optional choice to provide/ construct a matrix of community-level observation weights (where dimension = J\times2, where J = the number of communities). The first column contains the identifiers / names of communities (ie., `data[, communityID]`) and the second column contains the corresponding non-negative weights. Currently only support a numeric matrix with 2 columns, "size.community" (Default) and "equal.community". If setting `community.wts` = "size.community", treat the number of individuals within each community as its weight, respectively. If `community.wts` = "equal.community", assumed weights to be all 1;
`pooled.Q`	Logical for incorporating hierarchical data to estimate the outcome mechanism. If `TRUE`, use a pooled individual-level regression for initial estimation of the mean outcome (i.e., outcome mechanism). Default to be `FASLE`. See "Details".
`f_g0`	Optional function used to specify model knowledge about value of Anodes. It estimates P(A \| W, E) under `g0` by sampling a large vector/ data frame of Anode (of length `nrow(data)*n_MCsims` or number of rows if a data frame) from `f_g0` function.
`f_gstar1`	Either a function or a vector or a matrix/ data frame of counterfactual exposures, dependin on the number of exposure variables. If a matrix/ data frame, its number of rows must be either `nrow(data)` or 1 (constant exposure assigned to all observations), and its number of columns must be `length(Anodes)`. Note that the column names should match with the names in `Anodes`. If a vector, it must be of length `nrow(data)` or 1. If a function, it must return a vector or a data frame of counterfactual exposures sampled based on Anodes, WEnodes (and possibly communityID) passed as a named argument "data". Thus, the function must include "data" as one of its argument names. The interventions defined by f_gstar1 can be static, dynamic or stochastic. See Exmaples below.
`f_gstar2`	Either a function or a vector or a matrix/ data frame of counterfactual exposures, dependin on the number of exposure variables. It has the same components and requirements as f_gstar1 has.
`Qform`	Character vector of regression formula for Ynode. If not specified (i.e., `NULL`), the outcome variable is regressed on all covariates included in Anodes and WEnodes (i.e., `Ynode ~ Anodes + WEnodes`). See "Details".
`Qbounds`	Vector of upper and lower bounds on Y and predicted value for initial Q. Default to the range of Y, widened by 10% of the min and max values. See "Details".
`alpha`	Used to keep predicted values for initial Q bounded away from (0,1) for logistic fluctuation (set `Qbounds` to (1 - `alpha`), `alpha`).
`fluctuation`	Default to "logistic", it could also be "linear" (for targeting step).
`hform.g0`	Character vector of regression formula for estimating the conditional density of P(A \| W, E) under observed treatment mechanism g0. If not specified, its form will be `Anodes ~ WEnodes`. If there are more than one expsosure, it fits a joint probability. See section "Modeling P(A \| W, E) for covariates (A, W, E)".
`hform.gstar`	Character vector of regression formula for estimating the conditional density P(A \| W, E) under user-supplied interventions f_gstar1 or f_gstar2. If not specified, it use the same regression formula as used in hform.g0.
`lbound`	Value between (0,1) for truncation of predicted P(A \| W, E). Default to 0.005
`h.g0_GenericModel`	An object of `GenericModel` R6 class containing the previously fitted models for P(A \| W, E) under observed treatment mechanism g0, one of the returns of `tmleCommunity` function. If known, predictions for P(A=a \| W=w, E=e) under g0 are based on the fitted models in `h.g0_GenericModel`.
`h.gstar_GenericModel`	An object of `GenericModel` R6 class containing the previously fitted models for P(A^* \| W, E) under intervention gstar, one of the returns of `tmleCommunity` function. If known, predictions for P(A=a \| W=w, E=e) under gstar are based on the fitted models in `h.gstar_GenericModel`.
`TMLE.targetStep`	TMLE targeting step method, either "tmle.intercept" (Default) or "tmle.covariate". See "Details".
`n_MCsims`	Number of simulations for Monte-Carlo analysis. Each simulation generates new exposures under f_gstar1 or f_gstar2 (if specified) or f_g0 (if specified), with a sample size of nrow(data). Then these generated expsosures are used when fitting the conditional densities P(A \| W, E). and estimating for IPTW and GCOMP under intervention f_gstar1 or f_gstar2. Note that deterministic intervention only needs one simulation and stochastic intervention could use more simulation times such as 10 (Default to 1).
`CI_alpha`	Significance level (alpha) used in constructing a confidence interval. Default to 0.05.
`rndseed`	Random seed for controlling sampling A under f_gstar1 or f_gstar2 (for reproducibility of Monte-Carlo simulations)
`verbose`	Flag. If `TRUE`, print status messages. Default to `getOption("tmleCommunity.verbose")` (Default to `FALSE`). It can be turned on by setting `options(tmleCommunity.verbose = TRUE)`.

The estimates returned by tmleCommunity are of a treatment-specific mean, E[Y_{g^*}], the expected community-level outcome if all communities in the target population received the exposures generated under the user-supplied (deterministic or stochastic) intervention g^*.

data must be a data frame (no matrix accepted) and does not support factor values (considering removing or recording such columns as strings), data includes the following (optional) columns:

community-level and individual-level baseline covariate columns (WEnodes): can be any numeric data. Notice that W represents individual-level covariates and E represent community-level covariates.
exposure columns (Anodes): can have more than one exposure and each exposure could be can be either binary, categorical or continuous.
outcome column (Ynode): can be any numeric data. If Ynode values are continuous, they may be automatically scaled. See details for Qbounds below.
deterministic Ynode indicator column (YnodeDet): (optional) column that must be logical or binary. Only non-determinstic Ynode values will be used in the final estimation step (e.g., IPTW[!determ.Q] <- Y[!determ.Q] * h_wts[!determ.Q]).
community identifier variable column (communityID): (optional) column that stores community identifier for hierarchical data (or subject identifier if multiple observations for the same individual). Integer or character recommended (No factor allowed). NULL means no hierarchical structure and all distinct individuals.

community.step specifies how to read the structure of the input data (as hierarchical or non-hierarchical) and how to analyze the data. community_level TMLE ("community_level") is exactly analogous to the TMLE of the treatment specific individual-level mean outcome ("NoCommunity") with the trivial modification that the community rather than the individual serves as the unit of analysis. communityID is needed when using "community_level", "individual_level" and "perCommunity". Lack of communityID forces the algorithm to automatically pool data over all communities and treat it as non-hierarchical dataset (so force community.step = "NoCommunity"). If community.step = "individual_level", it incorporates with working models that assume that each individual's outcome is known not to be affected by the covariates of other individuals in the same community (i.e., "no covariate interference"). This strong assumption can be relaxed by integrating knowledge of the dependence structure among individuals within communities (i.e., "weak covariate interference"). But currrently only the "no covariate interference" assumption is implemented. If community.step = "perCommunity"), run a single TMLE on each community and calculate a (weighted) mean outcome for the J communities.

pooled.Q is in regard to incorporate the working model of "no covariate inference" in community-level TMLE (and the corresponding IPTW and GCOMP) although the working model is not assumed to hold. In other words, when community.step = "community_level", if pooled.Q = TRUE, add pooled individual-level regressions as candidates in the Super Learner library for initial estimation of the outcome mechanism. If pooled.Q = FALSE, both outcome and treatment mechanisms are estimated on the community-level (no use of individual-level information).

Qform should be NULL, in which cases all parent nodes of Y node will be used as regressors, or a character vector that can be coerced to class "formula". If Qestimator (an argument in tmleCom_Options) is "speedglm__glm" (or "glm__glm"), then speedglm (or glm) will be called using the components of Qform. If Qestimator is "SuperLearner", then SuperLearner will be called after a data frame is created using Qform, based on the specified algorithms in SL.library (an argument in tmleCom_Options); If Qestimator is "h2o__ensemble", then h2o and h2oEnsemble will be called after a H2OFrame dataset is creating using Qform, based on specified algorithms in h2olearner and h2ometalearner. See "Arguments" in tmleCom_Options.

hform.g0 and hform.gstar should also be NULL, in which cases all parent nodes of A node(s) will be used as regressors, or a character vector that can be coerced to class "formula". It follows the same rules applied to Qform except it's now based on gestimator (an argument in tmleCom_Options). Besides, gestimator can be "sl3_pipelines". If gestimator is "sl3_pipelines", then learner$train will be called after a sl3 learner has been created by using make_learner. See "Arguments" in tmleCom_Options.

Qbounds can be used to bound continuous outcomes Y. If Qbounds not specified (NULL), it will be set to the range of Y, widened by 10% of the minimum and maximum. That is, [0.9*min(Y), 1.1*max(Y)]. If specified, then Y will be truncated at the min max values of Qbounds, and then scaled to be in [0, 1] by (Y - min(Qbound))/(diff(Qbound)). Statistical inferences and for the transformed outcome can be directly translated back into statistical inference for the unscaled outcome. Once Qbounds finish bounding the observed outcomes, it will be set to (1 - alpha, alpha) and used to bound the predicted values for the initial outcome mechanism. Thus, alpha needs to be between (0, 1), otherwise reset to 0.995. Besides, lbound can be used to truncate the weights h_gstar/h_gN, that is, [0, 1/lbound].

TMLE.targetStep specifies how to use weights h_gstar/h_gN in the TMLE targeting step. If tmle.intercept (default), it performs the weighted intercept-based TMLE that runs a intercept-only weighted logistic regression using offsets logit(Qstar) and weights h_gstar/h_gN and so no covariate. If tmle.covariate, it performs the unweighted covariate-based TMLE that run a unweighted logistic regression using offsets logit(Qstar) and a clever covariate h_gstar/h_gN.

tmleCommunity returns an object of class "tmleCommunity", which is a named list containing the estimation results stored by the following 3 elements:

EY_gstar1 - a list with estimates of the mean counterfactual outcome under (deterministic or stochastic) intervention function f_gstar1 (E[Y_{g^*_1}]).
EY_gstar2 - a list with estimates of the mean counterfactual outcome under (deterministic or stochastic) intervention function f_gstar2 (E[Y_{g^*_2}]); or NULL if f_gstar2 not specified.
ATE - a list with estimates of additive treatment effect (E[Y_{g^*_1}] - E[Y_{g^*_2}]) under two interventions functions f_gstar1 VS f_gstar2; or NULL if f_gstar2 not specified.

Each element in the returned tmleCommunity object is itself a list containing the following 8 items:

estimates - matrix, 3\times1, storing 3 algorithm estimates of the target parameter (population community-level counterfactual mean under (deterministic or stochastic) intervention), including TMLE, IPTW and GCOMP.
vars - matrix, 3\times1, storing 3 influence-curve based asymptotic variance estimates for TMLE, IPTW and GCOMP. Notice, all IC-based statistical inference for GCOMP is not accurate (Just for reference). See explanation in IC.
CIs - matrix, 3\times2, storing 3 confidence interval estimates at CI_alpha level, for TMLE, IPTW and GCOMP. The first column contains the lower bounds and the second column contains the upper bounds.
tstat - matrix, 3\times1, storing 3 test statistics.
pval - matrix, 3\times1, storing 3 p-values. It's based on the Student's T distribution if the number of communities (or the number of individuals if no hierarchical structure) is less than 41, otherwise based on the Z normal distribution.
IC - data frame, nobs\times3, the first column contains the influence curves (ICs) for TMLE estimate, the second column contains the ICs for IPTW estimate, and the third column contains the ICs for GCOMP estimate (not accurate since it is based on ICs for TMLE estimate without updating step).
h.g0_GenericModel - An object of GenericModel R6 class, storing the model fits for P(A | W, E) under observed exposure mechanism g0. This can be used in tmleCommunity (See Arguments).
h.gstar_GenericModel - An object of GenericModel R6 class, storing the model fits for P(A | W, E) under intervention f_gstar1 or f_gstar2. This can be used in tmleCommunity (See Arguments).

Estimations are based on either community-level or individual-level analysis. Each analysis currently implements 3 estimators:

tmle - Either weighted intercept-based TMLE based on weights h_gstar/h_gN (default choice) or unweighted covariate-based TMLE based on a covariate h_gstar/h_gN.
iptw - IPTW (Horvitz-Thompson) estimator based on the TMLE weights h_gstar/h_gN.
gcomp - Maximum likelihood based G-computation substitution estimator.

The IPTW estimator is based on the TMLE weights h_{g^*}(A^*,W,E)/h_{g}(A,W,E), which is equivalent to P_{g^*}(A^*|W,E)/P_{g}(A|W,E) and is defined as the the weighted average of the observed outcomes Y. The following algorithm shows a general template of the community-level IPTW:

As described in the following section ("Modeling P(A | W, E) for covariates (A, W, E)"), the first step is to construct an estimator P_{\hat{g}^{c}}(A | W, E) of the density for the common (in j) conditional distribution of A given W, E, that is P_{g_0^{c}}(A | W, E) for common (in j) community-level covariates (A, W, E).
Implementing the same modeling & fitting algorithm to construct an estimator P_{\hat{g}^{c*}}(A^* | W, E) of the density for the common (in j) conditional distribution of A^* given (W, E), that is P_{g_0^{c*}}(A^* | W, E) for common (in j) community-level covariates (A^*, W, E) where A^* is determined by the user-supplied stochastic intervention f_gstar1 or f_gstar2, given the observed baseline covariates (W, E).
Given observed J independent communities \textbf{O}_j = (E_j, \textbf{W}_j, A_j, Y^c_j: j = 1, ..., J), the IPTW estimator is given by:

ψ^{I}_{IPTW, n}=\frac{1}{J}∑_{j=1}^{J}Y^c_j\frac{P_{\hat{g}^{c*}}(A^*_j|\textbf{W}_j,E_j)}{P_{\hat{g}^{c*}}(A_j|\textbf{W}_j,E_j)}

For individual-level IPTW, it reads the input data as O_{i,j} = (E_j, W_{i,j}, A_j, Y_{i,j}: j = 1, ..., J; i = 1, ..., n_j) and incorporates working model that assumes no covariate interference, weighing each individual within one community by α_{i,j}, where the IPTW estimator is given by:

ψ^{II}_{IPTW, n}=\frac{1}{J}∑_{j=1}^{J}∑_{i=1}^{n_j}α_{i,j}Y_{i,j}\frac{P_{g^{*}}(A^*_j|W_{i,j}, E_j)} {P_{g^{*}}(A_j | W_{i,j}, E_j)}

The TMLE estimator is based on the updated model prediction \bar{Q}^*(A, W, E) and is defined by the G-formula. The following algorithm shows a general template of the community-level TMLE:

The first step is exactly the same as IPTW: construct two density estimators and use the ratio of them as the weights P_{g^*}(A^*|W,E)/P_{g}(A|W,E) in the targeting step.
Construct an initial estimator \hat{\bar{Q}}^c(A | W, E) of the common (in j) conditional distribution of Y^c given (A, W, E) and update \hat{\bar{Q}}^{c*}(A | W, E) for \hat{\bar{Q}}^c(A | W, E) by weights calculated in the first step.
The TMLE estimator is defined as the following substitution estimator:

ψ^{I}_{TMLE, n}=\frac{1}{J}∑_{j=1}^{J}\int_{a}\hat{\bar{Q}}^{c*}(a, \textbf{W}_j, E_j)g^{c*}(a|\textbf{W}_j, E_j)dμ(a)

For individual-level TMLE, its estimator is obtained as:

ψ^{II}_{TMLE,n}=\frac{1}{J}∑_{j=1}^{J}∑_{i=1}^{n_j}α_{i,j}\int_{a}\hat{\bar{Q}}^*(a, W_{i,j}, E_j)g^*(a|W_{i,j}, E_j)dμ(a)

The GCOMP estimator is similar to the the TMLE estimator except it uses the untargeted (initial) model \hat{\bar{Q}}^c(A|W,E) instead of its targeted version \hat{\bar{Q}}^{c*}(A | W, E), for example the community-level GCOMP estimator is given by:

ψ^{I}_{GCOMP,n}=\frac{1}{J}∑_{j=1}^{J}\int_{a}\hat{\bar{Q}}^{c}(a, \textbf{W}_j, E_j)g^{c*}(a|\textbf{W}_j, E_j)dμ(a)

For simplicity (and without loss of generality), we now suppose that there is no hierarchical structure in data and are interested in finding an non-parametric estimator of the common (in i) individual-level exposure mechanism g_0(A|W), or the commom multivariate joint conditional probability model P_{g_0}(A|W), where the exposures and baseline covariates ((A,W)=(A_i, W_i: i=1,...,n)) denote the random variables drawn jointly from distribution H_0(A, W) with denisty h_0(a, w) \equiv g_0(a|w)q_{W,0}(w) and q_{W,0}(W) denotes the marginal density of the baseline covariates W specified by the regression formula hform.g0 (Notice that an non-parametric estimator of the model P_{g^*_0}(A|W)) is similar, except that now the exposures and baseline covariates ((A^*,W)=(A^*_i, W_i: i=1,...,n)) are randomly drawn from H^*_0(A, W) with density h^*_0(a, w) \equiv g^*_0(a|w)q_{W,0}(w), where A^* is determined by the user-supplied (stochastic) intervention f_gstar1 or f_gstar2 and q_{W,0}(W) denotes the marginal density of the baseline covariates W specified by the regression formula hform.gstar. Thus, the fitting algorithm for P_{g^*_0}(A|W)) is equivalent for P_{g_0}(A|W)).

Note that A can be multivariate (i.e., (A(1), ..., A(M))) and each of its components A(m) can be either binary, categorical or continuous. The joint probability model for P(A|W)=P(A(1),...,A(M)|W) can be factorized as a sequence P(A(1)|W) \times P(A(2)|W,A(1)) \times ... \times P(A(M)|W, A(1),...,A(M-1)), where each of these conditional probability models P(A(m)|W,A(1),...,A(m-1)) is fit separately, depending on the type of the m-specific outcome variable A(m). For binary A(m), the conditional probability P(A(m)|W,A(1),...,A(m-1)) will be esimtated by a user-specific library of candidate algorithms, including parametric estimators such as logistic model with only main terms, and data-adaptive estimator such as super-learner algorithms. For continuous (or categorical) A(m), consider a sequence of values δ_1, δ_2,...,δ_{K+1} that span the range of A and define K bins and the corresponding K bin indicators (B_1,...,B_K), in which case each bin indicator B_k \equiv [δ_k, δ_{k+1}) is used as an binary outcome in a seperate user-specific library of candidate algorithms, with predictors given by (W, A(1),..., A(m-1)). That is how the joint probability P(A|W) is factorized into such an entire tree of binary regression models.

For simplicity (and without loss of generality), we now suppose A is univariate (i.e., M=1) and continuous and a general template of an fitting algorithm for P_{g_0}(A|W) is summrized below:

Consider the usual setting in which we observe n independently and identically distributed copies o_i=(w_i, a_i, y_i: i=1,..,n) of the random variable O=(W, A, Y), where the observed (a_i: i=1...,n) are continuous.
As described above, consider a sequence of K+1 values that span the support of A values into K bin intervals Δ = (δ_1, δ_2,...,δ_{K+1}) so that any observed data point a_i belongs to one interval within R, in other words, for each possible value a \in A (even if it is not in the observed (a_i:i)), there always exists a k \in {1, ...,K} such that δ_{k}≤q a<δ_{k+1}, and the length (bandwidth) of the interval can be defined as bw_{k}=δ_{k+1}-δ_{k}. Then let the mapping S(a)\in \{1,2,..,K\} denote a unique index of the indicator in Λ that a falls in, where S(a)=k if a\in [δ_{k},δ_{k+1}), namely, δ_{S(a)} ≤q a < δ_{S(a)+1}. Moreover, we use b_k to denote a binary indicator of whether the observed a belongs to bin k (i.e., b_k\equiv I(S(a)=k) for all k≤q S(a); b_k\equiv NA for all k>S(a)). This is similar to methods for censored longitudinal data, which code observations as NA (censored or missing) once the indicator b_k jumps from 0 to 1. Since a is a realization of the random variable A for one individual, the corresponding random binary indicators of whether A belongs to bin k can be denoted by B_k:k=1,..,=K where B_k \equiv I(S(A)=k) for all k≤q S(A); B_k\equivNA for all k>S(A).
Then for each k = 1,...,K, a binary nonparametric regression is used to estimate the conditional probability P(B_k=1|B_{k-1}=0,W), which corresponds to the probability of B_k jumping from 0 to 1, given B_{k-1}=0 and tbe baseline covariates W. Note tha for each k, the corresponding nonparametric regression model is fit only among observations that are uncensored (i.e., still at risk of getting B_{k}=1 with B_{k-1}=0). Note the above conditional probability P(B_k=1|B_{k-1}=0,W) is equivalent to P(A\in [δ_{k}, δ_{k+1}) | A≥q δ_{k}, W), which is the probability of A belongs to the interval [δ_{k},δ_{k+1}), conditional on A does not belong to any intervals before [δ_{k}, δ_{k+1}) and W. Then the discrete conditional hazard function for each k is defined as a normalization of the conditional probability using the corresponding interval bandwidth bw_{k}: λ_k(A,W)=\frac{P(B_k=1|B_{k-1}=0,W)}{bw_k}=\frac{P(A\in[δ_{k},δ_{k+1})|A≥q δ_{k+1},W)}{bw_k}
Finally, for any given observation (a,w), we first find out the interval index k to which a belongs (i.e., k=S(a)\in{1,...,K}). Then the discretized conditional density of P(A=a|W=w) can be evaluated by λ_k(A, W){\times}[∏_{j=1}^{k-1}(1-λ_j(A, W))], which corresponds to the conditional probability of a belongs to the interval [δ_{k},δ_{k+1}) and does not belong to any intervals before, given W.

Note that the choice of the values δ_k (k=1,..,K) implies defining the number and positions of the bins. First, a cross- validation selector can be applied to data-adaptively select the candidate number of bins, which minimizes variance and maximizes precision (Do not recommend too many bins due to easily violating the positivity assumption). Then, we need to choose the most convenient locations (cuttoffs) for the bins (for fixed K). There are 3 alternative methods that use the histogram as a graphical descriptive tool to define the bin cutoffs Δ=(δ_1,...,δ_K,δ_{K+1}) for a continuous variable A. In tmleCommunity package, the choice of methods bin.method together with the other discretization arguments in function tmleCom_Options() such as nbins (total number of bins) and maxNperBin (the maximum number of observations in each bin), can be used to define the values of bin cutoffs. See help(tmleCom_Options) for more argument details.

equal.mass: The default discretization method, equal mass (aka equal area) interval method, set by passing an argument bin.method="equal.mass" to tmleCom_Options() prior to calling the main function tmleCommunity(). The interval are defined by spanning the support of A into non-equal length of bins, each containing (approximately) the same number of observations. It is data-adaptive since it tends to be wide where the population density is small, and narrow where the density is large. If nbins is NA (or is smaller than n/maxNperBin), nbins will be (re)set to the interger value of n/maxNperBin where n is the total number of observations in A, and the default setting of maxNperBin is 500 observations per interval. This method could identify spikes in the density, but oversmooths in the tails and so could not discover outliers.
equal.len: equal length interval method, set by passing an argument bin.method="equal.len" to tmleCom_Options() prior to calling tmleCommunity(). The intervals are defined by spanning the support of A into nbins number of equal length of bins. This method describes the tails of the density and identifies outliers well, but oversmooths in regions of high density and so is poor at identifying sharp peaks.
dhist: (named for diagonally cut histogram) Combination of equal-area equal length and equal mass method, set by passing an argument bin.method="dhist" to tmleCom_Options() prior to calling tmleCommunity(). For consistency, We choose the slope a = 5\times \textrm{IQR}(A) as suggested by Denby and Mallows ("Variations on the Histogram" (2009)). For more details, please also see this paper.

tmleCommunity-package for the general description of the package,

tmleCom_Options for additional parameters that control the estimation in tmleCommunity,

DatKeepClass for details about storing, managing, subsetting and manipulating the input data,

indSample.iid.cA.cY_list for an example of a continuous exposure and its estimation,

BinaryOutModel, RegressionClass, GenericModel, MonteCarloSimClass

## Not run: 
#***************************************************************************************
# Example 1: Hierarchical example, with one binary A and bianry Y 
# True ATE of the community-based treatment is approximately 0.103716
data(comSample.wmT.bA.bY_list)  # load the sample data 
comSample.wmT.bA.bY <- comSample.wmT.bA.bY_list$comSample.wmT.bA.bY
N <- NROW(comSample.wmT.bA.bY)
Qform.corr <- "Y ~ E1 + E2 + W2 + W3 + A" # correct Q form
gform.corr <- "A ~ E1 + E2 + W1"  # correct g
#***************************************************************************************

#***************************************************************************************
# 1.1 Estimating the additive treatment effect (ATE) for two deterministic interventions
# (f_gstar1 = 1 vs f_gstar2 = 0) via community-level / individual-level analysis.
# speed.glm using correctly specified Qform, hform.g0 and hform.gstar;
#***************************************************************************************
# Setting global options that may be used in tmleCommunity(), e.g., using speed.glm
tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "speedglm__glm", maxNperBin = N)

# Community-level analysis without a pooled individual-level regression on outcome
tmleCom_wmT.bA.bY.1a_sglm <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", pooled.Q = FALSE, 
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)

# Examples of estimates under f_gstar1 = 1:
tmleCom_wmT.bA.bY.1a_sglm$EY_gstar1$estimates
tmleCom_wmT.bA.bY.1a_sglm$EY_gstar1$vars
tmleCom_wmT.bA.bY.1a_sglm$EY_gstar1$CIs

# Examples of estimates under f_gstar0 = 0:
tmleCom_wmT.bA.bY.1a_sglm$EY_gstar2$estimates
tmleCom_wmT.bA.bY.1a_sglm$EY_gstar2$vars
tmleCom_wmT.bA.bY.1a_sglm$EY_gstar2$CIs

# Examples of estimates for ATE under f_gstar1 - f_gstar0:
tmleCom_wmT.bA.bY.1a_sglm$ATE$estimates
tmleCom_wmT.bA.bY.1a_sglm$ATE$vars
tmleCom_wmT.bA.bY.1a_sglm$ATE$CIs
head(tmleCom_wmT.bA.bY.1a_sglm$ATE$IC)

# Community-level analysis with a pooled individual-level regression on outcome
tmleCom_wmT.bA.bY.1b_sglm <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", pooled.Q = TRUE, 
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleCom_wmT.bA.bY.1b_sglm$ATE$estimates

# Individual-level analysis with both individual-level outcome and treatment mechanisms
tmleCom_wmT.bA.bY.2_sglm <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "individual_level", communityID = "id", 
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleCom_wmT.bA.bY.2_sglm$ATE$estimates

# Failing to provide communityID will automatically set community.step to "NoCommunity"
tmleCom_wmT.bA.bY.NoC_sglm <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "individual_level", communityID = NULL, 
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleCom_wmT.bA.bY.NoC_sglm$ATE$estimates

# Stratification analysis that run separate outcome (exposure) mechanism for each community
# use glm since only around 50 observations per community, speed.glm easily fails
# takes longer time than the tests above since doing 1000 TMLEs (one TMLE per community)
# so set verbose to TRUE to track running progress
tmleCom_Options(Qestimator = "glm__glm", gestimator = "glm__glm", maxNperBin = N)
tmleCom_wmT.bA.bY.str_sglm <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "perCommunity", communityID = "id", verbose = TRUE,
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleCom_wmT.bA.bY.str_sglm$ATE$estimates

#***************************************************************************************
# 1.2 Same as above but for different Qestimator and gestimator through tmleCom_Options()
# via community-level analysis with a pooled individual-level regression on outcome.
# (See more details in examples in tmleCom_Options())
#***************************************************************************************
# SuperLearner for both outcome and treatment (clever covariate) regressions
# using all parent nodes (of Y and A) as regressors (respectively)
require("SuperLearner")
tmleCom_Options(Qestimator = "SuperLearner", gestimator = "SuperLearner", 
                maxNperBin = N, SL.library = c("SL.glm", "SL.step", "SL.bayesglm"))
tmleCom_wmT.bA.bY.2_SL <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", pooled.Q = TRUE, 
                Qform = NULL, hform.g0 = NULL, hform.gstar = NULL)
tmleCom_wmT.bA.bY.2_SL$ATE$estimates

# SuperLearner for outcome regressions and glm for treatment regressions
# using all regressors in the correctly specified Qform and 
# all regressors in the misspecified hform.g0 and hform.gstar
tmleCom_Options(Qestimator = "SuperLearner", gestimator = "glm__glm", 
                maxNperBin = N, SL.library = c("SL.mean", "SL.stepAIC", "SL.bayesglm"))
tmleCom_wmT.bA.bY.2_SL.glm <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", pooled.Q = TRUE,
                Qform = NULL, hform.g0 = "A ~ W1", hform.gstar = "A ~ E1 + W2")
tmleCom_wmT.bA.bY.2_SL.glm$ATE$estimates
  
# Speedglm for outcome regressions and sl3 for treatment regressions
# using all regressors in the correctly specified Qform and 
# all regressors in the misspecified hform.g0 and hform.gstar  
tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "sl3_pipelines", maxNperBin = N, 
                sl3_learners = list(glm_fast = sl3::make_learner(sl3::Lrnr_glm_fast), 
                                    xgb = sl3::make_learner(sl3::Lrnr_xgboost),), 
                sl3_metalearner = sl3::make_learner(
                  sl3::Lrnr_optim, loss_function = sl3::loss_loglik_binomial,
                  learner_function = sl3::metalearner_logistic_binomial))
tmleCom_wmT.bA.bY.2_speedglm.sl3 <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", pooled.Q = TRUE,
                Qform = NULL, hform.g0 = "A ~ W1", hform.gstar = "A ~ E1 + W2")
tmleCom_wmT.bA.bY.2_speedglm.sl3$ATE$estimates  

#***************************************************************************************
# 1.3 Evaluating mean population outcome under static intervention A = 0
# with different community-level and individual-level weight choices 
#***************************************************************************************
tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "speedglm__glm", maxNperBin = N)
# weigh individuals in data equally & weigh community by its number of individuals
tmleCom_wmT.bA.bY.1a_w1 <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, 
                obs.wts = "equal.within.pop", community.wts = "size.community", 
                community.step = "community_level", communityID = "id")
tmleCom_wmT.bA.bY.1a_w1$EY_gstar1$estimates

# same as above but weigh individuals within the same community equally
tmleCom_wmT.bA.bY.1a_w2 <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, 
                obs.wts = "equal.within.community", community.wts = "size.community", 
                community.step = "community_level", communityID = "id")
tmleCom_wmT.bA.bY.1a_w2$EY_gstar1$estimates

# weigh individuals within the same community equally & weigh community equally
tmleCom_wmT.bA.bY.1a_w3 <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, 
                obs.wts = "equal.within.community", community.wts = "equal.community", 
                community.step = "community_level", communityID = "id")
tmleCom_wmT.bA.bY.1a_w3$EY_gstar1$estimates

#***************************************************************************************
# 1.4 Specifying user-supplied stochastic or deterministic intervention function
#***************************************************************************************
# Intervention function that will sample A with probability P(A=1) = prob.val
define_f.gstar <- function(prob.val, rndseed = NULL) {
  eval(prob.val) 
  f.gstar <- function(data, ...) {
    print(paste0("probability of selection: ", prob.val))
    rbinom(n = NROW(data), size = 1, prob = prob.val)
  }
  return(f.gstar)
}
# Stochastically set 50% of the population to A=1 
f.gstar_stoch.0.5 <- define_f.gstar(prob.val = 0.5)
# Deterministically set 100% of the population to A=1 
f.gstar_determ.1 <- define_f.gstar(prob.val = 1)
# Deterministically set 100% of the population to A=0
f.gstar_determ.0 <- define_f.gstar(prob.val = 0)

#***************************************************************************************
# 1.5 Equivalent ways of specifying user-supplied (static) intervention (f_gstar1 = 1)
#***************************************************************************************
# Alternative 1: via intervention functoin that sets every invidual's A to constant 1
tmleCom_wmT.bA.bY.1a_fgtar1 <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = f.gstar_determ.1, 
                community.step = "community_level", communityID = "id")
tmleCom_wmT.bA.bY.1a_fgtar1$EY_gstar1$estimates

# Alternative 2: by simply setting f_gstar1 to 1
tmleCom_wmT.bA.bY.1a_fgtar2 <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L,
                community.step = "community_level", communityID = "id")
tmleCom_wmT.bA.bY.1a_fgtar2$EY_gstar1$estimates

# Alternative 3: by setting f_gstar1 to a vector of 1's of length NROW(data)
tmleCom_wmT.bA.bY.1a_fgtar3 <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), 
                f_gstar1 = rep(1L, NROW(comSample.wmT.bA.bY)), 
                community.step = "community_level", communityID = "id")
tmleCom_wmT.bA.bY.1a_fgtar1$EY_gstar1$estimates

#***************************************************************************************
# 1.6 Running exactly the same estimator as 1.1 but using h_gstar/h_gN as a coviariate 
# in the targeting step (default to use weighted intercept-based TMLE)
#***************************************************************************************
# unweighted covariate-based TMLE
tmleCom_wmT.bA.bY.1a_covTMlE <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", pooled.Q = FALSE, 
                TMLE.targetStep = "tmle.covariate",  # default as "tmle.intercept"
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleCom_wmT.bA.bY.1a_covTMlE$ATE$estimates

#***************************************************************************************
# 1.7 Equivalent ways of specifying the regression formulae 
# (if Ynode is specified as "Y" and WEnodes = c("E1", "E2", "W1", "W2", "W3"))
#***************************************************************************************
# For outcome regression, the left side of Qform will be ignored if Ynode is specified,
# with dependent variable being set to Ynode.
Qform1 <- "Y ~ E1 + E2 + W2 + W3 + A" 
Qform2 <- "AnythingIsFine ~ E1 + E2 + W2 + W3 + A" 
Qform3 <- NULL  # since all parent nodes of Y will be used as regressors

# For treatment regressions, if hform.gstar unspecified, it uses the same regression
# formula as hform.g0 does. 
# Alternative 1: specify hform.g0 and hform.gstar respectively
hform.g0 <- "A ~ E1 + E2 + W1"
hform.gstar <- "A ~ E1 + E2 + W1"

# Alternative 2: specify hform.g0 only
hform.g0 <- "A ~ E1 + E2 + W1"
hform.gstar <- NULL

#***************************************************************************************
# 1.8 Equivalent ways of allowing printing status message
#***************************************************************************************
# Controlling the global setting 
options(tmleCommunity.verbose = TRUE)
tmleCom_wmT.bA.bY.print1 <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id")

# Alternative: using the verbose argument in tmleCommunity()
tmleCom_wmT.bA.bY.print2 <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", verbose = TRUE)

#***************************************************************************************
# Example 2: Non-hierarchical example, with one continuous A and continuous Y 
# True mean population outcome under stochastic intervention (specified below)
# is approximately 3.50856
data(indSample.iid.cA.cY_list)  # load the sample data 
indSample.iid.cA.cY <- indSample.iid.cA.cY_list$indSample.iid.cA.cY
true.shift <- indSample.iid.cA.cY_list$shift.val  # 2
true.truncBD <- indSample.iid.cA.cY_list$truncBD  # 10
N <- NROW(indSample.iid.cA.cY)
Qform.corr <- "Y ~ W1 + W2 + W3 + W4 + A" # correct Q form
gform.corr <- "A ~ W1 + W2 + W3 + W4"  # correct g
#***************************************************************************************

#***************************************************************************************
# 2.1 Specifying stochastic intervention function that could represent the true 
# shifted version of the current treatment mechanism
#***************************************************************************************
define_f.gstar <- function(shift.val, truncBD, rndseed = NULL) {
  shift.const <- shift.val
  trunc.const <- truncBD
  f.gstar <- function(data, ...) {
    print(paste0("shift.const: ", shift.const))
    set.seed(rndseed)
    A.mu <- 0.86 * data[,"W1"] + 0.41 * data[,"W2"] - 0.34 * data[,"W3"] + 0.93 * data[,"W4"]
    untrunc.A <- rnorm(n = nrow(data), mean = A.mu + shift.const, sd = 1)
    r.new.A <- exp(0.8 * shift.const * (untrunc.A - A.mu - shift.const / 3))
    trunc.A <- ifelse(r.new.A > trunc.const, untrunc.A - shift.const, untrunc.A)
    return(trunc.A)
  }
  return(f.gstar)
}
# correctly specified stochastic intervention with true shift value and truncated bound
f.gstar.corr <- define_f.gstar(shift = true.shift, truncBD = true.truncBD)
# Misspecified specified stochastic intervention 
f.gstar.mis <- define_f.gstar(shift = 5, truncBD = 8)

#***************************************************************************************
# 2.2 Estimating mean population outcome under different stochastic interventions
# speed.glm using correctly specified Qform, hform.g0 and hform.gstar
#***************************************************************************************
tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "speedglm__glm", maxNperBin = N)
# correctly specified stochastic intervention 
tmleind_iid.cA.cY_true.fgstar <- 
  tmleCommunity(data = indSample.iid.cA.cY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar.corr,
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleind_iid.cA.cY_true.fgstar$EY_gstar1$estimates

# misspecified stochastic intervention
tmleind_iid.cA.cY_mis.fgstar <- 
  tmleCommunity(data = indSample.iid.cA.cY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar.mis,
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleind_iid.cA.cY_mis.fgstar$EY_gstar1$estimates

#***************************************************************************************
# 2.3 Same as above but using larger number of Monte-Carlo simulations
# using all parent nodes (of Y and A) as regressors (respectively)
#***************************************************************************************
# A will be sampled 10 times (for a total sample size of NROW(data)*10 under f_gstar1)
tmleind_iid.cA.cY_10MC <- 
  tmleCommunity(data = indSample.iid.cA.cY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar.corr, n_MCsims = 10)
tmleind_iid.cA.cY_10MC$EY_gstar1$estimates

#***************************************************************************************
# 2.4 Running exactly the same estimator as 2.1 but defining different values of bin cutoffs 
#***************************************************************************************
# using equal-length method with 10 bins 
tmleCom_Options(bin.method = "equal.len", nbins = 10, maxNperBin = N)
tmleind_iid.cA.cY_len <- 
  tmleCommunity(data = indSample.iid.cA.cY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar.corr,
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleind_iid.cA.cY_len$EY_gstar1$estimates

# using combination of equal-length and equal-mass method with 20 bins 
tmleCom_Options(bin.method = "dhist", nbins = 20, maxNperBin = N)
tmleind_iid.cA.cY_dhist <- 
  tmleCommunity(data = indSample.iid.cA.cY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = f.gstar.corr,
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleind_iid.cA.cY_dhist$EY_gstar1$estimates

#***************************************************************************************
# 2.5 Estimating the additive treatment effect (ATE) for two stochastic interventions
#***************************************************************************************
# Intervention function that will shift A by constant rate (shift.rate)
# (A special case of stochastic intervention with constant shift)
define_f.gstar <- function(shift.rate) {
  eval(shift.rate) 
  f.gstar <- function(data, ...) {
    print(paste0("rate of shift: ", shift.rate))
    data[, "A"] * shift.rate
  }
  return(f.gstar)
}
f.gstar_shift0.8 <- define_f.gstar(shift.rate = 0.8)
f.gstar_shift0.5 <- define_f.gstar(shift.rate = 0.6)

tmleCom_Options(bin.method = "equal.mass", nbins = 5, maxNperBin = N)
tmleind_iid.cA.cY_ATE <- 
  tmleCommunity(data = indSample.iid.cA.cY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("W1", "W2", "W3", "W4"),
                f_gstar1 = f.gstar_shift0.8, f_gstar2 = f.gstar_shift0.5,
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)

# ATE estimates for f_gstar1-f_gstar2:
tmleind_iid.cA.cY_ATE$ATE$estimates
tmleind_iid.cA.cY_ATE$ATE$vars
tmleind_iid.cA.cY_ATE$ATE$CIs   

#***************************************************************************************
# Example 3: Non-Hierarchical example, with one binary A and one rare bianry Y 
# (Independent case-control)  True ATE is approximately 0.012662
data(indSample.iid.bA.bY.rareJ1_list)
indSample.iid.bA.bY.rareJ1 <- indSample.iid.bA.bY.rareJ1_list$indSample.iid.bA.bY.rareJ1
obs.wt.J1 <- indSample.iid.bA.bY.rareJ1_list$obs.wt.J1
Qform.corr <- "Y ~ W1 + W2*A + W3 + W4" # correct Q form
gform.corr <- "A ~ W1 + W2 + W3 + W4"  # correct g
tmleCom_Options(maxNperBin = NROW(indSample.iid.bA.bY.rareJ1))
#***************************************************************************************

#***************************************************************************************
# 3.1 Estimating ATE for f_gstar1 = 1 vs f_gstar2 = 0
# using correct observation weights and correctly specified Qform & gform 
#***************************************************************************************
tmleind_iid.bA.bY_corrWT <- 
  tmleCommunity(data = indSample.iid.bA.bY.rareJ1, Ynode = "Y", Anodes = "A",
                WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = 1, f_gstar2 = 0,
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr,
                obs.wts = obs.wt.J1, verbose = TRUE)
tmleind_iid.bA.bY_corrWT$ATE$estimates["tmle", ]  # 0.01220298, good estimate

#***************************************************************************************
# 3.2 Same as above but not specifying the observation weights
# obs.wts = NULL is equivalent to obs.wts = "equal.within.pop"
#***************************************************************************************
tmleind_iid.bA.bY_misWT <- 
  tmleCommunity(data = indSample.iid.bA.bY.rareJ1, Ynode = "Y", Anodes = "A",
                WEnodes = c("W1", "W2", "W3", "W4"), f_gstar1 = 1, f_gstar2 = 0,
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr,
                obs.wts = NULL, verbose = TRUE)
tmleind_iid.bA.bY_misWT$ATE$estimates["tmle", ]  # 0.2466575, bad estimate

## End(Not run)

chizhangucb/tmleCommunity documentation built on May 20, 2019, 3:34 p.m.

chizhangucb/tmleCommunity index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

chizhangucb/tmleCommunity
Targeted Maximum Likelihood Estimation for Hierarchical Data

tmleCommunity: Estimate Marginal Treatment Effects For Arbitrary...
In chizhangucb/tmleCommunity: Targeted Maximum Likelihood Estimation for Hierarchical Data

Description

Usage

Arguments

Details

Value

IPTW estimator

TMLE estimator

GCOMP estimator

Modeling P(A | W, E) for covariates (A, W, E)

Three methods for choosing bin (interval) locations for a univariate and continuous variable A

See Also

Examples

Related to tmleCommunity in chizhangucb/tmleCommunity...

R Package Documentation

Browse R Packages

We want your feedback!

chizhangucb/tmleCommunity Targeted Maximum Likelihood Estimation for Hierarchical Data

tmleCommunity: Estimate Marginal Treatment Effects For Arbitrary... In chizhangucb/tmleCommunity: Targeted Maximum Likelihood Estimation for Hierarchical Data

Description

Usage

Arguments

Details

Value

IPTW estimator

TMLE estimator

GCOMP estimator

Modeling P(A | W, E) for covariates (A, W, E)

Three methods for choosing bin (interval) locations for a univariate and continuous variable A

See Also

Examples

Related to tmleCommunity in chizhangucb/tmleCommunity...

R Package Documentation

Browse R Packages

We want your feedback!

chizhangucb/tmleCommunity
Targeted Maximum Likelihood Estimation for Hierarchical Data

tmleCommunity: Estimate Marginal Treatment Effects For Arbitrary...
In chizhangucb/tmleCommunity: Targeted Maximum Likelihood Estimation for Hierarchical Data