# tmleCommunity-package: Targeted Maximum Likelihood Estimation for Community-level... In chizhangucb/tmleCommunity: Targeted Maximum Likelihood Estimation for Hierarchical Data

## Description

Targeted Maximum Likelihood Estimation (TMLE) of the average causal effect of community-based intervention(s) at a single time point on an individual-based outcome of interest; the approach can also be extended to the additive treatment effect. In other words, the package estimates the marginal treatment effect of arbitrary single-time-point interventions on a continuous or binary outcome in community-independent data, adjusting for both community-level and individual-level baseline covariates. The package also provides an Inverse-Probability-of-Treatment-Weighted (IPTW) estimator and a parametric G-computation formula (GCOMP) estimator. Statistical inference (standard errors, t statistics, p-values and confidence intervals) for TMLE and for IPTW is based on the corresponding influence curve. Optional data-adaptive estimation of the exposure and outcome mechanisms using the SuperLearner package and the h2o package (the latter for large datasets) is strongly recommended, especially when the outcome and treatment mechanisms are unknown. The package also supports panel data transformations, such as random effects and fixed effects.
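To make the influence-curve-based inference concrete, the following is a minimal sketch in Python (standard library only), not the package's R interface. The function name `ic_inference` and its arguments are hypothetical. It assumes one estimated influence-curve value per independent unit (with hierarchical data, presumably one per community) and uses a normal approximation for the p-value and confidence interval; the package itself may use a different reference distribution.

```python
import math
import statistics


def ic_inference(psi, ic, z=1.96):
    """Wald-type inference from a point estimate and influence-curve values.

    psi : point estimate (e.g. from TMLE or IPTW)
    ic  : estimated influence-curve values, one per independent unit
    z   : normal quantile for the confidence interval (1.96 gives ~95%)
    """
    n = len(ic)
    # Standard error: sample variance of the influence curve divided by n.
    se = math.sqrt(statistics.variance(ic) / n)
    t_stat = psi / se
    # Two-sided p-value under a standard normal approximation.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return {
        "se": se,
        "t": t_stat,
        "p": p_value,
        "ci": (psi - z * se, psi + z * se),
    }


# Illustrative values only; these are not output from the package.
result = ic_inference(0.5, [1.0, -1.0, 1.0, -1.0])
```

The key design point is that the variance is taken over independent units, so treating correlated individuals within a community as independent would generally understate the standard error.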

## Details

The input dataset should consist of rows of community-specific and individual-specific observations. For individual i in community j, each row contains the random variables (W_{i,j}, E_j, A_j, Y_{i,j}), where E_j is a vector of community j's community-level (environmental) baseline covariates (individuals within the same community share the same values of E_j); W_{i,j} is a vector of individual i's individual-level baseline covariates; A_j is the exposure(s) assigned to, or naturally occurring in, community j (univariate or multivariate; binary, categorical or continuous; individuals within the same community receive the same value of A_j); and Y_{i,j} is individual i's outcome (binary or continuous). Each individual's baseline covariates W_{i,j} depend on the environmental baseline covariates E_j of the community j to which i belongs. Similarly, each community's exposure A_j depends on its community-level baseline covariates E_j and on the individual-level baseline covariates of all individuals in community j (all W_{i,j} such that i belongs to j). In addition, each outcome Y_{i,j} may be affected by the individual's own community-level and individual-level baseline covariates (E_j, W_{i,j}), by the baseline covariates of the other individuals in the same community (W_{s,j}: s\neq i, s\in j), and by the community-based intervention A_j. Input data with no hierarchical structure (i.e., no communities, only individuals) is a special case of hierarchical data in which E_j is simply treated as NULL.
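The row layout and dependence structure described above can be illustrated with a small simulation. This is a sketch in Python (standard library only), not code from the package. The function name, column names, coefficients and distributions are all made up for illustration, with a scalar E_j, a scalar W_{i,j}, a binary A_j and a continuous Y_{i,j}.

```python
import math
import random


def simulate_hierarchical(n_communities=5, seed=0):
    """Return one row (dict) per individual, in long format."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_communities):
        n_j = rng.randint(3, 6)        # community size (arbitrary range)
        e_j = rng.gauss(0.0, 1.0)      # E_j: shared by everyone in community j
        # W_{i,j}: individual-level covariates, depending on E_j.
        w = [rng.gauss(0.5 * e_j, 1.0) for _ in range(n_j)]
        w_bar = sum(w) / n_j           # summary of all W_{i,j} in community j
        # A_j: one binary exposure per community, depending on E_j and the W's.
        p_a = 1.0 / (1.0 + math.exp(-(0.3 * e_j + 0.5 * w_bar)))
        a_j = 1 if rng.random() < p_a else 0
        for i, w_ij in enumerate(w):
            # Y_{i,j}: depends on A_j, E_j, the individual's own W_{i,j}, and
            # (through w_bar) the covariates of others in the same community.
            y_ij = (1.0 + 0.5 * a_j + 0.4 * e_j + 0.3 * w_ij
                    + 0.2 * w_bar + rng.gauss(0.0, 1.0))
            rows.append({"community": j, "individual": i,
                         "E": e_j, "W": w_ij, "A": a_j, "Y": y_ij})
    return rows


rows = simulate_hierarchical()
```

Every row from the same community carries identical `E` and `A` values, which is the defining feature of the layout; only `W` and `Y` vary within a community.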

Three approaches are currently available for hierarchical data analysis. The first, community-level TMLE, is developed under a non-parametric causal model that allows arbitrary interactions between individuals within a community. It estimates the community-level causal effect by aggregating data to the community level and treating the community, rather than the individual, as the unit of analysis (i.e., both the outcome and treatment mechanisms are community-level). The second, individual-level TMLE, is developed under a submodel of the causal model used in the first approach, incorporating knowledge of the dependence structure between individuals within communities (i.e., both the outcome and treatment mechanisms are individual-level). The third, stratified TMLE, fits a separate outcome (exposure) mechanism for each community and then combines those estimates into a user-specified average (by default, weighted by community size). Note that the stratified TMLE naturally controls for both observed community-level covariates and unobserved community-level factors; consequently, E does not appear among the regressors of either the outcome or the treatment mechanism.
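Two of the mechanical steps mentioned above, aggregating individual rows to the community level and combining per-community estimates into a size-weighted average, can be sketched as follows. This is an illustrative Python sketch (standard library only) with hypothetical function and column names. It does not perform the TMLE fitting or targeting steps, and the `"equal"` option is an assumed example of an alternative weighting, not a documented package option.

```python
from collections import defaultdict


def aggregate_by_community(rows):
    """Collapse individual-level rows to one summary record per community."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["community"]].append(r)
    return {
        j: {
            "n": len(grp),                               # community size
            "A": grp[0]["A"],                            # exposure is shared
            "Y_mean": sum(r["Y"] for r in grp) / len(grp),
        }
        for j, grp in groups.items()
    }


def combine(estimates, sizes, weighting="size"):
    """Average per-community estimates.

    weighting="size" weights each community by its number of individuals;
    weighting="equal" gives every community the same weight.
    """
    if weighting == "size":
        total = sum(sizes[j] for j in estimates)
        return sum(estimates[j] * sizes[j] for j in estimates) / total
    if weighting == "equal":
        return sum(estimates.values()) / len(estimates)
    raise ValueError("unknown weighting: " + repr(weighting))


# Toy data: community "a" has two individuals, community "b" has one.
rows = [
    {"community": "a", "A": 1, "Y": 1.0},
    {"community": "a", "A": 1, "Y": 3.0},
    {"community": "b", "A": 0, "Y": 10.0},
]
agg = aggregate_by_community(rows)
sizes = {j: rec["n"] for j, rec in agg.items()}
means = {j: rec["Y_mean"] for j, rec in agg.items()}
size_weighted = combine(means, sizes)              # (2*2.0 + 1*10.0) / 3
equal_weighted = combine(means, sizes, "equal")    # (2.0 + 10.0) / 2
```

With unequal community sizes the two weightings target different quantities: size weighting corresponds to an average over individuals, while equal weighting corresponds to an average over communities.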

## References

1. Balzer L. B., Zheng W., van der Laan M. J., Petersen M. L. and the SEARCH Collaboration (2017). A New Approach to Hierarchical Data Analysis: Targeted Maximum Likelihood Estimation of Cluster-Based Effects Under Interference. ArXiv e-prints. 1706.02675.

2. Muñoz, I. D. and van der Laan, M. (2012). Population Intervention Causal Effects Based on Stochastic Interventions. Biometrics, 68(2):541-549.

3. Sofrygin, O. and van der Laan, M. J. (2015). tmlenet: Targeted Maximum Likelihood Estimation for Network Data. R package version 0.1.9. https://github.com/osofr/tmlenet

4. van der Laan, M. (2014). Causal Inference for a Population of Causally Connected Units. Journal of Causal Inference, 2(1).

5. van der Laan, Mark J. and Gruber, Susan (2011). "Targeted Minimum Loss Based Estimation of an Intervention Specific Mean Outcome". U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 290. http://biostats.bepress.com/ucbbiostat/paper290

6. van der Laan, Mark J. and Rose, Sherri, "Targeted Learning: Causal Inference for Observational and Experimental Data" New York: Springer, 2011.

## Datasets

To learn more about the type of data input required by tmleCommunity, see the following example datasets:

• comSample.wmT.bA.bY_list

• indSample.iid.cA.cY_list

• indSample.iid.bA.bY.rareJ1_list

• indSample.iid.bA.bY.rareJ2_list

For R code that can simulate more data with different structures, please check