#' Targeted Maximum Likelihood Estimation for Community-level Data
#'
#' Targeted Maximum Likelihood Estimation (TMLE) of the average causal effect of community-based intervention(s) at a single time point
#' on an individual-based outcome of interest. (and can be extended to additive treatment effect). In other words, it estimates the marginal treatment
#' effect of single-time point arbitrary interventions on a continuous or binary outcome in community-independent data, adjusting for
#' both community-level and individual-level baseline covariates. The package also provides Inverse-Probability-of-Treatment-Weighted
#' estimator (IPTW) and parametric G-computation formula estimator (GCOMP). The statistical inference (Standard errors, t statistc,
#' p-value and confidence intervals) of both TMLE and IPTW are based on the corresponding influence curve, respectively. Optional
#' data-adaptive estimation of exposure and outcome mechanisms using the SuperLearner package, the sl3 package (a modern implementation of
#' the Super Learner algorithm) and the h2o package (for a large dataset) is strongly recommended, especially when the outcome mechanism
#' and treatment mechnism are unknown. Besides, it allows for panel data transformation, such as with random effects and fixed effects.
#'
#' The input dataset should be made up of rows of community-specific and individual-specific observations, for community \eqn{j}, each
#' row \eqn{i} includes random variables \eqn{(W_{i,j}, E_{j}, A_{j}, Y_{i,j})}, where \eqn{E_j} represents a vector of community
#' \eqn{j}'s community-level (environmental) baseline covariates (individuals within the same community share the same values of
#' \eqn{E_j}), \eqn{W_{i,j}} represents a vector of individual \eqn{i}'s individual-level baseline covariates, \eqn{A_j} is the
#' exposure(s) (can be univariate or multivariate, can be binary, categorical or continuous) assigned or naturally occurred in
#' community \eqn{j} (individuals within the same community receive the same value of \eqn{A_j}) and \eqn{Y_{i,j}} is \eqn{i}'s
#' outcome (either binary or continuous). Each individual's baseline covariates \eqn{(W_{i,j}} depends on the environmental
#' baseline covariates \eqn{E_j} of the community \eqn{j} to which \eqn{i} belongs to. Similarly, each community's exposure
#' \eqn{A_j} depends on its community-level baseline covariates \eqn{E_j} and individual-level baseline covariates of all
#' individuals belonging to community \eqn{j} (all \eqn{W_{i,j}} such that \eqn{i} belongs to \eqn{j}). Besides, each outcome
#' \eqn{Y_{i,j}} could be affected by its baseline community and individual-level covariates \eqn{(E_j, W_{i,j})} and the baseline
#' covariates of other individuals within the same community \eqn{(W_{s,j}: s\neq i, s\in j)}, together with its community-based
#' intervention \eqn{A_j}. We note that the input data with no hierarchical structure (i.e., no communities and only individuals)
#' is a special case of the hierarchical data since it simply treats \eqn{E_j} as \code{NULL}.
#'
#' There are currently three approaches that can be used in hierarchical data analysis. The first community-level TMLE is developed
#' under a non-parametric causal model that allows for arbitrary interactions between individuals within a community. It estimates
#' the community-level causal effect by aggregating data at a community-level and treating community rather than the individual as
#' the unit of analysis (i.e., both community-level outcome and treatment mechanisms). The second individual-level TMLE is developed
#' under the submodel of the causal model in the first approach, incoporating knowledge of the dependence structure between
#' individual within communities (i.e., both individual-level outcome and treatmnet mechanisms). The third stratified TMLE fits a
#' separate outcome (exposure) mechanism for each community, and then combine those estimates into a (user-specific) average
#' (Default to be community size-weighed). Note that the stratified TMLE naturally controls for the community-level observed
#' covariates and unobserved factors. Namely, there is no \eqn{E} in the regressors for both outcome and treatment mechanisms.
#'
# @section Documentation:
# \itemize{
# \item To see the package vignette use: \code{vignette("tmleCommunity_vignette", package="tmleCommunity")}
# \item To see all available package documentation use: \code{help(package = 'tmleCommunity')}
# }
#'
#' @section References:
#' \enumerate{
#' \item Balzer L. B., Zheng W., van der Laan M. J., Petersen M. L. and the SEARCH Collaboration (2017). A New Approach to
#' Hierarchical Data Analysis: Targeted Maximum Likelihood Estimation of Cluster-Based Effects Under Interference.
#' ArXiv e-prints. 1706.02675.
#' \item Mu\eqn{\~n}oz, I. D. and van der Laan, M. (2012). Population Intervention Causal Effects Based on Stochastic Interventions.
#' Biometrics, 68(2):541-549.
#' \item Sofrygin, O. and van der Laan, M. J. (2015). tmlenet: Targeted Maximum Likelihood Estimation for Network Data.
#' R package version 0.1.9. https://github.com/osofr/tmlenet
#' \item van der Laan, M. (2014). Causal Inference for a Population of Causally Connected Units. Journal of Causal Inference, 2(1)
#' \item van der Laan, Mark J. and Gruber, Susan (2011). "Targeted Minimum Loss Based Estimation of an Intervention Specific
#' Mean Outcome". U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 290.
#' http://biostats.bepress.com/ucbbiostat/paper290
#' \item van der Laan, Mark J. and Rose, Sherri, "Targeted Learning: Causal Inference for Observational and
#' Experimental Data" New York: Springer, 2011.
#' }
#'
# @section Routines:
# The following routines will be generally invoked, in the same order as presented below.
# \describe{
#
# Finishing...
# }
#'
#' @section Datasets:
#'
#' To learn more about the type of data input required by \code{\link{tmleCommunity}}, see the following example datasets:
#' \itemize{
#' \item \code{\link{comSample.wmT.bA.bY_list}}
#' \item \code{\link{indSample.iid.cA.cY_list}}
#' \item \code{\link{indSample.iid.bA.bY.rareJ1_list}}
#' \item \code{\link{indSample.iid.bA.bY.rareJ2_list}}
#' }
#' For R code that can simulate more data with different structures, please check
#'
#' \url{https://github.com/chizhangucb/tmleCommunity/tree/master/tests/dataGeneration}
#'
#' @section Updates:
#' Check for updates and report bugs at \url{https://github.com/chizhangucb/tmleCommunity}.
#'
#' @docType package
#' @name tmleCommunity-package
#'
NULL
#' An Example of a Hierarchical Data Containing a Cluster-Based Binary Exposure with a Individual-Level Binary Outcome.
#'
#' Simulated hierarchical dataset containing 1000 independent communities, each (community \eqn{j}) containing \eqn{n_j} (non-fixed)
#' number of individuals where \eqn{n_j} is drawn from a normal with mean 50 and standard deviation 10 and round to the nearest
#' integer. Each row (observation) includes 2 measured community-level baseline covariates (\code{E1, E2}), 3 dependent
#' individual-level baseline covariates (\code{W1, W2, W3}), 1 dependent bianry exposure (\code{A}) and 1 dependent binary outcoem
#' (\code{Y}), along with one unique community identifier (\code{id}). The community-level baseline covariates (\code{E1, E2})
#' were sampled as i.i.d across all communities, while the individual-level baseline covariates (\code{W1, W2, W3}) for each
#' individual \eqn{i} within communty \eqn{j} was generated conditionally on the values of \eqn{j}'s community-level baseline
#' covariates (\code{E1[j], E2[j]}). Then the community-level exposure (\code{A}) for each community \eqn{j} was sampled
#' conditionally on the value of \eqn{j}'s community-level baseline covariates (\code{E1[j], E2[j]}), together with all
#' invididuals' baseline covariates (\code{W1[i], W2[i], W3[i]}) within community \eqn{j} where \eqn{i=1,..,n_j}. Similary,
#' the individual-level binary outcome \code{Y} for each individual \eqn{i} within communty \eqn{j} was sampled conditionally
#' covariates and exposure (\code{E1[j], E2[j], A[j]}), as well as the value of individual \eqn{i}'s baseline covariates
#' on the value of community \eqn{j}'s baseline (\code{W1[i]}, \code{W2[i]}, \code{W3[i]}). The following section provides more
#' details regarding individual variables in simulated data.
#'
#' @usage data(comSample.wmT.bA.bY_list)
#'
#' @format A data frame with 1000 independent communities, each containing around 50 individuals (in total 50,457 observations
#' (rows)), and 8 variables (columns):
#' \describe{
#' \item{id}{integer (unique) community identifier from 1 to 1000, identical within the same community}
#' \item{E1}{continuous uniform community-level baseline covariate with \code{min=0} and \code{max=1} (independent and identical
#' across all individuals in the same community)}
#' \item{E2}{discrete uniform community-level baseline covariate with 5 elements (0, 0.2, 0.4, 0.8, 1) (independent and identical
#' across all individuals in the same community)}
#' \item{W1}{binary individual-level baseline covariate that depends on the values of community-level baseline covaries (\code{E1,E2})}
#' \item{W2}{continuous individual-level baseline covariate, together with \code{W3}, are drawn from a bivariate normal distribution
#' with correlation 0.6, depending on the values of community's baseline covaries (\code{E1, E2})}
#' \item{W3}{continuous normal individual-level baseline covariate, correlated with \code{W2}, see details in above}
#' \item{A}{binary exposure that depends on community's baseline covariate values in \code{(E1, E2)}, and the mean of all individuals'
#' baseline covariates \code{W1} within the same community}
#' \item{Y}{binary outcome that depends on community's baseline covariate and exposure values in (\code{E1}, \code{E2}, \code{A}),
#' and all individuals' baseline covariate values in \code{(W2, W3)}}
#' }
#' @docType data
#' @keywords datasets
#' @name comSample.wmT.bA.bY_list
#' @source \url{https://github.com/chizhangucb/tmleCommunity/blob/master/tests/dataGeneration/get.cluster.dat.Abin.R}
#'
#' @examples
#' data(comSample.wmT.bA.bY_list)
#' comSample.wmT.bA.bY <- comSample.wmT.bA.bY_list$comSample.wmT.bA.bY
#' head(comSample.wmT.bA.bY)
#' comSample.wmT.bA.bY_list$psi0.Y # 0.103716, True ATE
#' # summarize the number of individuals within each community
#' head(table(comSample.wmT.bA.bY$id))
NULL
#' An Example of a Non-Hierarchical Data Containing a Continuous Exposure with a Continuous Outcome.
#'
#' Simulated (non-hierarchical) dataset containing 10,000 i.i.d. observations, with each row \code{i} consisting of measured baseline
#' covariates (\code{W1}, \code{W2}, \code{W3} and \code{W4}), continuous exposure (\code{A}) and continous outcome (\code{Y}).
#' The baseline covariates \code{W1}, \code{W2}, \code{W3} and \code{W4} were sampled as i.i.d., while the value of exposure \code{A}
#' for each observation \code{i} was drawn conditionally on the value of \code{i}'s four baseline covariates. Besides, the continuous
#' outcome \code{Y} for each observation depends on \code{i}'s baseline covariates and exposure values in (\code{W1[i]},\code{W2[i]},
#' \code{W3[i]}, \code{W4[i]}, \code{A[i]}). The following section provides more details regarding individual variables in simulated
#' data.
#'
#' @usage data(indSample.iid.cA.cY_list)
#'
#' @format A data frame with 10,000 independent observations (rows) and 6 variables:
#' \describe{
#' \item{W1}{binary baseline covariate with \eqn{P(W1=1) = 0.5}}
#' \item{W2}{binary baseline covariate with \eqn{P(W2=1) = 0.3}}
#' \item{W3}{continuous normal baseline covariate with \eqn{\mu} = 0 and \eqn{\sigma} = 0.25}
#' \item{W4}{continuous uniform baseline covariate with \code{min=0} and \code{max=1}}
#' \item{A}{continuous normal exposure where its mean depends on individual's baseline covariate values in \code{(W1, W2, W3, W4)}}
# \code{W2}, \code{W3}, \code{W4}}
#' \item{Y}{continuous normal outcome where its mean depends on individual's baseline covariate and exposure values in (\code{W1},
#' \code{W2}, \code{W3}, \code{W4}, \code{A})}
#' }
#' @docType data
#' @keywords datasets
#' @name indSample.iid.cA.cY_list
#' @source \url{https://github.com/chizhangucb/tmleCommunity/blob/master/tests/dataGeneration/get.iid.dat.Acont.R}
#'
#' @examples
#' data(indSample.iid.cA.cY_list)
#' indSample.iid.cA.cY <- indSample.iid.cA.cY_list$indSample.iid.cA.cY
#' # True mean of outcome under intervention g0
#' psi0.Y <- indSample.iid.cA.cY_list$psi0.Y
#' # True mean of outcoem under stochastic intervention gstar
#' psi0.Ygstar <- indSample.iid.cA.cY_list$psi0.Ygstar
#' # truncated bound used in sampling A* under gstar (in data generating mechanism)
#' indSample.iid.cA.cY_list$truncBD
#' # shift value used in sampling A* under gstar
#' indSample.iid.cA.cY_list$shift.val
NULL
#' An Example of a Non-Hierarchical Data Containing a Binary Exposure with a Rare Outcome (Independent Case-Control J = 1)
#'
#' Simulated (non-hierarchical) dataset containing 2,000 i.i.d. observations, with each row \code{i} consisting of 4 measured baseline
#' covariates (\code{W1}, \code{W2}, \code{W3} and \code{W4}), 1 binary exposure (\code{A}) and 1 binary outcome (\code{Y}) that
#' defines case or control status. The baseline covariates \code{W1}, \code{W2}, \code{W3} and \code{W4} were sampled as i.i.d.,
#' while the exposure \code{A} for each observation \code{i} depends on \code{i}'s four baseline covariates. Similarly, the outcome
#' \code{Y} for each observation depends on \code{i}'s baseline covariates and exposure values. Moreover, we can also describe the
#' case-control design as first sampling \eqn{1} case \eqn{(W_1^1, W_2^1, W_3^1, W_4^1, A^1)} from the conditional distribution of
#' \eqn{(W_1, W_2, W_3, W_4, A)}, given Y = 1. One then samples \eqn{J} controls \eqn{(W_1^{0,j}, W_2^{0,j}, W_3^{0,j},
#' W_4^{0,j}, A^{0,j})} from \eqn{(W_1, W_2, W_3, W_4, A)}, given Y = 0, \eqn{j=1,...,J}. Thus, the cluster containing one case
#' and \code{J} controls is considered the experimental unit. Finally one gets \eqn{nC} cases and \eqn{nCo} controls with
#' \eqn{J=nC/nCo}, where \eqn{J} can be used effectively in observation weights. The following section provides more details
#' regarding individual variables in simulated data.
#'
#' @usage data(indSample.iid.bA.bY.rareJ1_list)
#'
#' @format A data frame with 2,000 independent observations (rows), containing 1000 cases and 1000 controls, and 6 variables:
#' \describe{
#' \item{W1}{continuous uniform baseline covariate with \code{min=0} and \code{max=1}}
#' \item{W2}{continuous normal baseline covariate with \eqn{\mu} = 0 and \eqn{\sigma} = 0.3}
#' \item{W3}{binary baseline covariate with \eqn{P(W2=1) = 0.5}}
#' \item{W4}{binary baseline covariate with \eqn{P(W2=1) = 0.5}}
#' \item{A}{binary exposure that depends on baseline covariate values in \code{(W1, W2, W3, W4)}}
#' \item{Y}{binary outcome that depends on baseline covariate and exposure values in (\code{W1, W2, W3, W4, A})}
#'
#' }
#' @docType data
#' @keywords datasets
#' @name indSample.iid.bA.bY.rareJ1_list
#' @source \url{https://github.com/chizhangucb/tmleCommunity/blob/master/tests/dataGeneration/get.iid.dat.Acont.R}
#'
#' @examples
#' data(indSample.iid.bA.bY.rareJ1_list)
#' indSample.iid.bA.bY.rareJ1 <- indSample.iid.bA.bY.rareJ1_list$indSample.iid.bA.bY.rareJ1
#' head(indSample.iid.bA.bY.rareJ1_list$obs.wt.J1) # Assigned weights to each observations
#' indSample.iid.bA.bY.rareJ1_list$q0 # 0.013579 True prevalence probability
#' indSample.iid.bA.bY.rareJ1_list$psi0.Y # 0.012662 True ATE
#' indSample.iid.bA.bY.rareJ1_list$J # 1 The ratio of number of controls to cases
NULL
#' An Example of a Non-Hierarchical Data Containing a Binary Exposure with a Rare Outcome (Independent Case-Control J = 2)
#'
#' Simulated (non-hierarchical) dataset containing 3,000 i.i.d. observations. The data structure of \code{indSample.iid.bA.bY.rareJ2}
#' is identical to this of \code{indSample.iid.bA.bY.rareJ1}, except that now the ratio of the number of controls to the number of
#' case \eqn{J} is 2.
#'
#' @usage data(indSample.iid.bA.bY.rareJ2_list)
#'
#' @format A data frame with 3,000 independent observations (rows), containing 1000 cases and 2000 controls, and 6 variables
#' @docType data
#' @keywords datasets
#' @name indSample.iid.bA.bY.rareJ2_list
NULL
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.