The tmlenet R package implements the Targeted Maximum Likelihood Estimation (TMLE) of causal effects under single time point stochastic interventions in network data. The package also implements the Horvitz-Thompson estimator for networks (IPTW) and the parametric g-computation formula estimator. The inference for the TMLE is based on its asymptotic normality and the efficient influence curve for dependent data. The inference for IPTW is based on its corresponding influence curve for dependent data.
The input data structure consists of rows of unit-specific observations, with each row i represented by random variables (F.i, W.i, A.i, Y.i), where F.i is a vector of "friend IDs" of unit i (also referred to as i's "network"), W.i is a vector of i's baseline covariates, A.i is i's exposure (either binary, categorical or continuous) and Y.i is i's binary outcome.
Each exposure A.i depends on (possibly multivariate) baseline summary measure(s) sW.i, where sW.i can be any user-specified function of i's baseline covariates W.i and the baseline covariates of i's friends in F.i (all W.j such that j is in F.i).
Similarly, each outcome Y.i depends on sW.i and (possibly multivariate) summary measure(s) sA.i, where sA.i can be any user-specified function of i's baseline covariates and exposure (W.i, A.i) and the baseline covariates and exposures of i's friends (all W.j, A.j such that j is in i's friend set F.i).
The summary measures (sW.i, sA.i) are defined simultaneously for all i with functions def.sW and def.sA. It is assumed that (sW.i, sA.i) have the same dimensionality across i. A convenience function (described among the routines below) can be used for evaluating these summary measures. All estimation is performed by calling the tmlenet function.
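To make the definitions above concrete, the following is a minimal base-R sketch that computes one possible choice of (sW.i, sA.i) by hand for a toy dataset. It does not use the package. The column names, the toy values and the particular summaries (sum of friends' covariates, number of exposed friends) are assumptions chosen for illustration only.

```r
# Toy data: 4 units, each row holds (W.i, A.i, Y.i); names are hypothetical.
dat <- data.frame(
  W = c(1, 0, 1, 1),   # baseline covariate
  A = c(0, 1, 1, 0),   # exposure
  Y = c(1, 0, 1, 1)    # binary outcome
)

# F.i for i = 1..4, stored as friend row numbers; unit 4 has no friends.
friends <- list(c(2L, 3L), 1L, c(1L, 4L), integer(0))

# sW.i = (W.i, sum of friends' W); an empty friend set contributes 0.
sW <- cbind(
  W = dat$W,
  sum_friend_W = vapply(friends, function(f) sum(dat$W[f]), numeric(1))
)

# sA.i = (A.i, number of exposed friends).
sA <- cbind(
  A = dat$A,
  n_exposed_friends = vapply(friends, function(f) sum(dat$A[f]), numeric(1))
)
```

Both matrices have one row per unit and the same number of columns for every unit, which matches the equal-dimensionality assumption stated above.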
The vector of friends F.i can be specified either as a single column NETIDnode in the input data (where each F.i is a string of friend IDs or friend row numbers delimited by the character separator sep) or as a separate input matrix NETIDmat of network IDs (where each row is a vector of friend IDs or friend row numbers). Specifying the network as a matrix generally results in significant improvements to run time.
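The two network representations described above can be related with a short base-R sketch. It converts a vector of delimited friend strings into a matrix with one row per unit, padding with NA where a unit has fewer friends than the maximum. The object names (net_strings, net_mat, k_max) are hypothetical, and the NA padding is an assumption about a reasonable matrix layout rather than a statement of what the package requires.

```r
# Hypothetical input: one string of friend row numbers per unit.
net_strings <- c("2 3", "1", "1 4", "")
sep <- " "

# Split each string into an integer vector; "" becomes an empty friend set.
friend_list <- lapply(strsplit(net_strings, sep, fixed = TRUE), as.integer)

# Pad every friend vector with NA up to the largest friend-set size.
k_max <- max(lengths(friend_list))
padded <- lapply(
  friend_list,
  function(f) c(f, rep(NA_integer_, k_max - length(f)))
)

# One row per unit, one column per friend slot.
net_mat <- matrix(unlist(padded), nrow = length(padded), ncol = k_max,
                  byrow = TRUE)
```

Doing this conversion once up front avoids re-parsing the strings on every call, which is one plausible reason the matrix form runs faster.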
The following routines will be generally invoked, in the same order as presented below.
def.sW is the first part of the two-part specification of the structural equation model for the outcome Y. It defines the (multivariate) baseline-covariate-based summary measure functions that will later be applied to the input data to derive the (multivariate) summary measures sW. Each component of sW is defined by an R expression that takes as its input the unit's baseline covariates and the baseline covariates of the unit's friends.
Each argument passed to def.sW is considered a separate summary measure, with the jth argument defining the summary measure sW[j] and the name of the jth argument defining the name of the summary measure sW[j]. The arguments of def.sW can be named, unnamed or a mixture of both. When argument j is unnamed, the summary measure name for sW[j] is created automatically.
Each summary measure is defined either by an evaluable R expression or by a string containing an evaluable R expression. These expressions can use a special double-square-bracket subsetting operator "Var[[index]]", which enables referencing the values of the variable Var among the unit's friends. For example, Var[[1]] will evaluate to a one-dimensional vector of summary measures of length nrow(data), where for each row from the input data this summary measure will contain the Var value of the unit's first friend. The ordering of friends is determined by the ordering of friend IDs specified in the network input. When a unit doesn't have any friends, its corresponding value of Var[[1]] will evaluate to NA by default. However, all such NAs can be replaced by 0s by passing replaceNAw0 = TRUE as an additional argument to def.sW.
One can also use vectors for indexing the friend values of Var in Var[[index]]. For example, Var[[1:Kmax]] will evaluate to a Kmax-dimensional summary measure, which will be a matrix with nrow(data) rows and Kmax columns, where the first column will evaluate to Var[[1]], the second to Var[[2]], and so on, up to the last column evaluating to Var[[Kmax]]. Note that Kmax is a special reserved constant that can be used inside the network indexing operators. It is set to the highest number of friends among all units in the input data and it is specified by the user input argument Kmax. See the def.sW manual for various examples of summary measures that use the network indexing operators.
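The friend-indexing semantics described above can be imitated in base R, which may help when checking what a given summary measure should look like. The sketch below is not the package's implementation: friend_value is a hypothetical helper, and net_mat is assumed to hold friend row numbers padded with NA.

```r
# Hypothetical helper mimicking the described behavior of Var[[k]]:
# returns the value of `var` for each unit's k-th friend, NA when absent.
friend_value <- function(var, net_mat, k, replace_na_with_0 = FALSE) {
  out <- var[net_mat[, k]]   # indexing by NA yields NA
  # Note: this also zeroes out NAs that come from missing `var` values.
  if (replace_na_with_0) out[is.na(out)] <- 0
  out
}

W1 <- c(10, 20, 30, 40)
net_mat <- matrix(c(2L, 3L,
                    1L, NA,
                    1L, 4L,
                    NA, NA), ncol = 2, byrow = TRUE)
k_max <- ncol(net_mat)

# Analogue of W1[[1]]: value of W1 for each unit's first friend.
first_friend <- friend_value(W1, net_mat, 1)

# Analogue of W1[[1:Kmax]]: one column per friend slot.
all_friends <- sapply(seq_len(k_max),
                      function(k) friend_value(W1, net_mat, k))
```

Here first_friend has length nrow(net_mat) and all_friends has k_max columns, mirroring the shapes described for Var[[1]] and Var[[1:Kmax]].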
def.sA defines treatment summary measures sA that can be functions of each unit's exposure and baseline covariates, as well as the exposures and baseline covariates of the unit's friends. This is the second part of the two-part specification of the structural equation model for the outcome Y. The syntax is identical to that of the def.sW function, except that def.sA can consist of functions of baseline covariates as well as the exposure A.
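Based only on the syntax described above, a pair of summary measure definitions might look like the sketch below. The variable names W1 and A and the summary measure names netW1 and netA are hypothetical, and the code assumes that the expressions are captured unevaluated, so that Kmax and the data columns need not exist when the definitions are created. The block is guarded so that it does nothing when the package is not installed.

```r
# Runs only if the tmlenet package is installed; otherwise does nothing.
if (requireNamespace("tmlenet", quietly = TRUE)) {
  library(tmlenet)

  # Baseline summaries: the unit's own W1 (unnamed argument, name created
  # automatically) and the W1 values of all friends (named argument).
  sW <- def.sW(W1, netW1 = W1[[1:Kmax]], replaceNAw0 = TRUE)

  # Exposure summaries: the unit's own A and the exposures of all friends.
  sA <- def.sA(A, netA = A[[1:Kmax]], replaceNAw0 = TRUE)
}
```

The def.sW manual remains the authoritative reference for the accepted arguments.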
A convenience function that can be used for validating and evaluating the user-specified summary measures. It takes the input dataset and evaluates the summary measures based on objects previously defined with calls to def.sW and def.sA. Note that this function is called automatically by the tmlenet function and does not need to be called by the user prior to calling tmlenet.
tmlenet performs estimation of the causal effect of interest using the observed input data, the intervention of interest, the network information and the previously defined summary measures sW and sA.
To learn more about the type of data input required by
tmlenet, see the following example datasets:
Check for updates and report bugs at http://github.com/osofr/tmlenet.