The tmlenet R package implements the Targeted Maximum Likelihood Estimation (TMLE) of causal effects under single time point stochastic interventions in network data. The package also implements the Horvitz-Thompson estimator for networks (IPTW) and the parametric g-computation formula estimator. The inference for the TMLE is based on its asymptotic normality and the efficient influence curve for dependent data. The inference for IPTW is based on its corresponding influence curve for dependent data.
The input data structure consists of rows of unit-specific observations, with each row i represented by random variables (F.i, W.i, A.i, Y.i), where F.i is a vector of "friend IDs" of unit i (also referred to as i's "network"), W.i is a vector of i's baseline covariates, A.i is i's exposure (either binary, categorical or continuous) and Y.i is i's binary outcome.
Each exposure A.i depends on (possibly multivariate) baseline summary measure(s) sW.i, where sW.i can be any user-specified function of i's baseline covariates W.i and the baseline covariates of i's friends in set F.i (all W.j such that j is in F.i).
Similarly, each outcome Y.i depends on sW.i and (possibly multivariate) summary measure(s) sA.i, where sA.i can be any user-specified function of i's baseline covariates and exposure (W.i, A.i) and the baseline covariates and exposures of i's friends (all W.j, A.j such that j is in i's friend set F.i).
The summary measures (sW.i, sA.i) are defined simultaneously for all i with the functions def.sW and def.sA. It is assumed that (sW.i, sA.i) have the same dimensionality across i. The function eval.summaries can be used for evaluating these summary measures. All estimation is performed by calling the tmlenet function.
The vector of friends F.i can be specified either as a single column NETIDnode in the input data (where each F.i is a string of friend IDs or friend row numbers delimited by the character separator sep) or as a separate input matrix NETIDmat of network IDs (where each row is a vector of friend IDs or friend row numbers).
Specifying the network as a matrix generally results in significant improvements to run time.
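The two network representations described above can be related with a few lines of base R. The following is only an illustrative sketch, not the package's own conversion code: the toy data frame, the column name Net_str, the separator and the helper pad are all assumptions made for this example.

```r
# Hypothetical toy data: one row per unit, friends stored as a
# space-delimited string of friend row numbers (unit 3 has no friends).
dat <- data.frame(
  W1 = c(0, 1, 1, 0),
  A = c(1, 0, 1, 1),
  Y = c(0, 1, 1, 0),
  Net_str = c("2 3", "1", "", "1 2"),
  stringsAsFactors = FALSE
)
sep <- " "  # separator between friend IDs (assumed for this example)

# Split each string into an integer vector of friend row numbers.
friend_list <- lapply(strsplit(dat$Net_str, sep, fixed = TRUE), as.integer)

# Largest number of friends observed for any unit.
Kmax <- max(lengths(friend_list))

# Pad every friend vector with NA up to length Kmax (hypothetical helper),
# then stack the padded vectors row by row into an n x Kmax matrix.
pad <- function(f) c(f, rep(NA_integer_, Kmax - length(f)))
net_mat <- matrix(unlist(lapply(friend_list, pad)), ncol = Kmax, byrow = TRUE)
```

Each row of net_mat holds one unit's friend row numbers, with NA filling the unused slots. This is the general shape that a matrix of network IDs takes.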
The following routines are generally invoked in the order presented below.
def.sW
This is the first part of the two-part specification of the structural equation model for the outcome Y. It defines the (multivariate) baseline-covariate-based summary measure functions that will later be applied to the input data to derive the (multivariate) summary measures sW.
Each component sW[j] of sW is defined by an R expression that takes as its input the unit's baseline covariates and the baseline covariates of the unit's friends. Each argument passed to def.sW is considered a separate summary measure, with the j-th argument defining the j-th summary measure sW[j] and the name of the j-th argument defining the name of sW[j]. The arguments of def.sW can be named, unnamed or a mixture of both. When argument j is unnamed, the name for sW[j] is created automatically. Each summary measure is defined either by an evaluable R expression or by a string containing an evaluable R expression.
These expressions can use a special double-square-bracket subsetting operator "Var[[index]]", which enables referencing the values of the variable Var among the unit's friends. For example, Var[[1]] will evaluate to a one-dimensional vector of summary measures of length nrow(data), where for each row of the input data this summary measure will contain the Var value of the unit's first friend. The ordering of friends is determined by the ordering of friend IDs specified in the network input. When a unit has no friends, its corresponding value of Var[[1]] will evaluate to NA by default. However, all such NAs can be replaced by 0s by passing replaceNAw0 = TRUE as an additional argument to def.sW.
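The behaviour described for Var[[1]] can be mimicked in base R to see what the resulting vector looks like. This is a sketch under assumptions, not the package's implementation: the covariate W1, the friend matrix net_mat and the NA replacement step are made up for illustration.

```r
# Hypothetical baseline covariate for four units.
W1 <- c(0.5, 1.2, -0.3, 2.0)

# Hypothetical friend matrix: row i lists the row numbers of unit i's
# friends, padded with NA (unit 3 has no friends).
net_mat <- matrix(c(2L, 3L,
                    1L, NA,
                    NA, NA,
                    1L, 2L), ncol = 2, byrow = TRUE)

# Analogue of W1[[1]]: the W1 value of each unit's first friend.
# Indexing a vector with NA returns NA, so friendless units get NA.
W1_friend1 <- W1[net_mat[, 1]]

# Analogue of the replaceNAw0 = TRUE behaviour: replace those NAs with 0.
# Note this simple version would also zero out a genuinely missing W1 value.
W1_friend1_0 <- ifelse(is.na(W1_friend1), 0, W1_friend1)
```

Here W1_friend1 has length 4, one entry per row of the input, and only the friendless unit 3 receives NA.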
One can also use vectors for indexing the friends' values of the variable Var in Var[[index]]. For example, Var[[1:Kmax]] will evaluate to a Kmax-dimensional summary measure, which will be a matrix with nrow(data) rows and Kmax columns, where the first column will evaluate to Var[[1]], the second to Var[[2]], and so on, up to the last column evaluating to Var[[Kmax]].
Note that Kmax is a special reserved constant that can be used inside the network indexing operators. Its value is the highest number of friends among all units in the input data, as specified by the user through the input argument Kmax. See the def.sW manual for various examples of summary measures that use the network indexing operators.
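The matrix produced by vector indexing such as Var[[1:Kmax]] can likewise be emulated in base R. The sketch below is an assumption-laden illustration rather than package code: the binary covariate W2, the friend matrix and the derived count are hypothetical.

```r
# Hypothetical binary baseline covariate for four units.
W2 <- c(1, 0, 1, 1)
Kmax <- 2

# Hypothetical friend matrix (row numbers of friends, NA-padded).
net_mat <- matrix(c(2L, 3L,
                    1L, NA,
                    NA, NA,
                    1L, 2L), ncol = Kmax, byrow = TRUE)

# Analogue of W2[[1:Kmax]]: column k holds the W2 value of each unit's
# k-th friend. as.vector() flattens net_mat column by column, and matrix()
# refills in the same order, so the layout is preserved.
W2_friends <- matrix(W2[as.vector(net_mat)], nrow = nrow(net_mat), ncol = Kmax)

# One possible scalar summary built from it: the number of friends with
# W2 equal to 1, treating missing friends as contributing nothing.
n_friends_W2 <- rowSums(W2_friends, na.rm = TRUE)
```

The first column of W2_friends matches what indexing by the first friend alone would give, and the row sums collapse the Kmax columns into a single summary per unit.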
def.sA
Defines treatment summary measures sA that can be functions of each unit's exposure and baseline covariates, as well as the exposures and baseline covariates of the unit's friends. This is the second part of the two-part specification of the structural equation model for the outcome Y. The syntax is identical to that of the def.sW function, except that def.sA can consist of functions of baseline covariates as well as the exposure Anode.
eval.summaries
A convenience function that can be used for validating and evaluating the user-specified summary measures. Takes the input dataset and evaluates the summary measures based on objects previously defined with calls to def.sW and def.sA. Note that this function is called automatically by the tmlenet function and does not need to be called by the user prior to calling tmlenet.
tmlenet
Performs estimation of the causal effect of interest using the observed input data, the intervention of interest, the network information and the previously defined summary measures sW, sA.
To learn more about the type of data input required by tmlenet, see the following example datasets:
df_netKmax2
df_netKmax6
NetInd_mat_Kmax6
Check for updates and report bugs at http://github.com/osofr/tmlenet.