Targeted Maximum Likelihood Estimation for Network Data
Description
The tmlenet R package implements Targeted Maximum Likelihood Estimation (TMLE) of causal effects under single time point stochastic interventions in network data. The package also implements the Horvitz-Thompson estimator for networks (IPTW) and the parametric g-computation formula estimator. Inference for the TMLE is based on its asymptotic normality and the efficient influence curve for dependent data. Inference for IPTW is based on its corresponding influence curve for dependent data.
Details
The input data structure consists of rows of unit-specific observations, with each row i represented by random variables (F.i, W.i, A.i, Y.i), where F.i is a vector of "friend IDs" of unit i (also referred to as i's "network"), W.i is a vector of i's baseline covariates, A.i is i's exposure (either binary, categorical or continuous) and Y.i is i's binary outcome.
Each exposure A.i depends on (possibly multivariate) baseline summary measure(s) sW.i, where sW.i can be any user-specified function of i's baseline covariates W.i and the baseline covariates of i's friends in set F.i (all W.j such that j is in F.i).
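As a rough illustration of such a summary measure, the base-R sketch below (not the tmlenet API) stores the network as a matrix of friend row numbers padded with NA, in the spirit of the NETIDmat input described further down, and computes a two-dimensional sW.i: the unit's own W.i and the mean of its friends' W.j. The toy data, the helper name friend_mean and the choice of 0 for units without friends are all assumptions made for this example.

```r
# Hypothetical toy data: 4 units, one baseline covariate W.
W <- c(1, 0, 1, 1)

# Friend row numbers for each unit, padded with NA up to Kmax = 2 columns
# (an assumed layout resembling a NETIDmat of friend row numbers).
netmat <- matrix(c(2, 3,
                   1, NA,
                   NA, NA,
                   1, 2),
                 nrow = 4, byrow = TRUE)

# Hypothetical helper: mean of the friends' values of x for each unit.
friend_mean <- function(x, netmat) {
  fx <- matrix(x[as.vector(netmat)], nrow = nrow(netmat))  # NA where no friend
  nF <- rowSums(!is.na(netmat))                            # friends per unit
  # Units with no friends get 0: their sum is 0 and the divisor is clamped to 1.
  rowSums(fx, na.rm = TRUE) / pmax(nF, 1)
}

sW <- cbind(W = W, meanW_friends = friend_mean(W, netmat))
print(sW)
```

Each row of sW has the same two columns regardless of how many friends the unit has, which matches the requirement that summary measures share one dimensionality across units.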
Similarly, each outcome Y.i depends on sW.i and (possibly multivariate) summary measure(s) sA.i, where sA.i can be any user-specified function of i's baseline covariates and exposure (W.i, A.i) and the baseline covariates and exposures of i's friends (all W.j, A.j such that j is in i's friend set F.i).
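A matching base-R sketch for an exposure summary, again independent of the tmlenet API: sA.i is taken here to be the unit's own exposure A.i together with the number of its exposed friends. The toy data and the NA-padded friend matrix are hypothetical.

```r
# Hypothetical toy data: 4 units with a binary exposure A.
A <- c(1, 1, 0, 0)

# Friend row numbers padded with NA (assumed NETIDmat-like layout, Kmax = 2).
netmat <- matrix(c(2, 3,
                   1, NA,
                   NA, NA,
                   1, 2),
                 nrow = 4, byrow = TRUE)

# Friends' exposures as an nrow(netmat) x Kmax matrix, NA where no friend.
friendA <- matrix(A[as.vector(netmat)], nrow = nrow(netmat))

# sA.i: own exposure and number of exposed friends (NA slots are ignored).
sA <- cbind(A = A, nExposedFriends = rowSums(friendA, na.rm = TRUE))
print(sA)
```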
The summary measures (sW.i, sA.i) are defined simultaneously for all i with the functions def.sW and def.sA. It is assumed that (sW.i, sA.i) have the same dimensionality across i. The function eval.summaries can be used for evaluating these summary measures. All estimation is performed by calling the tmlenet function.
The vector of friends F.i can be specified either as a single column NETIDnode in the input data (where each F.i is a string of friend IDs or friend row numbers delimited by the character separator sep) or as a separate input matrix NETIDmat of network IDs (where each row is a vector of friend IDs or friend row numbers). Specifying the network as a matrix generally results in a substantially shorter run time.
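The relationship between the two network formats can be sketched in base R. The hypothetical helper netstr_to_mat below turns a NETIDnode-style vector of sep-delimited friend row numbers into an NA-padded matrix with Kmax columns, which is the layout assumed here for NETIDmat. It illustrates the data layout only and is not the conversion code used inside tmlenet.

```r
# Hypothetical helper: parse friend strings such as "2 3" into an NA-padded
# integer matrix with one row per unit and Kmax columns.
netstr_to_mat <- function(netstr, sep = " ", Kmax) {
  mat <- matrix(NA_integer_, nrow = length(netstr), ncol = Kmax)
  for (i in seq_along(netstr)) {
    s <- netstr[i]
    if (is.na(s) || !nzchar(trimws(s))) next           # unit without friends
    ids <- as.integer(strsplit(trimws(s), sep, fixed = TRUE)[[1]])
    ids <- ids[!is.na(ids)]                             # drop empty tokens
    if (length(ids) > Kmax) stop("unit ", i, " has more than Kmax friends")
    if (length(ids) > 0) mat[i, seq_along(ids)] <- ids
  }
  mat
}

NETIDnode <- c("2 3", "1", "", "1 2")
NETIDmat <- netstr_to_mat(NETIDnode, sep = " ", Kmax = 2)
print(NETIDmat)
```

Once the network is a matrix, friends' values can be gathered with a single vectorised index (as in x[as.vector(NETIDmat)]) rather than by re-parsing strings; this is one plausible reason why the matrix input is faster, though the package's internals are not shown here.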
Routines
The following routines are generally invoked in the order presented below.
def.sW
This is the first part of the two-part specification of the structural equation model for the outcome Y. It defines the (multivariate) baseline-covariate-based summary measure functions that will later be applied to the input data to derive the (multivariate) summary measures sW. Each component sW[j] of sW is defined by an R expression that takes as its input the unit's baseline covariates and the baseline covariates of the unit's friends. Each argument passed to def.sW is considered a separate summary measure: the j-th argument defines the j-th summary measure sW[j], and the name of the j-th argument defines the name of sW[j]. The arguments of def.sW can be named, unnamed, or a mixture of both. When the j-th argument is unnamed, the name of sW[j] is created automatically. Each summary measure is defined either by an evaluable R expression or by a string containing an evaluable R expression.
These expressions can use a special double-square-bracket subsetting operator "Var[[index]]", which enables referencing the values of variable Var for the unit's friends. For example, Var[[1]] will evaluate to a one-dimensional summary measure (a vector of length nrow(data)), where for each row of the input data this summary measure contains the Var value of the unit's first friend. The ordering of friends is determined by the ordering of friend IDs specified in the network input. When the unit doesn't have any friends, its corresponding value of Var[[1]] evaluates to NA by default. However, all such NAs can be replaced by 0s by passing replaceNAw0 = TRUE as an additional argument to def.sW. One can also use vectors for indexing friend variable Var values in Var[[index]]. For example, Var[[1:Kmax]] will evaluate to a Kmax-dimensional summary measure, which will be a matrix with nrow(data) rows and Kmax columns, where the first column evaluates to Var[[1]], the second to Var[[2]], and so on, up to the last column evaluating to Var[[Kmax]]. Note that Kmax is a special reserved constant that can be used inside the network indexing operators. It is set to the highest number of friends among all units in the input data and is specified by the user input argument Kmax. See the def.sW manual for various examples of summary measures that use the network indexing operators.
def.sA
Defines treatment summary measures sA that can be functions of each unit's exposure and baseline covariates, as well as the exposures and baseline covariates of the unit's friends. This is the second part of the two-part specification of the structural equation model for the outcome Y. The syntax is identical to that of the def.sW function, except that def.sA can consist of functions of baseline covariates as well as the exposure Anode.
eval.summaries
A convenience function that can be used for validating and evaluating the user-specified summary measures. It takes the input dataset and evaluates the summary measures based on objects previously defined with calls to def.sW and def.sA. Note that this function is called automatically by the tmlenet function and does not need to be called by the user prior to calling tmlenet.
tmlenet
Performs estimation of the causal effect of interest using the observed input data, the intervention of interest, the network information and the previously defined summary measures sW and sA.
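The behaviour described for the Var[[index]] operator in def.sW can be mimicked in base R to build intuition about the shape of its output. The function friend_index below is hypothetical and is not part of tmlenet; it assumes the network is available as an NA-padded matrix of friend row numbers, and its replaceNAw0 flag only imitates the def.sW argument of the same name.

```r
# Hypothetical emulation of Var[[index]]: returns the values of variable x
# for the friends in positions `index` of each unit's friend list.
friend_index <- function(x, netmat, index, replaceNAw0 = FALSE) {
  idx <- netmat[, index, drop = FALSE]                 # friend rows, NA if absent
  out <- matrix(x[as.vector(idx)], nrow = nrow(netmat))
  # Simplification: every NA is replaced, including any NA already in x.
  if (replaceNAw0) out[is.na(out)] <- 0
  colnames(out) <- paste0("Var_", index)
  if (length(index) == 1) drop(out) else out           # scalar index gives a vector
}

W <- c(1, 0, 1, 1)
Kmax <- 2
netmat <- matrix(c(2, 3,
                   1, NA,
                   NA, NA,
                   1, 2),
                 nrow = 4, byrow = TRUE)

friend_index(W, netmat, 1)                             # analogue of W[[1]]
friend_index(W, netmat, 1:Kmax, replaceNAw0 = TRUE)    # analogue of W[[1:Kmax]]
```

The first call returns a vector of length nrow(netmat) with NA for the unit without friends; the second returns an nrow(netmat) x Kmax matrix whose missing-friend entries have been set to 0.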
Datasets
To learn more about the type of data input required by tmlenet, see the following example datasets:

df_netKmax2

df_netKmax6

NetInd_mat_Kmax6
Updates
Check for updates and report bugs at http://github.com/osofr/tmlenet.