Targeted Maximum Likelihood Estimation for Network Data

Description

The tmlenet R package implements Targeted Maximum Likelihood Estimation (TMLE) of causal effects under single-time-point stochastic interventions in network data. The package also implements the Horvitz-Thompson estimator for networks (IPTW) and the parametric g-computation formula estimator. Inference for the TMLE is based on its asymptotic normality and on the efficient influence curve for dependent data; inference for the IPTW estimator is based on its corresponding influence curve for dependent data.
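The Horvitz-Thompson (IPTW) idea can be sketched in a few lines. This is a minimal illustration, not tmlenet's implementation: it assumes that, for each unit, the conditional density of the observed summary sA.i given sW.i is already known both under the intervention of interest (h_gstar) and under the observed exposure mechanism (h_g0). In practice these densities are estimated from the data. The function name iptw_estimate is hypothetical.

```python
from typing import Sequence


def iptw_estimate(y: Sequence[float], h_gstar: Sequence[float], h_g0: Sequence[float]) -> float:
    """Weighted mean of outcomes, with weights h_gstar / h_g0 per unit.

    y:       observed outcomes Y.i
    h_gstar: density of each unit's observed (sA.i given sW.i) under the intervention
    h_g0:    density of the same values under the observed exposure mechanism
    """
    if not (len(y) == len(h_gstar) == len(h_g0)):
        raise ValueError("inputs must have the same length")
    if len(y) == 0:
        raise ValueError("need at least one unit")
    weights = [gs / g0 for gs, g0 in zip(h_gstar, h_g0)]
    return sum(w * yi for w, yi in zip(weights, y)) / len(y)


print(iptw_estimate([1, 0, 1, 1], [0.5, 0.5, 1.0, 0.25], [0.25, 0.5, 0.5, 0.5]))
```

When the intervention leaves the exposure distribution unchanged (all weights equal to 1), the estimate reduces to the plain mean of the observed outcomes.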

Details

The input data structure consists of rows of unit-specific observations, with each row i represented by the random variables (F.i, W.i, A.i, Y.i), where F.i is a vector of "friend IDs" of unit i (also referred to as i's "network"), W.i is a vector of i's baseline covariates, A.i is i's exposure (binary, categorical or continuous) and Y.i is i's binary outcome. Each exposure A.i depends on the (possibly multivariate) baseline summary measure sW.i, which can be any user-specified function of i's own baseline covariates W.i and the baseline covariates of i's friends (all W.j such that j is in F.i). Similarly, each outcome Y.i depends on sW.i and on the (possibly multivariate) summary measure sA.i, which can be any user-specified function of i's baseline covariates and exposure (W.i, A.i) and of the baseline covariates and exposures of i's friends (all W.j, A.j such that j is in F.i).
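To make the data structure concrete, the sketch below builds a toy network and computes one possible two-dimensional baseline summary sW.i = (W.i, mean of W.j over j in F.i). Everything here is a hypothetical illustration: friend sets are stored as Python lists of 0-based row numbers (not tmlenet's input format), and setting the friend mean to 0.0 for units without friends is an arbitrary choice made for this sketch.

```python
from typing import List, Tuple


def summarize_baseline(W: List[float], friends: List[List[int]]) -> List[Tuple[float, float]]:
    """Return sW.i = (W.i, mean of W.j for j in F.i) for every unit i.

    The mean over an empty friend set is set to 0.0 (a choice made for this sketch).
    """
    out = []
    for i, f in enumerate(friends):
        friend_mean = sum(W[j] for j in f) / len(f) if f else 0.0
        out.append((W[i], friend_mean))
    return out


# Toy network of four units: friends[i] plays the role of F.i.
W = [1.0, 0.0, 2.0, 1.0]
friends = [[1, 2], [0], [], [0, 1, 2]]
print(summarize_baseline(W, friends))
```

Every unit gets a summary of the same dimension (here 2), whatever the size of its friend set, which is what allows a single model for A.i given sW.i to be fit across all units.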

The summary measures (sW.i, sA.i) are defined simultaneously for all i with the functions def.sW and def.sA, and they are assumed to have the same dimensionality for every i. The function eval.summaries can be used to evaluate these summary measures. All estimation is performed by calling the tmlenet function. The friend vectors F.i can be specified either as a single column NETIDnode in the input data (where each F.i is a string of friend IDs or friend row numbers delimited by the separator character sep) or as a separate input matrix NETIDmat of network IDs (where each row is a vector of friend IDs or friend row numbers). Specifying the network as a matrix generally results in significantly faster run times.
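The relationship between the two network formats can be sketched as a conversion from one delimited string per unit into a rectangular matrix padded to Kmax columns. This is a hypothetical helper, not part of tmlenet: it maps friend IDs to 0-based row numbers and uses None in place of R's NA for empty friend slots.

```python
from typing import List, Optional


def netid_strings_to_matrix(netid: List[str], ids: List[str], kmax: int,
                            sep: str = " ") -> List[List[Optional[int]]]:
    """Convert per-unit friend-ID strings into a row-number matrix with kmax columns.

    netid: one string per unit, friend IDs separated by `sep` ("" means no friends)
    ids:   the unit ID of each row, used to map friend IDs to row numbers
    """
    row_of = {uid: r for r, uid in enumerate(ids)}
    mat = []
    for s in netid:
        # Skip empty tokens so that "" (no friends) yields an empty friend list.
        fr = [row_of[tok] for tok in s.split(sep) if tok]
        if len(fr) > kmax:
            raise ValueError(f"unit has {len(fr)} friends, more than kmax={kmax}")
        mat.append(fr + [None] * (kmax - len(fr)))  # pad missing friends
    return mat


print(netid_strings_to_matrix(["b c", "a", ""], ids=["a", "b", "c"], kmax=2))
```

Parsing the strings once and storing a fixed-width matrix is one plausible reason the matrix input is faster: friend lookups become direct indexing rather than repeated string splitting.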

Routines

The following routines are typically invoked in the order presented below.

def.sW

This is the first part of the two-part specification of the structural equation model for the outcome Y. def.sW defines the (multivariate) baseline summary measure functions that are later applied to the input data to derive the (multivariate) summary measures sW. Each component sW[j] of sW is defined by an R expression whose inputs are the unit's baseline covariates and the baseline covariates of the unit's friends. Each argument passed to def.sW defines a separate summary measure: the jth argument defines sW[j], and its name becomes the name of sW[j]. The arguments of def.sW can be named, unnamed or a mixture of both; when the jth argument is unnamed, the name of sW[j] is generated automatically.

Each summary measure is defined either by an evaluable R expression or by a string containing an evaluable R expression. These expressions can use the special double-square-bracket subsetting operator "Var[[index]]", which references the values of variable Var for the unit's friends. For example, Var[[1]] evaluates to a one-dimensional summary measure of length nrow(data) that, for each row of the input data, contains the Var value of the unit's first friend. The ordering of friends is determined by the ordering of friend IDs in the network input. When a unit has no friends, its value of Var[[1]] evaluates to NA by default; all such NAs can be replaced by 0 by passing replaceNAw0 = TRUE as an additional argument to def.sW. Vectors can also be used as the index in Var[[index]]. For example, Var[[1:Kmax]] evaluates to a Kmax-dimensional summary measure: a matrix with nrow(data) rows and Kmax columns whose first column is Var[[1]], second column is Var[[2]], and so on up to the last column, Var[[Kmax]]. Kmax is a special reserved constant that can be used inside the network indexing operators. It represents the maximum number of friends of any unit and is specified by the user through the input argument Kmax. See the def.sW manual page for further examples of summary measures that use the network indexing operators.
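The semantics of Var[[index]] can be emulated outside R to check one's understanding. The function below is a hypothetical sketch, not tmlenet code: it takes a padded network matrix of 0-based row numbers (None for a missing friend), uses 1-based friend positions to mirror R, and always returns a matrix with one column per requested position, whereas in R a single index yields a vector.

```python
from typing import List, Optional, Sequence


def friend_values(var: Sequence[float], netmat: List[List[Optional[int]]],
                  index: Sequence[int], replace_na_w0: bool = False) -> List[list]:
    """Emulate Var[[index]]: the values of `var` for each unit's friends at 1-based positions.

    Returns an nrow x len(index) matrix. A missing friend gives None (standing in
    for NA), or 0 when replace_na_w0 is True (mirroring replaceNAw0 = TRUE).
    """
    fill = 0 if replace_na_w0 else None
    result = []
    for row in netmat:
        vals = []
        for k in index:
            if k < 1:
                raise ValueError("friend positions are 1-based")
            j = row[k - 1] if k - 1 < len(row) else None
            vals.append(var[j] if j is not None else fill)
        result.append(vals)
    return result


W1 = [10, 20, 30]
netmat = [[1, 2], [0, None], [None, None]]  # Kmax = 2
print(friend_values(W1, netmat, [1]))                              # like W1[[1]]
print(friend_values(W1, netmat, range(1, 3)))                      # like W1[[1:Kmax]]
print(friend_values(W1, netmat, range(1, 3), replace_na_w0=True))  # NAs replaced by 0
```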

def.sA

Defines the treatment summary measures sA, which can be functions of each unit's exposure and baseline covariates as well as of the exposures and baseline covariates of the unit's friends. This is the second part of the two-part specification of the structural equation model for the outcome Y. The syntax is identical to that of def.sW, except that def.sA can consist of functions of the exposure Anode as well as of the baseline covariates.
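A common kind of exposure summary combines a unit's own exposure with the number of its exposed friends, roughly what one might express in def.sA as a row sum over A[[1:Kmax]] with replaceNAw0 = TRUE. The Python function below is a hypothetical sketch of that computation on a padded network matrix of 0-based row numbers, with None marking an empty friend slot.

```python
from typing import List, Optional, Tuple


def summarize_exposure(A: List[int], netmat: List[List[Optional[int]]]) -> List[Tuple[int, int]]:
    """Return sA.i = (A.i, number of exposed friends of i).

    Empty friend slots (None) contribute 0, like replacing NA by 0.
    """
    out = []
    for i, row in enumerate(netmat):
        n_exposed = sum(A[j] for j in row if j is not None)
        out.append((A[i], n_exposed))
    return out


A = [1, 0, 1]
netmat = [[1, 2], [0, None], [None, None]]
print(summarize_exposure(A, netmat))
```

Because sA.i depends on friends' exposures, changing the exposure of a single unit can change the summaries (and hence the modeled outcomes) of every unit that lists it as a friend.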

eval.summaries

A convenience function for validating and evaluating the user-specified summary measures. It takes the input dataset and evaluates the summary measures defined by previous calls to def.sW and def.sA. This function is called automatically by tmlenet, so the user does not need to call it before calling tmlenet.
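The kind of validation described here can be sketched as follows: every summary measure must supply one value vector per row, all of the same width, after which the measures can be column-bound into a single design matrix. This is a hypothetical illustration of the checks, not eval.summaries itself; the function name and the dictionary input format are assumptions.

```python
from typing import Dict, List, Sequence


def eval_summaries_sketch(n: int, summaries: Dict[str, List[Sequence[float]]]) -> List[List[float]]:
    """Validate summary measures and column-bind them into one n-row matrix.

    summaries maps each summary-measure name to its per-unit values
    (one tuple or list per unit).
    """
    for name, rows in summaries.items():
        if len(rows) != n:
            raise ValueError(f"{name}: expected {n} rows, got {len(rows)}")
        widths = {len(r) for r in rows}
        if len(widths) > 1:
            raise ValueError(f"{name}: dimension differs across units: {sorted(widths)}")
    # Concatenate the measures row by row, in the order they were defined.
    return [[v for name in summaries for v in summaries[name][i]] for i in range(n)]


sW = [(1.0, 1.0), (0.0, 1.0)]
sA = [(1, 1), (0, 1)]
print(eval_summaries_sketch(2, {"sW": sW, "sA": sA}))
```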

tmlenet

Performs estimation of the causal effect of interest using the observed input data, the intervention of interest, the network information and the previously defined summary measures sW, sA.
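A single-time-point stochastic intervention can be thought of as a rule that draws new (counterfactual) exposures A*.i for every unit. The sketch below, a hypothetical example rather than tmlenet's interface, assigns A*.i = 1 independently with probability p. Setting p = 1 gives the deterministic "expose everyone" intervention. Under such an intervention the exposure summaries sA.i would be recomputed from A* and the network, so each unit's summary reflects its friends' new exposures as well as its own.

```python
import random
from typing import List


def stochastic_intervention(n: int, p: float, seed: int = 0) -> List[int]:
    """Draw counterfactual exposures A*.i ~ Bernoulli(p) independently for n units."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return [1 if rng.random() < p else 0 for _ in range(n)]


print(stochastic_intervention(10, 0.3, seed=42))
print(stochastic_intervention(5, 1.0))  # deterministic special case
```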

Datasets

To learn more about the type of data input required by tmlenet, see the following example datasets:

  • df_netKmax2

  • df_netKmax6

  • NetInd_mat_Kmax6

Updates

Check for updates and report bugs at http://github.com/osofr/tmlenet.

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.