Estimate Average Network Effects For Arbitrary (Stochastic) Interventions

Description

Estimate the average network effect among dependent units with known network structure (in presence of interference and/or spillover) using TMLE (targeted maximum likelihood estimation), IPTW (Horvitz-Thompson or the inverse-probability-of-treatment) and GCOMP (parametric G-computation formula).

Usage

1
2
3
4
5
6
7
tmlenet(DatNet.ObsP0, data, Kmax, sW, sA, Anode, Ynode, f_gstar1,
  Qform = NULL, hform.g0 = NULL, hform.gstar = NULL, NETIDmat = NULL,
  IDnode = NULL, NETIDnode = NULL, verbose = getOption("tmlenet.verbose"),
  optPars = list(alpha = 0.05, lbound = 0.005, family = "binomial", n_MCsims =
  10, runTMLE = c("tmle.intercept", "tmle.covariate"), YnodeDET = NULL, f_gstar2
  = NULL, sep = " ", f_g0 = NULL, h_g0_SummariesModel = NULL,
  h_gstar_SummariesModel = NULL))

Arguments

DatNet.ObsP0

Instance of class DatNet.sWsA returned under the same name by the eval.summaries function. Stores the evaluated summary measures (sW,sA) for the observed data, along with the network information. When this argument is specified, the following arguments no longer need to be provided: data, Kmax, sW, sA, IDnode, NETIDnode, optPars$sep, NETIDmat.

data

Observed data, a data.frame with named columns, containing the baseline covariates, exposures (Anode), the outcomes (Ynode) and possibly the network column (NETIDnode), where network is specified by a vector of strings of friend IDs, each string using optPars$sep character to separate different friend IDs (default is optPars$sep=' ').

Kmax

Integer constant specifying the maximal number of friends for any observation in the input data data.frame.

sW

Summary measures constructed from baseline covariates alone. This must be an object of class DefineSummariesClass that is returned by calling the function def.sW.

sA

Summary measures constructed from exposures Anode and baseline covariates. This must be an object of class DefineSummariesClass that is returned by calling the function def.sW.

Anode

Exposure (treatment) variable name (column name in data); exposures can be either binary, categorical or continuous.

Ynode

Outcome variable name (column name in data), assumed normalized between 0 and 1. This can instead be specified on the left-side of the regression formula in argument Qform.

f_gstar1

Either a function or a vector of counterfactual exposures. If a function, must return a vector of counterfactual exposures evaluated based on the summary measures matrix (sW,sA) passed as a named argument "data", therefore, the function in f_gstar1 must have a named argument "data" in its signature. The interventions defined by f_gstar1 can be static, dynamic or stochastic. If f_gstar1 is specified as a vector, it must be of length nrow(data) or 1 (constant treatment assigned to all observations). See Details below and Examples in "EQUIVALENT WAYS OF SPECIFYING INTERVENTION f_gstar1" for demonstration.

Qform

Regression formula for outcome in Ynode, when omitted (Ynode=NULL), the outcome variable is regressed on all variables defined in sW and sA. See Details.

hform.g0

Regression formula for estimating the conditional density of P(sA | sW) under g0 (the observed exposure mechanism), when omitted, this regression is defined by sA~sW where sA are all summary measures defined by argument sA and sW are all baseline summary measures defined by argument sW.

hform.gstar

Regression formula for estimating the conditional density P(sA | sW) under interventions f_gstar1 or f_gstar2. When omitted, the same regression formula as in hform.g0 will be used for hform.gstar. See Details.

NETIDmat

Network specification via matrix of friend IDs (ncol=Kmax, nrow=nrow(data)), where each row i is a vector of i's friends IDs or i's friends row numbers in data if IDnode=NULL. See Details.

IDnode

Subject identifier variable in the input data, if not supplied the network string in NETIDnode is assumed to be indexing the row numbers in the input data

NETIDnode

Network specification by a column name in input data consisting of strings that identify the unit's friends by their IDs or their row numbers (two friends are separated by space, e.g., "1 2"; unit with no friends should have an empty "" string). See Details.

verbose

Set to TRUE to print messages on status and information to the console. Turn this on by default using options(tmlenet.verbose=TRUE).

optPars

A named list of additional optional parameters to be passed to tmlenet, such as alpha, lbound, family, n_MCsims, runTMLE, YnodeDET, f_gstar2, sep, f_g0, h_g0_SummariesModel and h_gstar_SummariesModel. See Details below for the description of each parameter.

Value

A named list with 3 items containing the estimation results for:

  • EY_gstar1 - estimates of the mean counterfactual outcome under (stochastic) intervention function f_gstar1 (E_{g^*_1}[Y]).

  • EY_gstar2 - estimates of the mean counterfactual outcome under (stochastic) intervention function f_gstar2 (E_{g^*_2}[Y]), NULL if f_gstar2 not specified.

  • ATE - additive treatment effect (E_{g^*_1}[Y] - E_{g^*_2}[Y]) under interventions f_gstar1 vs. in f_gstar2, NULL if f_gstar2 not specified.

Each list item above is itself a list containing the items:

  • estimates - various estimates of the target parameter (network population counterfactual mean under (stochastic) intervention).

  • vars - the asymptotic variance estimates, for IPTW and TMLE.

  • CIs - CI estimates at alpha level, for IPTW and TMLE.

  • other.vars - Placeholder for future versions.

  • h_g0_SummariesModel - The model fits for P(sA|sW) under observed exposure mechanism g0. This is an object of SummariesModel R6 class.

  • h_gstar_SummariesModel - The model fits for P(sA|sW) under intervention f_gstar1 or f_gstar2. This is an object of SummariesModel R6 class.

Currently implemented estimators are:

  • tmle - Either weighted regression intercept-based TMLE (tmle.intercept - the default) with weights defined by the IPTW weights h_gstar/h_gN or covariate-based unweighted TMLE (tmle.covariate) that uses the IPTW weights as a covariate h_gstar/h_gN.

  • h_iptw - Efficient IPTW based on weights h_gstar/h_gN.

  • gcomp - Parametric G-computation formula substitution estimator.

Details

Note that in case when both arguments NETIDnode and NETIDmat are left unspecified the input data are assumed independent, i.e., no network dependency between the observations. All inference will be performed based on the i.i.d. efficient influence curve for the target parameter (E_{g^*_1}[Y]).

Also note that the ordering of the friends IDs in NETIDnode or NETIDmat is unimportant.

A special non-negative-integer-valued variable nF is automatically calculated each time tmlenet or eval.summaries functions are called. nF contains the total number of friends for each observation and it is always added as an additional column to the matrix of the baseline-covariates-based summary measures sWmat. The variable nF can be used in the same ways as any of the column names in the input data frame data. In particular, the name nF can be used inside the summary measure expressions (calls to functions def.sW and def.sA) and inside any of the regression formulas (Qform, hform.g0, hform.gstar).

The regression formalas in Qform, hform.g0 and hform.gstar can include any summary measures names defined in sW and sA, referenced by their individual variable names or by their aggregate summary measure names. For example, hform.g0 = "netA ~ netW" is equivalent to hform.g0 = "A + A_netF1 + A_netF2 ~ W + W_netF1 + W_netF2" for sW,sA summary measures defined by def.sW(netW=W[[0:2]]) and def.sA(netA=A[[0:2]]).

Additional parameters

Some of the parameters that control the estimation in tmlenet can be set by calling the function tmlenet_options.

Additional parameters can be also specified as a named list optPars argument of the tmlenet function. The items that can be specified in optPars are:

  • alpha - alpha-level for CI calculation (0.05 for 95

  • lbound - One value for symmetrical bounds on P(sW | sW).

  • family - Family specification for regression models, defaults to binomial (CURRENTLY ONLY BINOMIAL FAMILY IS IMPLEMENTED).

  • n_MCsims - Number of Monte-Carlo simulations performed, each of sample size nrow(data), for generating new exposures under f_gstar1 or f_gstar2 (if specified) or f_g0 (if specified). These newly generated exposures are utilized when fitting the conditional densities P(sA|sW) and when evaluating the substitution estimators GCOMP and TMLE under stochastic interventions f_gstar1 or f_gstar2.

  • runTMLE - Choose which of the two TMLEs to run, "tmle.intercept" or "tmle.covariate". The default is "tmle.intercept".

  • YnodeDET - Optional column name for indicators of deterministic values of outcome Ynode, coded as (TRUE/FALSE) or (1/0); observations with YnodeDET=TRUE/1 are assumed to have constant value for their Ynode.

  • f_gstar2 - Either a function or a vector of counterfactual exposure assignments. Used for estimating contrasts (average treatment effect) for two interventions, if omitted, only the average counterfactual outcome under intervention f_gstar1 is estimated. The requirements for f_gstar2 are identical to those for f_gstar1.

  • sep - A character separating friend indices for the same observation in NETIDnode.

  • f_g0 - A function for generating true treatment mechanism under observed Anode, if known (for example in a randomized trial). This is used for estimating P(sA|sW) under g0 by sampling large vector of Anode (of length nrow(data)*n_MCsims) from f_g0 function;

  • h_g0_SummariesModel - Previously fitted model for P(sA|sW) under observed exposure mechanism g0, returned by the previous runs of the tmlenet function. This has to be an object of SummariesModel R6 class. When this argument is specified, all predictions P(sA=sa|sW=sw) under g0 will be based on the model fits provided by this argument.

  • h_gstar_SummariesModel - Previously fitted model for P(sA|sW) under (stochastic) intervention specified by f_gstar1 or f_gstar2. Also an object of SummariesModel R6 class. When this argument is specified, the predictions P(sA=sa|sW=sw) under f_gstar1 or f_gstar2 will be based on the model fits provided by this argument.

Specifying the counterfactual intervention function (f_gstar1 and optPars$f_gstar2)

The functions f_gstar1 and f_gstar2 can only depend on variables specified by the combined matrix of summary measures (sW,sA), which is passed using the argument data. The functions should return a vector of length nrow(data) of counterfactual treatments for observations in the input data.

Specifying the Network of Friends

The network of friends (connections) for observations in the input data can be specified in two alternative ways, using either NETIDnode or NETIDmat input arguments.

NETIDnode - The first (slower) method uses a vector of strings in data[, NETIDnode], where each string i must contain the space separated IDs or row numbers of all units in data thought to be connected to observation i (friends of unit i);

NETIDmat - An alternative (and faster) method is to pass a matrix with Kmax columns and nrow(data) rows, where each row NETIDmat[i,] is a vector of observation i's friends' IDs or i's friends' row numbers in data if IDnode=NULL. If observation i has fewer than Kmax friends, the remainder of NETIDmat[i,] must be filled with NAs. Note that the ordering of friend indices is irrelevant.

IPTW estimator

**********************************************************************

  • As described in the following section, the first step is to construct an estimator P_{g_N}(sA | sW) for the common (in i) conditional density P_{g_0}(sA | sW) for common (in i) unit-level summary measures (sA,sW).

  • The same fitting algorithm is applied to construct an estimator P_{g^*_N}(sA^* | sW^*) of the common (in i) conditional density P_{g^*}(sA^* | sW^*) for common (in i) unit-level summary measures (sA^*,sW^*) implied by the user-supplied stochastic intervention f_gstar1 or f_gstar2 and the observed distribution of W.

  • These two density estimators form the basis of the IPTW estimator, which is evaluated at the observed N data points O_i=(sW_i, sA_i, Y_i), i=1,...,N and is given by

    ψ^{IPTW}_n = ∑_{i=1,...,N}{Y_i \frac{P_{g^*_N}(sA^*=sA_i | sW=sW_i)}{P_{g_N}(sA=sA_i | sW=sW_i)}}.

GCOMP estimator

**********************************************************************

TMLE estimator

**********************************************************************

Modeling P(sA|sW) for summary measures (sA,sW)

**********************************************************************

Non-parametric estimation of the common unit-level multivariate joint conditional probability model P_g0(sA|sW), for unit-level summary measures (sA,sW) generated from the observed exposures and baseline covariates (A,W)=(A_i,W_i : i=1,...,N) (their joint density given by g_0(A|W)Q(W)), is performed by first constructing the dataset of N summary measures, (sA_i,sW_i : i=1,...,N), and then fitting the usual i.i.d. MLE for the common density P_g0(sA|sW) based on the pooled N sample of these summary measures.

Note that sA can be multivariate and any of its components sA[j] can be either binary, categorical or continuous. The joint probability model for P(sA|sA) = P(sA[1],...,sA[k]|sA) can be factorized as P(sA[1]|sA) * P(sA[2]|sA, sA[1]) * ... * P(sA[k]|sA, sA[1],...,sA[k-1]), where each of these conditional probability models is fit separately, depending on the type of the outcome variable sA[j].

If sA[j] is binary, the conditional probability P(sA[j]|sW,sA[1],...,sA[j-1]) is evaluated via logistic regression model. When sA[j] is continuous (or categorical), its range will be fist partitioned into K bins and the corresponding K bin indicators (B_1,...,B_K), where each bin indicator B_j is then used as an outcome in a separate logistic regression model with predictors given by sW, sA[1],...,sA[k-1]. Thus, the joint probability P(sA|sW) is defined by such a tree of binary logistic regressions.

For simplicity, we now suppose sA is continuous and univariate and we describe here an algorithm for fitting P_{g_0}(sA | sW) (the algorithm for fitting P_{g^*}(sA^* | sW^*) is equivalent, except that exposure A is replaced with exposure A^* generated under f_gstar1 or f_gstar2 and the predictors sW from the regression formula hform.g0 are replaced with predictors sW^* specified by the regression formula hform.gstar).

  1. Generate a dataset of N observed continuous summary measures (sa_i:i=1,...,N) from observed ((a_i,w_i):i=1,...,N). Let sa\insa_i:i=1,...,M.

  2. Divide the range of sA values into intervals S=(i_1,...,i_M,i_M+1) so that any observed data point sa_i belongs to one interval in S, namely, for each possible value sa of sA there is k\in1,...,M, such that, i_k < sa <= i_k+1. Let the mapping B(sa)\in1,...,M denote a unique interval in S for sa, such that, i_B(sa) < sa <= i_B(sa)+1. Let bw_B(sa):=i_B(sa)+1-i_B(sa) be the length of the interval (bandwidth) (i_B(sa),i_B(sa)+1). Also define the binary indicators b_1,...,b_M, where b_j:=I(B(sa)=j), for all j <= B(sa) and b_j:=NA for all j>B(sa). That is we set b_j to missing ones the indicator I(B(sa)=j) jumps from 0 to 1. Now let sA denote the random variable for the observed summary measure for one unit and denote by (B_1,...,B_M) the corresponding random indicators for sA defined as B_j := I(B(sA) = j) for all j <= B(sA) and B_j:=NA for all j>B(sA).

  3. For each j=1,...,M, fit the logistic regression model for the conditional probability P(B_j = 1 | B_j-1=0, sW), i.e., at each j this is defined as the conditional probability of B_j jumping from 0 to 1 at bin j, given that B_j-1=0 and each of these logistic regression models is fit only among the observations that are still at risk of having B_j=1 with B_j-1=0.

  4. Normalize the above conditional probability of B_j jumping from 0 to 1 by its corresponding interval length (bandwidth) bw_j to obtain the discrete conditional hazards h_j(sW):=P(B_j = 1 | (B_j-1=0, sW) / bw_j, for each j. For the summary measure sA, the above conditional hazard h_j(sW) is equal to P(sA \in (i_j,i_j+1) | sA>=i_j, sW), i.e., this is the probability that sA falls in the interval (i_j,i_j+1), conditional on sW and conditional on the fact that sA does not belong to any intervals before j.

  5. Finally, for any given data-point (sa,sw), evaluate the discretized conditional density for P(sA=sa|sW=sw) by first evaluating the interval number k=B(sa)\in1,...,M for sa and then computing \prodj=1,...,k-11-h_j(sW))*h_k(sW) which is equivalent to the joint conditional probability that sa belongs to the interval (i_k,i_k+1) and does not belong to any of the intervals 1 to k-1, conditional on sW.

The evaluation above utilizes a discretization of the fact that any continuous density f of random variable X can be written as f_X(x)=S_X(x)*h_X(x), for a continuous density f of X where S_X(x):=P(X>x) is the survival function for X, h_X=P(X>x|X>=x) is the hazard function for X; as well as the fact that the discretized survival function S_X(x) can be written as a of the hazards for s<x: S_X(x)=\prods<xh_X(x).

Three methods for defining bin (interval) cuttoffs for a continuous one-dimenstional summary measure sA[j]

**********************************************************************

There are 3 alternative methods to defining the bin cutoffs S=(i_1,...,i_M,i_M+1) for a continuous summary measure sA. The choice of which method is used along with other discretization parameters (e.g., total number of bins) is controlled via the tmlenet_options() function. See ?tmlenet_options argument bin.method for additional details.

Approach 1 (equal.len): equal length, default.

*********************

The bins are defined by splitting the range of observed sA (sa_1,...,sa_n) into equal length intervals. This is the dafault discretization method, set by passing an argument bin.method="equal.len" to tmlenet_options function prior to calling tmlenet(). The intervals will be defined by splitting the range of (sa_1,...,sa_N) into nbins number of equal length intervals, where nbins is another argument of tmlenet_options() function. When nbins=NA (the default setting) the actual value of nbins is computed at run time by taking the integer value (floor) of n/maxNperBin, for n - the total observed sample size and maxNperBin=1000 - another argument of tmlenet_options() with the default value 1,000.

Approach 2 (equal.mass): data-adaptive equal mass intervals.

*********************

The intervals are defined by splitting the range of sA into non-equal length data-adaptive intervals that ensures that each interval contains around maxNperBin observations from (sa_j:j=1,...,N). This interval definition approach can be selected by passing an argument bin.method="equal.mass" to tmlenet_options() prior to calling tmlenet(). The method ensures that an approximately equal number of observations will belong to each interval, where that number of observations for each interval is controlled by setting maxNperBin. The default setting is maxNperBin=1000 observations per interval.

Approach 3 (dhist): combination of 1 & 2.

*********************

The data-adaptive approach dhist is a mix of Approaches 1 & 2. See Denby and Mallows "Variations on the Histogram" (2009)). This interval definition method is selected by passing an argument bin.method="dhist" to tmlenet_options() prior to calling tmlenet().

See Also

tmlenet-package for the general overview of the package, def.sW for defining the summary measures, eval.summaries for evaluation and validation of the summary measures, and df_netKmax2/df_netKmax6/NetInd_mat_Kmax6 for examples of network datasets.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
#***************************************************************************************
data(df_netKmax6) # Load observed data
data(NetInd_mat_Kmax6) # Load the network ID matrix
Kmax <- 6 # Max number of friends in the network
#***************************************************************************************

#***************************************************************************************
# EXAMPLES OF INTERVENTION FUNCTIONS:
#***************************************************************************************
# Returns a function that will sample A with probability x:=P(A=1))
make_f.gstar <- function(x, ...) {
  eval(x)
  f.A_x <- function(data, ...){
    rbinom(n = nrow(data), size = 1, prob = x[1])
  }
  return(f.A_x)
}
# Deterministic f_gstar setting every A=0:
f.A_0 <- make_f.gstar(x = 0)
# Deterministic f_gstar setting every A=1:
f.A_1 <- make_f.gstar(x = 1)
# Stochastically sets (100*x)% of population to A=1 with probability 0.2:
f.A_.2 <- make_f.gstar(x = 0.2)

#***************************************************************************************
# DEFINING SUMMARY MEASURES:
#***************************************************************************************
def_sW <- def.sW(netW2 = W2[[1:Kmax]]) +
          def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0=TRUE)

def_sA <- def.sA(sum.netAW2 = sum((1-A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0=TRUE) +
          def.sA(netA = A[[0:Kmax]])

#***************************************************************************************
# EVALUATING SUMMARY MEASURES BASED ON INPUT DATA:
#***************************************************************************************
# A helper function that can pre-evaluate the summary measures on (O)bserved data, 
# given one of two ways of specifying the network:
res <- eval.summaries(sW = def_sW, sA = def_sA,  Kmax = 6, data = df_netKmax6,
                      NETIDmat = NetInd_mat_Kmax6, verbose = TRUE)

#***************************************************************************************
# Specifying the clever covariate regressions hform.g0 and hform.gstar:
# Left side can consist of any summary names defined by def.sA (as linear terms)
# Right side can consist of any summary names defined by def.sW (as linear terms) & 'nF'
#***************************************************************************************
hform.g01 <- "netA ~ netW2 + sum.netW3 + nF"
hform.gstar1 <- "netA ~ sum.netW3"

# alternatives:
hform.g02 <- "netA + sum.netAW2 ~ netW2 + sum.netW3 + nF"
hform.g03 <- "sum.netAW2 ~ netW2 + sum.netW3"

#***************************************************************************************
# Specifying the outcome regression Qform:
# Left side is ignored (with a warning if not equal to Ynode)
# Right side can by any summary names defined by def.sW, def.sA & 'nF'
#***************************************************************************************
Qform1 <- "Y ~ sum.netW3 + sum.netAW2"
Qform2 <- "Y ~ netA + netW + nF"
Qform3 <- "blah ~ netA + netW + nF"

#***************************************************************************************
# Estimate mean population outcome under deterministic intervention A=0 with 6 friends:
# Estimation with regression formulas.
#***************************************************************************************
# Note that Ynode is optional when Qform is specified;
options(tmlenet.verbose = FALSE) # set to TRUE to print status messages
res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
                    Anode = "A", f_gstar1 = 0L,
                    Qform = "Y ~ sum.netW3 + sum.netAW2",
                    hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
                    hform.gstar = "netA ~ sum.netW3",
                    IDnode = "IDs", NETIDnode = "Net_str",
                    optPars = list(n_MCsims = 1))

res_K6_1$EY_gstar1$estimates
res_K6_1$EY_gstar1$vars
res_K6_1$EY_gstar1$CIs

#***************************************************************************************
# Run exactly the same estimators as above, 
# but using the output "DatNet.ObsP0" of eval.summaries as input to tmlenet:
#***************************************************************************************
res_K6_1b <- tmlenet(DatNet.ObsP0 = res$DatNet.ObsP0,
                    Anode = "A", f_gstar1 = 0L,
                    Qform = "Y ~ sum.netW3 + sum.netAW2",
                    hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
                    hform.gstar = "netA ~ sum.netW3",
                    optPars = list(n_MCsims = 1))

res_K6_1b$EY_gstar1$estimates
res_K6_1b$EY_gstar1$vars
res_K6_1b$EY_gstar1$CIs

#***************************************************************************************
# Same as above but for covariate-based TMLE.
#***************************************************************************************
res_K6_2 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
                    Anode = "A", f_gstar1 = 0L,
                    Qform = "Y ~ sum.netW3 + sum.netAW2",
                    hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
                    hform.gstar = "netA ~ sum.netW3",
                    IDnode = "IDs", NETIDnode = "Net_str",
                    optPars = list(runTMLE = "tmle.covariate", n_MCsims = 1))

res_K6_2$EY_gstar1$estimates
res_K6_2$EY_gstar1$vars
res_K6_2$EY_gstar1$CIs

#***************************************************************************************
# SPECIFYING THE NETWORK AS A MATRIX OF FRIEND ROW NUMBERS:
#***************************************************************************************
net_ind_obj <- simcausal::NetIndClass$new(nobs = nrow(df_netKmax6), Kmax = Kmax)
# generating the network matrix from input data:
NetInd_mat <- net_ind_obj$makeNetInd.fromIDs(Net_str = df_netKmax6[, "Net_str"],
                                              IDs_str = df_netKmax6[, "IDs"])$NetInd
nF <- net_ind_obj$nF # number of friends

data(NetInd_mat_Kmax6)
all.equal(NetInd_mat, NetInd_mat_Kmax6) # TRUE

print(head(NetInd_mat))
print(head(nF))
print(all.equal(df_netKmax6[,"nFriends"], nF))

res_K6_net1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
                        Anode = "A", f_gstar1 = f.A_0,
                        Qform = "Y ~ sum.netW3 + sum.netAW2",
                        hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
                        hform.gstar = "netA ~ sum.netW3",
                        NETIDmat = NetInd_mat,
                        optPars = list(runTMLE = "tmle.intercept", n_MCsims = 1))

all.equal(res_K6_net1$EY_gstar1$estimates, res_K6_1$EY_gstar1$estimates)
all.equal(res_K6_net1$EY_gstar1$vars, res_K6_1$EY_gstar1$vars)
all.equal(res_K6_net1$EY_gstar1$CIs, res_K6_1$EY_gstar1$CIs)

#***************************************************************************************
# EQUIVALENT WAYS OF SPECIFYING INTERVENTIONS f_gstar1/f_gstar2.
# LOWERING THE DIMENSIONALITY OF THE SUMMARY MEASURES.
#***************************************************************************************
def_sW <- def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0=TRUE)
def_sA <- def.sA(sum.netAW2 = sum((1-A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0=TRUE)

# can define intervention by function f.A_0 that sets everyone's A to constant 0:
res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
                    Anode = "A", Ynode = "Y", f_gstar1 = f.A_0,
                    NETIDmat = NetInd_mat)
res_K6_1$EY_gstar1$estimates

# equivalent way to define intervention f.A_0 is to just set f_gstar1 to 0:
res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
                    Anode = "A", Ynode = "Y", f_gstar1 = 0L,
                    NETIDmat = NetInd_mat)
res_K6_1$EY_gstar1$estimates

# or set f_gstar1 to a vector of 0's of length nrow(data):
res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
                    Anode = "A", Ynode = "Y", f_gstar1 = rep_len(0L, nrow(df_netKmax6)),
                    NETIDmat = NetInd_mat)
res_K6_1$EY_gstar1$estimates

#***************************************************************************************
# EXAMPLE WITH SIMULATED DATA FOR AT MOST 2 FRIENDS AND 1 BASELINE COVARIATE W1
#***************************************************************************************
data(df_netKmax2)
head(df_netKmax2)
Kmax <- 2
# Define the summary measures:
def_sW <- def.sW(W1[[0:Kmax]])
def_sA <- def.sA(A[[0:Kmax]])
# Define the network matrix:
net_ind_obj <- simcausal::NetIndClass$new(nobs = nrow(df_netKmax2), Kmax = Kmax)
NetInd_mat <- net_ind_obj$makeNetInd.fromIDs(Net_str = df_netKmax2[, "Net_str"],
                              IDs_str = df_netKmax2[, "IDs"])$NetInd

#***************************************************************************************
# Mean population outcome under stochastic intervention P(A=1)=0.2
#***************************************************************************************
# Evaluates the target parameter by sampling exposures A from f_gstar1;
# Setting optPars$n_MCsims=100 means that A will be sampled 1000 times (
# for a total sample size of nrow(data)*n_MCsims under f_gstar1)
options(tmlenet.verbose = TRUE)
res_K2_1 <- tmlenet(data = df_netKmax2, Kmax = Kmax, sW = def_sW, sA = def_sA,
                    Anode = "A", Ynode = "Y", f_gstar1 = f.A_.2,
                    NETIDmat = NetInd_mat, optPars = list(n_MCsims = 100))
res_K2_1$EY_gstar1$estimates
res_K2_1$EY_gstar1$vars
res_K2_1$EY_gstar1$CIs

#***************************************************************************************
# Estimating the average treatment effect (ATE) for two static interventions:
# f_gstar1 sets everyone A=1 vs f_gstar2 sets everyone A=0;
#***************************************************************************************
res_K2_2 <- tmlenet(data = df_netKmax2, Kmax = Kmax, sW = def_sW, sA = def_sA,
                    Anode = "A", Ynode = "Y",
                    f_gstar1 = 1,
                    NETIDmat = NetInd_mat,
                    optPars = list(f_gstar2 = 0, n_MCsims = 1))
names(res_K2_2)

# Estimates under f_gstar1:
res_K2_2$EY_gstar1$estimates
res_K2_2$EY_gstar1$vars
res_K2_2$EY_gstar1$CIs

# Estimates under f_gstar2:
res_K2_2$EY_gstar2$estimates
res_K2_2$EY_gstar2$vars
res_K2_2$EY_gstar2$CIs

# ATE estimates for f_gstar1-f_gstar2:
res_K2_2$ATE$estimates
res_K2_2$ATE$vars
res_K2_2$ATE$CIs

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.