Estimate Average Network Effects For Arbitrary (Stochastic) Interventions
Description
Estimate the average network effect among dependent units with known network structure (in presence of interference and/or spillover) using TMLE (targeted maximum likelihood estimation), IPTW (HorvitzThompson or the inverseprobabilityoftreatment) and GCOMP (parametric Gcomputation formula).
Usage
1 2 3 4 5 6 7  tmlenet(DatNet.ObsP0, data, Kmax, sW, sA, Anode, Ynode, f_gstar1,
Qform = NULL, hform.g0 = NULL, hform.gstar = NULL, NETIDmat = NULL,
IDnode = NULL, NETIDnode = NULL, verbose = getOption("tmlenet.verbose"),
optPars = list(alpha = 0.05, lbound = 0.005, family = "binomial", n_MCsims =
10, runTMLE = c("tmle.intercept", "tmle.covariate"), YnodeDET = NULL, f_gstar2
= NULL, sep = " ", f_g0 = NULL, h_g0_SummariesModel = NULL,
h_gstar_SummariesModel = NULL))

Arguments
DatNet.ObsP0 
Instance of class 
data 
Observed data, a 
Kmax 
Integer constant specifying the maximal number of friends for any observation in the input 
sW 
Summary measures constructed from baseline covariates alone. This must be an object of class

sA 
Summary measures constructed from exposures 
Anode 
Exposure (treatment) variable name (column name in 
Ynode 
Outcome variable name (column name in 
f_gstar1 
Either a function or a vector of counterfactual exposures. If a function, must return
a vector of counterfactual exposures evaluated based on the summary measures matrix ( 
Qform 
Regression formula for outcome in 
hform.g0 
Regression formula for estimating the conditional density of P( 
hform.gstar 
Regression formula for estimating the conditional density P( 
NETIDmat 
Network specification via matrix of friend IDs ( 
IDnode 
Subject identifier variable in the input data, if not supplied the network string in

NETIDnode 
Network specification by a column name in input 
verbose 
Set to 
optPars 
A named list of additional optional parameters to be passed to 
Value
A named list with 3 items containing the estimation results for:

EY_gstar1
 estimates of the mean counterfactual outcome under (stochastic) intervention functionf_gstar1
(E_{g^*_1}[Y]). 
EY_gstar2
 estimates of the mean counterfactual outcome under (stochastic) intervention functionf_gstar2
(E_{g^*_2}[Y]),NULL
iff_gstar2
not specified. 
ATE
 additive treatment effect (E_{g^*_1}[Y]  E_{g^*_2}[Y]) under interventionsf_gstar1
vs. inf_gstar2
,NULL
iff_gstar2
not specified.
Each list item above is itself a list containing the items:

estimates
 various estimates of the target parameter (network population counterfactual mean under (stochastic) intervention). 
vars
 the asymptotic variance estimates, for IPTW and TMLE. 
CIs
 CI estimates atalpha
level, for IPTW and TMLE. 
other.vars
 Placeholder for future versions. 
h_g0_SummariesModel
 The model fits for P(sA
sW
) under observed exposure mechanismg0
. This is an object ofSummariesModel
R6 class. 
h_gstar_SummariesModel
 The model fits for P(sA
sW
) under interventionf_gstar1
orf_gstar2
. This is an object ofSummariesModel
R6 class.
Currently implemented estimators are:

tmle
 Either weighted regression interceptbased TMLE (tmle.intercept
 the default) with weights defined by the IPTW weightsh_gstar/h_gN
or covariatebased unweighted TMLE (tmle.covariate
) that uses the IPTW weights as a covariateh_gstar/h_gN
. 
h_iptw
 Efficient IPTW based on weights h_gstar/h_gN. 
gcomp
 Parametric Gcomputation formula substitution estimator.
Details
Note that in case when both arguments NETIDnode
and NETIDmat
are left unspecified the input data are
assumed independent, i.e., no network dependency between the observations.
All inference will be performed based on the i.i.d. efficient influence curve for the target
parameter (E_{g^*_1}[Y]).
Also note that the ordering of the friends IDs in NETIDnode
or NETIDmat
is unimportant.
A special nonnegativeintegervalued variable nF
is automatically calculated each time
tmlenet
or eval.summaries
functions are called. nF
contains the total number of friends for
each observation and it is always added as an additional column to the matrix of the baselinecovariatesbased
summary measures sWmat
. The variable nF
can be used in the same ways as any of the column names in
the input data frame data
. In particular, the name nF
can be used inside the summary measure
expressions (calls to functions def.sW
and def.sA
) and inside any of the regression formulas
(Qform
, hform.g0
, hform.gstar
).
The regression formalas in Qform
, hform.g0
and hform.gstar
can include any summary measures names defined in
sW
and sA
, referenced by their individual variable names or by their aggregate summary measure names.
For example, hform.g0 = "netA ~ netW"
is equivalent to
hform.g0 = "A + A_netF1 + A_netF2 ~ W + W_netF1 + W_netF2"
for sW,sA
summary measures defined by
def.sW(netW=W[[0:2]])
and def.sA(netA=A[[0:2]])
.
Additional parameters
Some of the parameters that control the estimation in tmlenet
can be set by calling the function
tmlenet_options
.
Additional parameters can be also specified as a named list optPars
argument of the
tmlenet
function. The items that can be specified in optPars
are:

alpha
 alphalevel for CI calculation (0.05 for 95 
lbound
 One value for symmetrical bounds on P(sW  sW). 
family
 Family specification for regression models, defaults to binomial (CURRENTLY ONLY BINOMIAL FAMILY IS IMPLEMENTED). 
n_MCsims
 Number of MonteCarlo simulations performed, each of sample sizenrow(data)
, for generating new exposures underf_gstar1
orf_gstar2
(if specified) orf_g0
(if specified). These newly generated exposures are utilized when fitting the conditional densities P(sA
sW
) and when evaluating the substitution estimators GCOMP and TMLE under stochastic interventionsf_gstar1
orf_gstar2
. 
runTMLE
 Choose which of the two TMLEs to run, "tmle.intercept" or "tmle.covariate". The default is "tmle.intercept". 
YnodeDET
 Optional column name for indicators of deterministic values of outcomeYnode
, coded as (TRUE
/FALSE
) or (1
/0
); observations withYnodeDET
=TRUE
/1
are assumed to have constant value for theirYnode
. 
f_gstar2
 Either a function or a vector of counterfactual exposure assignments. Used for estimating contrasts (average treatment effect) for two interventions, if omitted, only the average counterfactual outcome under interventionf_gstar1
is estimated. The requirements forf_gstar2
are identical to those forf_gstar1
. 
sep
 A character separating friend indices for the same observation inNETIDnode
. 
f_g0
 A function for generating true treatment mechanism under observedAnode
, if known (for example in a randomized trial). This is used for estimating P(sA
sW
) underg0
by sampling large vector ofAnode
(of lengthnrow(data)*n_MCsims
) fromf_g0
function; 
h_g0_SummariesModel
 Previously fitted model for P(sA
sW
) under observed exposure mechanismg0
, returned by the previous runs of thetmlenet
function. This has to be an object ofSummariesModel
R6 class. When this argument is specified, all predictions P(sA
=sa
sW
=sw
) underg0
will be based on the model fits provided by this argument. 
h_gstar_SummariesModel
 Previously fitted model for P(sA
sW
) under (stochastic) intervention specified byf_gstar1
orf_gstar2
. Also an object ofSummariesModel
R6 class. When this argument is specified, the predictions P(sA
=sa
sW
=sw
) underf_gstar1
orf_gstar2
will be based on the model fits provided by this argument.
Specifying the counterfactual intervention function (f_gstar1
and optPars$f_gstar2
)
The functions f_gstar1
and f_gstar2
can only depend on variables specified by the combined matrix
of summary measures (sW
,sA
), which is passed using the argument data
. The functions should
return a vector of length nrow(data)
of counterfactual treatments for observations in the input data.
Specifying the Network of Friends
The network of friends (connections) for observations in the input data
can be specified in two
alternative ways, using either NETIDnode
or NETIDmat
input arguments.
NETIDnode
 The first (slower) method uses a vector of strings in data[, NETIDnode]
, where each
string i
must contain the space separated IDs or row numbers of all units in data
thought to be
connected to observation i (friends of unit i);
NETIDmat
 An alternative (and faster) method is to pass a matrix with Kmax
columns and nrow(data)
rows, where each row NETIDmat[i,]
is a vector of observation i
's friends' IDs or i
's friends'
row numbers in data
if IDnode=NULL
. If observation i
has fewer than Kmax
friends, the
remainder of NETIDmat[i,]
must be filled with NA
s. Note that the ordering of friend indices is
irrelevant.
IPTW estimator
**********************************************************************
As described in the following section, the first step is to construct an estimator P_{g_N}(sA  sW) for the common (in
i
) conditional density P_{g_0}(sA  sW) for common (ini
) unitlevel summary measures (sA
,sW
).The same fitting algorithm is applied to construct an estimator P_{g^*_N}(sA^*  sW^*) of the common (in
i
) conditional density P_{g^*}(sA^*  sW^*) for common (ini
) unitlevel summary measures (sA^*
,sW^*
) implied by the usersupplied stochastic interventionf_gstar1
orf_gstar2
and the observed distribution ofW
.These two density estimators form the basis of the IPTW estimator, which is evaluated at the observed N data points O_i=(sW_i, sA_i, Y_i), i=1,...,N and is given by
ψ^{IPTW}_n = ∑_{i=1,...,N}{Y_i \frac{P_{g^*_N}(sA^*=sA_i  sW=sW_i)}{P_{g_N}(sA=sA_i  sW=sW_i)}}.
GCOMP estimator
**********************************************************************
TMLE estimator
**********************************************************************
Modeling P(sAsW)
for summary measures (sA,sW)
**********************************************************************
Nonparametric
estimation of the common unitlevel multivariate joint conditional probability model P_g0(sAsW)
,
for unitlevel summary measures (sA,sW)
generated from the observed exposures and baseline covariates
(A,W)=(A_i,W_i : i=1,...,N) (their joint density given by g_0(AW)Q(W)), is performed by first
constructing the dataset of N summary measures, (sA_i,sW_i : i=1,...,N), and then fitting the usual i.i.d. MLE
for the common density P_g0(sAsW)
based on the pooled N sample of these summary measures.
Note that sA
can be multivariate and any of its components sA[j]
can be either binary, categorical
or continuous.
The joint probability model for P(sAsA)
= P(sA[1],...,sA[k]sA)
can be factorized as
P(sA[1]sA)
* P(sA[2]sA, sA[1])
* ... * P(sA[k]sA, sA[1],...,sA[k1])
,
where each of these conditional probability models is fit separately, depending on the type of the outcome variable
sA[j]
.
If sA[j]
is binary, the conditional probability P(sA[j]sW,sA[1],...,sA[j1])
is evaluated via logistic
regression model.
When sA[j]
is continuous (or categorical), its range will be fist partitioned into K
bins and the
corresponding K
bin indicators (B_1,...,B_K
), where each bin indicator B_j
is then used as an outcome in a
separate logistic regression model with predictors given by sW, sA[1],...,sA[k1]
.
Thus, the joint probability P(sAsW)
is defined by such a tree of binary logistic regressions.
For simplicity, we now suppose sA
is continuous and univariate and we describe here an algorithm for fitting
P_{g_0}(sA  sW) (the algorithm
for fitting P_{g^*}(sA^*  sW^*) is equivalent, except that exposure A
is replaced with exposure A^*
generated under f_gstar1
or f_gstar2
and
the predictors sW
from the regression formula hform.g0
are replaced with predictors sW^*
specified by the regression formula hform.gstar
).
Generate a dataset of N observed continuous summary measures (
sa_i
:i=1,...,N) from observed ((a_i
,w_i
):i=1,...,N). Letsa
\insa_i
:i=1,...,M.Divide the range of
sA
values into intervals S=(i_1,...,i_M,i_M+1) so that any observed data pointsa_i
belongs to one interval in S, namely, for each possible value sa ofsA
there is k\in1,...,M, such that, i_k <sa
<= i_k+1. Let the mapping B(sa)\in1,...,M denote a unique interval in S for sa, such that, i_B(sa) < sa <= i_B(sa)+1. Let bw_B(sa):=i_B(sa)+1i_B(sa) be the length of the interval (bandwidth) (i_B(sa),i_B(sa)+1). Also define the binary indicators b_1,...,b_M, where b_j:=I(B(sa)=j), for all j <= B(sa) and b_j:=NA for all j>B(sa). That is we set b_j to missing ones the indicator I(B(sa)=j) jumps from 0 to 1. Now letsA
denote the random variable for the observed summary measure for one unit and denote by (B_1,...,B_M) the corresponding random indicators forsA
defined as B_j := I(B(sA
) = j) for all j <= B(sA
) and B_j:=NA for all j>B(sA
).For each j=1,...,M, fit the logistic regression model for the conditional probability P(B_j = 1  B_j1=0, sW), i.e., at each j this is defined as the conditional probability of B_j jumping from 0 to 1 at bin j, given that B_j1=0 and each of these logistic regression models is fit only among the observations that are still at risk of having B_j=1 with B_j1=0.
Normalize the above conditional probability of B_j jumping from 0 to 1 by its corresponding interval length (bandwidth) bw_j to obtain the discrete conditional hazards h_j(sW):=P(B_j = 1  (B_j1=0, sW) / bw_j, for each j. For the summary measure
sA
, the above conditional hazard h_j(sW) is equal to P(sA
\in (i_j,i_j+1) sA
>=i_j, sW), i.e., this is the probability thatsA
falls in the interval (i_j,i_j+1), conditional on sW and conditional on the fact thatsA
does not belong to any intervals before j.Finally, for any given datapoint
(sa,sw)
, evaluate the discretized conditional density for P(sA
=sasW=sw) by first evaluating the interval number k=B(sa)\in1,...,M forsa
and then computing \prodj=1,...,k11h_j(sW))*h_k(sW) which is equivalent to the joint conditional probability thatsa
belongs to the interval (i_k,i_k+1) and does not belong to any of the intervals 1 to k1, conditional on sW.
The evaluation above utilizes a discretization of the fact that any continuous density f of random variable X can be written as f_X(x)=S_X(x)*h_X(x), for a continuous density f of X where S_X(x):=P(X>x) is the survival function for X, h_X=P(X>xX>=x) is the hazard function for X; as well as the fact that the discretized survival function S_X(x) can be written as a of the hazards for s<x: S_X(x)=\prods<xh_X(x).
Three methods for defining bin (interval) cuttoffs for a continuous onedimenstional summary measure sA[j]
**********************************************************************
There are 3 alternative methods to defining the bin cutoffs S=(i_1,...,i_M,i_M+1) for a continuous summary measure
sA
. The choice of which method is used along with other discretization parameters (e.g., total number of
bins) is controlled via the tmlenet_options() function. See ?tmlenet_options
argument bin.method
for
additional details.
Approach 1 (equal.len
): equal length, default.
*********************
The bins are defined by splitting the range of observed sA
(sa_1,...,sa_n) into equal length intervals.
This is the dafault discretization method, set by passing an argument bin.method="equal.len"
to
tmlenet_options
function prior to calling tmlenet()
. The intervals will be defined by splitting the
range of (sa_1,...,sa_N) into nbins
number of equal length intervals, where nbins
is another argument
of tmlenet_options()
function. When nbins=NA
(the default setting) the actual value of nbins
is computed at run time by taking the integer value (floor) of n/maxNperBin
,
for n
 the total observed sample size and maxNperBin=1000
 another argument of
tmlenet_options()
with the default value 1,000.
Approach 2 (equal.mass
): dataadaptive equal mass intervals.
*********************
The intervals are defined by splitting the range of sA
into nonequal length dataadaptive intervals that
ensures that each interval contains around
maxNperBin
observations from (sa_j:j=1,...,N).
This interval definition approach can be selected by passing an argument bin.method="equal.mass"
to
tmlenet_options()
prior to calling tmlenet()
.
The method ensures that an approximately equal number of observations will belong to each interval, where that number
of observations for each interval
is controlled by setting maxNperBin
. The default setting is maxNperBin=1000
observations per interval.
Approach 3 (dhist
): combination of 1 & 2.
*********************
The dataadaptive approach dhist is a mix of Approaches 1 & 2. See Denby and Mallows "Variations on the Histogram"
(2009)). This interval definition method is selected by passing an argument bin.method="dhist"
to
tmlenet_options()
prior to calling tmlenet()
.
See Also
tmlenetpackage
for the general overview of the package,
def.sW
for defining the summary measures, eval.summaries
for
evaluation and validation of the summary measures,
and df_netKmax2
/df_netKmax6
/NetInd_mat_Kmax6
for examples of network datasets.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216  #***************************************************************************************
data(df_netKmax6) # Load observed data
data(NetInd_mat_Kmax6) # Load the network ID matrix
Kmax < 6 # Max number of friends in the network
#***************************************************************************************
#***************************************************************************************
# EXAMPLES OF INTERVENTION FUNCTIONS:
#***************************************************************************************
# Returns a function that will sample A with probability x:=P(A=1))
make_f.gstar < function(x, ...) {
eval(x)
f.A_x < function(data, ...){
rbinom(n = nrow(data), size = 1, prob = x[1])
}
return(f.A_x)
}
# Deterministic f_gstar setting every A=0:
f.A_0 < make_f.gstar(x = 0)
# Deterministic f_gstar setting every A=1:
f.A_1 < make_f.gstar(x = 1)
# Stochastically sets (100*x)% of population to A=1 with probability 0.2:
f.A_.2 < make_f.gstar(x = 0.2)
#***************************************************************************************
# DEFINING SUMMARY MEASURES:
#***************************************************************************************
def_sW < def.sW(netW2 = W2[[1:Kmax]]) +
def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0=TRUE)
def_sA < def.sA(sum.netAW2 = sum((1A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0=TRUE) +
def.sA(netA = A[[0:Kmax]])
#***************************************************************************************
# EVALUATING SUMMARY MEASURES BASED ON INPUT DATA:
#***************************************************************************************
# A helper function that can preevaluate the summary measures on (O)bserved data,
# given one of two ways of specifying the network:
res < eval.summaries(sW = def_sW, sA = def_sA, Kmax = 6, data = df_netKmax6,
NETIDmat = NetInd_mat_Kmax6, verbose = TRUE)
#***************************************************************************************
# Specifying the clever covariate regressions hform.g0 and hform.gstar:
# Left side can consist of any summary names defined by def.sA (as linear terms)
# Right side can consist of any summary names defined by def.sW (as linear terms) & 'nF'
#***************************************************************************************
hform.g01 < "netA ~ netW2 + sum.netW3 + nF"
hform.gstar1 < "netA ~ sum.netW3"
# alternatives:
hform.g02 < "netA + sum.netAW2 ~ netW2 + sum.netW3 + nF"
hform.g03 < "sum.netAW2 ~ netW2 + sum.netW3"
#***************************************************************************************
# Specifying the outcome regression Qform:
# Left side is ignored (with a warning if not equal to Ynode)
# Right side can by any summary names defined by def.sW, def.sA & 'nF'
#***************************************************************************************
Qform1 < "Y ~ sum.netW3 + sum.netAW2"
Qform2 < "Y ~ netA + netW + nF"
Qform3 < "blah ~ netA + netW + nF"
#***************************************************************************************
# Estimate mean population outcome under deterministic intervention A=0 with 6 friends:
# Estimation with regression formulas.
#***************************************************************************************
# Note that Ynode is optional when Qform is specified;
options(tmlenet.verbose = FALSE) # set to TRUE to print status messages
res_K6_1 < tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", f_gstar1 = 0L,
Qform = "Y ~ sum.netW3 + sum.netAW2",
hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
hform.gstar = "netA ~ sum.netW3",
IDnode = "IDs", NETIDnode = "Net_str",
optPars = list(n_MCsims = 1))
res_K6_1$EY_gstar1$estimates
res_K6_1$EY_gstar1$vars
res_K6_1$EY_gstar1$CIs
#***************************************************************************************
# Run exactly the same estimators as above,
# but using the output "DatNet.ObsP0" of eval.summaries as input to tmlenet:
#***************************************************************************************
res_K6_1b < tmlenet(DatNet.ObsP0 = res$DatNet.ObsP0,
Anode = "A", f_gstar1 = 0L,
Qform = "Y ~ sum.netW3 + sum.netAW2",
hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
hform.gstar = "netA ~ sum.netW3",
optPars = list(n_MCsims = 1))
res_K6_1b$EY_gstar1$estimates
res_K6_1b$EY_gstar1$vars
res_K6_1b$EY_gstar1$CIs
#***************************************************************************************
# Same as above but for covariatebased TMLE.
#***************************************************************************************
res_K6_2 < tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", f_gstar1 = 0L,
Qform = "Y ~ sum.netW3 + sum.netAW2",
hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
hform.gstar = "netA ~ sum.netW3",
IDnode = "IDs", NETIDnode = "Net_str",
optPars = list(runTMLE = "tmle.covariate", n_MCsims = 1))
res_K6_2$EY_gstar1$estimates
res_K6_2$EY_gstar1$vars
res_K6_2$EY_gstar1$CIs
#***************************************************************************************
# SPECIFYING THE NETWORK AS A MATRIX OF FRIEND ROW NUMBERS:
#***************************************************************************************
net_ind_obj < simcausal::NetIndClass$new(nobs = nrow(df_netKmax6), Kmax = Kmax)
# generating the network matrix from input data:
NetInd_mat < net_ind_obj$makeNetInd.fromIDs(Net_str = df_netKmax6[, "Net_str"],
IDs_str = df_netKmax6[, "IDs"])$NetInd
nF < net_ind_obj$nF # number of friends
data(NetInd_mat_Kmax6)
all.equal(NetInd_mat, NetInd_mat_Kmax6) # TRUE
print(head(NetInd_mat))
print(head(nF))
print(all.equal(df_netKmax6[,"nFriends"], nF))
res_K6_net1 < tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", f_gstar1 = f.A_0,
Qform = "Y ~ sum.netW3 + sum.netAW2",
hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
hform.gstar = "netA ~ sum.netW3",
NETIDmat = NetInd_mat,
optPars = list(runTMLE = "tmle.intercept", n_MCsims = 1))
all.equal(res_K6_net1$EY_gstar1$estimates, res_K6_1$EY_gstar1$estimates)
all.equal(res_K6_net1$EY_gstar1$vars, res_K6_1$EY_gstar1$vars)
all.equal(res_K6_net1$EY_gstar1$CIs, res_K6_1$EY_gstar1$CIs)
#***************************************************************************************
# EQUIVALENT WAYS OF SPECIFYING INTERVENTIONS f_gstar1/f_gstar2.
# LOWERING THE DIMENSIONALITY OF THE SUMMARY MEASURES.
#***************************************************************************************
def_sW < def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0=TRUE)
def_sA < def.sA(sum.netAW2 = sum((1A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0=TRUE)
# can define intervention by function f.A_0 that sets everyone's A to constant 0:
res_K6_1 < tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y", f_gstar1 = f.A_0,
NETIDmat = NetInd_mat)
res_K6_1$EY_gstar1$estimates
# equivalent way to define intervention f.A_0 is to just set f_gstar1 to 0:
res_K6_1 < tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y", f_gstar1 = 0L,
NETIDmat = NetInd_mat)
res_K6_1$EY_gstar1$estimates
# or set f_gstar1 to a vector of 0's of length nrow(data):
res_K6_1 < tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y", f_gstar1 = rep_len(0L, nrow(df_netKmax6)),
NETIDmat = NetInd_mat)
res_K6_1$EY_gstar1$estimates
#***************************************************************************************
# EXAMPLE WITH SIMULATED DATA FOR AT MOST 2 FRIENDS AND 1 BASELINE COVARIATE W1
#***************************************************************************************
data(df_netKmax2)
head(df_netKmax2)
Kmax < 2
# Define the summary measures:
def_sW < def.sW(W1[[0:Kmax]])
def_sA < def.sA(A[[0:Kmax]])
# Define the network matrix:
net_ind_obj < simcausal::NetIndClass$new(nobs = nrow(df_netKmax2), Kmax = Kmax)
NetInd_mat < net_ind_obj$makeNetInd.fromIDs(Net_str = df_netKmax2[, "Net_str"],
IDs_str = df_netKmax2[, "IDs"])$NetInd
#***************************************************************************************
# Mean population outcome under stochastic intervention P(A=1)=0.2
#***************************************************************************************
# Evaluates the target parameter by sampling exposures A from f_gstar1;
# Setting optPars$n_MCsims=100 means that A will be sampled 1000 times (
# for a total sample size of nrow(data)*n_MCsims under f_gstar1)
options(tmlenet.verbose = TRUE)
res_K2_1 < tmlenet(data = df_netKmax2, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y", f_gstar1 = f.A_.2,
NETIDmat = NetInd_mat, optPars = list(n_MCsims = 100))
res_K2_1$EY_gstar1$estimates
res_K2_1$EY_gstar1$vars
res_K2_1$EY_gstar1$CIs
#***************************************************************************************
# Estimating the average treatment effect (ATE) for two static interventions:
# f_gstar1 sets everyone A=1 vs f_gstar2 sets everyone A=0;
#***************************************************************************************
res_K2_2 < tmlenet(data = df_netKmax2, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y",
f_gstar1 = 1,
NETIDmat = NetInd_mat,
optPars = list(f_gstar2 = 0, n_MCsims = 1))
names(res_K2_2)
# Estimates under f_gstar1:
res_K2_2$EY_gstar1$estimates
res_K2_2$EY_gstar1$vars
res_K2_2$EY_gstar1$CIs
# Estimates under f_gstar2:
res_K2_2$EY_gstar2$estimates
res_K2_2$EY_gstar2$vars
res_K2_2$EY_gstar2$CIs
# ATE estimates for f_gstar1f_gstar2:
res_K2_2$ATE$estimates
res_K2_2$ATE$vars
res_K2_2$ATE$CIs
