Description:

Estimate the average network effect among dependent units with known network structure (in the presence of interference and/or spillover) using TMLE (targeted maximum likelihood estimation), IPTW (the Horvitz-Thompson or inverse-probability-of-treatment estimator) and GCOMP (the parametric G-computation formula).
Usage:

tmlenet(DatNet.ObsP0, data, Kmax, sW, sA, Anode, Ynode, f_gstar1,
  Qform = NULL, hform.g0 = NULL, hform.gstar = NULL, NETIDmat = NULL,
  IDnode = NULL, NETIDnode = NULL, verbose = getOption("tmlenet.verbose"),
  optPars = list(alpha = 0.05, lbound = 0.005, family = "binomial",
    n_MCsims = 10, runTMLE = c("tmle.intercept", "tmle.covariate"),
    YnodeDET = NULL, f_gstar2 = NULL, sep = " ", f_g0 = NULL,
    h_g0_SummariesModel = NULL, h_gstar_SummariesModel = NULL))
Arguments:

DatNet.ObsP0 - An instance of the data-storage class returned by eval.summaries (the DatNet.ObsP0 component of its output). When supplied, it already contains the evaluated summary measures and the network, so the arguments data, Kmax, sW, sA, IDnode, NETIDnode and NETIDmat can be omitted (see the Examples).

data - Observed data, a data.frame with named columns containing the baseline covariates, the exposure (Anode) and the outcome (Ynode).

Kmax - Integer constant specifying the maximal number of friends for any observation in the input data.

sW - Summary measures constructed from baseline covariates alone. This must be an object returned by calls to the def.sW function.

sA - Summary measures constructed from exposures (and, possibly, baseline covariates). This must be an object returned by calls to the def.sA function.

Anode - Exposure (treatment) variable name (column name in data).

Ynode - Outcome variable name (column name in data); optional when the outcome is already named as the left-hand side of the regression formula Qform.

f_gstar1 - Either a function or a vector of counterfactual exposures. If a function, it must return a vector of counterfactual exposures evaluated based on the combined summary measures matrix (sW, sA), which is passed to it via its data argument (see Details).

Qform - Regression formula for the outcome in Ynode; the right-hand side can reference any summary measure names defined in sW and sA, as well as nF (see Details).

hform.g0 - Regression formula for estimating the conditional density P(sA | sW) under the observed exposure mechanism g0 (see Details).

hform.gstar - Regression formula for estimating the conditional density P(sA | sW) under the intervention specified by f_gstar1 (or f_gstar2) (see Details).

NETIDmat - Network specification via a matrix of friend IDs (or friend row numbers in data), with Kmax columns and nrow(data) rows, padded with NAs (see Details).

IDnode - Subject identifier variable in the input data; if not supplied, the network strings in NETIDnode (or the entries of NETIDmat) are assumed to contain the row numbers of each unit's friends in data.

NETIDnode - Network specification via a column name in the input data, where each entry is a string of friend IDs (or row numbers) for that observation, separated by optPars$sep (see Details).

verbose - Set to TRUE to print messages on status and information as the function runs. This can also be turned on by setting options(tmlenet.verbose = TRUE).

optPars - A named list of additional optional parameters to be passed to tmlenet; see the 'Additional parameters' section below for the items that can be specified.
Value:

A named list with 3 items containing the estimation results for:

EY_gstar1 - estimates of the mean counterfactual outcome under the (stochastic) intervention function f_gstar1 (E_{g^*_1}[Y]).

EY_gstar2 - estimates of the mean counterfactual outcome under the (stochastic) intervention function f_gstar2 (E_{g^*_2}[Y]); NULL if f_gstar2 is not specified.

ATE - the additive treatment effect (E_{g^*_1}[Y] - E_{g^*_2}[Y]) for the intervention f_gstar1 vs. f_gstar2; NULL if f_gstar2 is not specified.

Each list item above is itself a list containing the items:

estimates - various estimates of the target parameter (the network population counterfactual mean under the (stochastic) intervention).

vars - the asymptotic variance estimates, for IPTW and TMLE.

CIs - CI estimates at the alpha level, for IPTW and TMLE.

other.vars - placeholder for future versions.

h_g0_SummariesModel - the model fits for P(sA|sW) under the observed exposure mechanism g0; an object of R6 class SummariesModel.

h_gstar_SummariesModel - the model fits for P(sA|sW) under the intervention f_gstar1 or f_gstar2; an object of R6 class SummariesModel.
Details:

Currently implemented estimators are:

tmle - either the weighted regression intercept-based TMLE (tmle.intercept, the default), with weights defined by the IPTW weights h_gstar/h_gN, or the covariate-based unweighted TMLE (tmle.covariate), which uses the IPTW weights h_gstar/h_gN as a covariate.

h_iptw - efficient IPTW based on the weights h_gstar/h_gN.

gcomp - parametric G-computation formula substitution estimator.
Note that when both arguments NETIDnode and NETIDmat are left unspecified, the input data are assumed independent, i.e., there is no network dependency between the observations, and all inference is performed based on the i.i.d. efficient influence curve for the target parameter (E_{g^*_1}[Y]). Also note that the ordering of the friend IDs in NETIDnode or NETIDmat is unimportant.
A special non-negative integer-valued variable nF is automatically calculated each time tmlenet or eval.summaries is called. nF contains the total number of friends for each observation and is always added as an additional column to the matrix of baseline-covariates-based summary measures sWmat. The variable nF can be used in the same way as any of the column names in the input data frame data. In particular, the name nF can be used inside the summary measure expressions (calls to def.sW and def.sA) and inside any of the regression formulas (Qform, hform.g0, hform.gstar).
The regression formulas in Qform, hform.g0 and hform.gstar can include any summary measure names defined in sW and sA, referenced either by their individual variable names or by their aggregate summary measure names. For example, hform.g0 = "netA ~ netW" is equivalent to hform.g0 = "A + A_netF1 + A_netF2 ~ W + W_netF1 + W_netF2" for sW, sA summary measures defined by def.sW(netW = W[[0:2]]) and def.sA(netA = A[[0:2]]).
Additional parameters:

Some of the parameters that control the estimation in tmlenet can be set by calling the function tmlenet_options. Additional parameters can also be specified via the named list optPars argument of the tmlenet function. The items that can be specified in optPars are:
alpha - the alpha-level for CI calculation (default is 0.05, for 95% confidence intervals).

lbound - one value for symmetrical bounds on P(sA | sW) (default is 0.005).

family - family specification for the regression models; defaults to binomial (CURRENTLY ONLY THE BINOMIAL FAMILY IS IMPLEMENTED).

n_MCsims - number of Monte-Carlo simulations performed, each of sample size nrow(data), for generating new exposures under f_gstar1 or f_gstar2 (if specified) or f_g0 (if specified). These newly generated exposures are utilized when fitting the conditional densities P(sA|sW) and when evaluating the substitution estimators GCOMP and TMLE under the stochastic interventions f_gstar1 or f_gstar2.

runTMLE - chooses which of the two TMLEs to run, "tmle.intercept" or "tmle.covariate"; the default is "tmle.intercept".

YnodeDET - optional column name for indicators of deterministic values of the outcome Ynode, coded as (TRUE/FALSE) or (1/0); observations with YnodeDET = TRUE/1 are assumed to have a constant value for their Ynode.

f_gstar2 - either a function or a vector of counterfactual exposure assignments. Used for estimating contrasts (the average treatment effect) between two interventions; if omitted, only the average counterfactual outcome under the intervention f_gstar1 is estimated. The requirements for f_gstar2 are identical to those for f_gstar1.

sep - a character separating the friend indices for the same observation in NETIDnode (default is " ").

f_g0 - a function for generating the true treatment mechanism of the observed Anode, if known (for example, in a randomized trial). This is used for estimating P(sA|sW) under g0 by sampling a large vector of Anode values (of length nrow(data)*n_MCsims) from the f_g0 function.

h_g0_SummariesModel - a previously fitted model for P(sA|sW) under the observed exposure mechanism g0, as returned by a previous run of the tmlenet function (see the sketch below). This has to be an object of R6 class SummariesModel. When this argument is specified, all predictions P(sA=sa|sW=sw) under g0 will be based on the model fits provided by this argument.

h_gstar_SummariesModel - a previously fitted model for P(sA|sW) under the (stochastic) intervention specified by f_gstar1 or f_gstar2, also an object of R6 class SummariesModel. When this argument is specified, the predictions P(sA=sa|sW=sw) under f_gstar1 or f_gstar2 will be based on the model fits provided by this argument.
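The following hedged sketch (built from the datasets and summary measures used in the Examples below) illustrates how the conditional density fits stored in one tmlenet call could be re-used in a subsequent call via optPars. The object names res_prev and res_new are illustrative and not part of the package API; both calls must use the same summary measures.

# Sketch only: re-using previously fitted P(sA|sW) models across tmlenet runs
data(df_netKmax6); data(NetInd_mat_Kmax6)
def_sW <- def.sW(sum.netW3 = sum(W3[[1:6]]), replaceNAw0 = TRUE)
def_sA <- def.sA(sum.netAW2 = sum((1-A[[1:6]])*W2[[1:6]]), replaceNAw0 = TRUE)
res_prev <- tmlenet(data = df_netKmax6, Kmax = 6, sW = def_sW, sA = def_sA,
                    Anode = "A", Ynode = "Y", f_gstar1 = 0L,
                    NETIDmat = NetInd_mat_Kmax6)
# Second run skips re-fitting the conditional densities by passing the stored fits:
res_new <- tmlenet(data = df_netKmax6, Kmax = 6, sW = def_sW, sA = def_sA,
                   Anode = "A", Ynode = "Y", f_gstar1 = 0L,
                   NETIDmat = NetInd_mat_Kmax6,
                   optPars = list(
                     h_g0_SummariesModel = res_prev$EY_gstar1$h_g0_SummariesModel,
                     h_gstar_SummariesModel = res_prev$EY_gstar1$h_gstar_SummariesModel))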
Specifying the counterfactual intervention function (f_gstar1 and optPars$f_gstar2):

The functions f_gstar1 and f_gstar2 can only depend on variables specified by the combined matrix of summary measures (sW, sA), which is passed to them via their data argument. The functions should return a vector of length nrow(data) of counterfactual treatments for the observations in the input data.
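As a minimal illustration (the function name f.gstar_half is made up here; see also the intervention functions constructed in the Examples below), a valid stochastic intervention function simply takes the summary measure data and returns a vector of counterfactual exposures of length nrow(data):

# A valid (stochastic) intervention function: sets each unit's exposure to 1
# with probability 0.5, ignoring the summary measures passed via `data`:
f.gstar_half <- function(data, ...) {
  rbinom(n = nrow(data), size = 1, prob = 0.5)
}
# Deterministic interventions can also be supplied directly as a constant
# (e.g., f_gstar1 = 0L) or as a vector of length nrow(data).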
Specifying the Network of Friends:

The network of friends (connections) for the observations in the input data can be specified in two alternative ways, using either the NETIDnode or the NETIDmat input argument.

NETIDnode - the first (slower) method uses a vector of strings in data[, NETIDnode], where each string i must contain the space-separated IDs or row numbers of all units in data thought to be connected to observation i (the friends of unit i).

NETIDmat - an alternative (and faster) method is to pass a matrix with Kmax columns and nrow(data) rows, where each row NETIDmat[i, ] is a vector of observation i's friends' IDs, or of i's friends' row numbers in data if IDnode = NULL. If observation i has fewer than Kmax friends, the remainder of NETIDmat[i, ] must be filled with NAs. Note that the ordering of the friend indices is irrelevant.
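As a toy illustration (made-up data, not a package dataset), the same 4-unit network with Kmax = 2 can be described either by a string column for NETIDnode (here containing friends' row numbers, since no IDnode is used) or by a friend-row-number matrix for NETIDmat:

toy_data <- data.frame(W = c(0, 1, 1, 0), A = c(1, 0, 1, 0), Y = c(1, 1, 0, 0),
                       Net_str = c("2 3", "3", "", "1 2"),
                       stringsAsFactors = FALSE)
# Equivalent NETIDmat: one row per unit, Kmax = 2 columns of friends' row
# numbers, padded with NAs for units with fewer than Kmax friends:
NetInd_toy <- rbind(c(2L, 3L),   # unit 1 is connected to units 2 and 3
                    c(3L, NA),   # unit 2 is connected to unit 3
                    c(NA, NA),   # unit 3 has no friends
                    c(1L, 2L))   # unit 4 is connected to units 1 and 2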
**********************************************************************
IPTW estimator
**********************************************************************
As described in the following section, the first step is to construct an estimator P_{g_N}(sA | sW) of the common (in i) conditional density P_{g_0}(sA | sW) for the common (in i) unit-level summary measures (sA, sW). The same fitting algorithm is applied to construct an estimator P_{g^*_N}(sA^* | sW^*) of the common (in i) conditional density P_{g^*}(sA^* | sW^*) for the common (in i) unit-level summary measures (sA^*, sW^*) implied by the user-supplied stochastic intervention f_gstar1 or f_gstar2 and the observed distribution of W.
These two density estimators form the basis of the IPTW estimator, which is evaluated at the observed N data points O_i = (sW_i, sA_i, Y_i), i = 1,...,N, and is given by
ψ^{IPTW}_n = (1/N) ∑_{i=1,...,N} Y_i \frac{P_{g^*_N}(sA^*=sA_i | sW^*=sW_i)}{P_{g_N}(sA=sA_i | sW=sW_i)}.
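A hedged sketch of this computation in R is shown below. The vectors Y, p.gstar and p.gN are illustrative stand-ins for the outcome and the two estimated conditional densities evaluated at the observed summary measures; they are not objects returned under these names by tmlenet.

iptw_estimate <- function(Y, p.gstar, p.gN, lbound = 0.005) {
  # Bound the estimated denominator density away from zero (cf. optPars$lbound)
  wts <- p.gstar / pmax(p.gN, lbound)
  mean(Y * wts)   # sample average of the weighted outcomes
}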
**********************************************************************
Modeling P(sA|sW) for summary measures (sA,sW)
**********************************************************************
Non-parametric estimation of the common unit-level multivariate joint conditional probability model P_{g0}(sA|sW), for unit-level summary measures (sA,sW) generated from the observed exposures and baseline covariates (A,W) = (A_i,W_i : i=1,...,N) (their joint density given by g_0(A|W)Q(W)), is performed by first constructing the dataset of N summary measures (sA_i, sW_i : i=1,...,N) and then fitting the usual i.i.d. MLE for the common density P_{g0}(sA|sW) based on the pooled sample of these N summary measures. Note that sA can be multivariate and any of its components sA[j] can be binary, categorical or continuous.
The joint probability model P(sA|sW) = P(sA[1],...,sA[k]|sW) can be factorized as P(sA[1]|sW) * P(sA[2]|sW, sA[1]) * ... * P(sA[k]|sW, sA[1],...,sA[k-1]), where each of these conditional probability models is fit separately, depending on the type of the outcome variable sA[j].
If sA[j] is binary, the conditional probability P(sA[j]|sW,sA[1],...,sA[j-1]) is evaluated via a logistic regression model. When sA[j] is continuous (or categorical), its range is first partitioned into K bins, with the corresponding K bin indicators (B_1,...,B_K); each bin indicator B_j is then used as an outcome in a separate logistic regression model with predictors given by sW, sA[1],...,sA[j-1]. Thus, the joint probability P(sA|sW) is defined by such a tree of binary logistic regressions.
For simplicity, we now suppose sA is continuous and univariate, and we describe here the algorithm for fitting P_{g_0}(sA | sW) (the algorithm for fitting P_{g^*}(sA^* | sW^*) is equivalent, except that the exposure A is replaced with the exposure A^* generated under f_gstar1 or f_gstar2, and the predictors sW from the regression formula hform.g0 are replaced with the predictors sW^* specified by the regression formula hform.gstar).
Generate a dataset of N observed continuous summary measures (sa_i : i=1,...,N) from the observed ((a_i, w_i) : i=1,...,N). Let sa \in {sa_i : i=1,...,N}.
Divide the range of sA values into intervals S = (i_1,...,i_M, i_{M+1}) so that any observed data point sa_i belongs to one interval in S; namely, for each possible value sa of sA there is k \in {1,...,M} such that i_k < sa <= i_{k+1}.
Let the mapping B(sa) \in {1,...,M} denote the unique interval in S for sa, such that i_{B(sa)} < sa <= i_{B(sa)+1}.
Let bw_{B(sa)} := i_{B(sa)+1} - i_{B(sa)} be the length (bandwidth) of the interval (i_{B(sa)}, i_{B(sa)+1}).
Also define the binary indicators b_1,...,b_M, where b_j := I(B(sa) = j) for all j <= B(sa), and b_j := NA for all j > B(sa). That is, we set b_j to missing once the indicator I(B(sa) = j) jumps from 0 to 1.
Now let sA denote the random variable for the observed summary measure of one unit, and denote by (B_1,...,B_M) the corresponding random indicators for sA, defined as B_j := I(B(sA) = j) for all j <= B(sA) and B_j := NA for all j > B(sA).
For each j = 1,...,M, fit the logistic regression model for the conditional probability P(B_j = 1 | B_{j-1} = 0, sW), i.e., at each j this is defined as the conditional probability of B_j jumping from 0 to 1 at bin j, given that B_{j-1} = 0; each of these logistic regression models is fit only among the observations that are still at risk of having B_j = 1, i.e., those with B_{j-1} = 0.
Normalize the above conditional probability of B_j jumping from 0 to 1 by its corresponding interval length (bandwidth) bw_j to obtain the discrete conditional hazards h_j(sW) := P(B_j = 1 | B_{j-1} = 0, sW) / bw_j, for each j.
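The following self-contained sketch mimics these steps on simulated data; it is only an illustration of the idea, not the package's internal implementation, and all object names are made up:

set.seed(1)
n <- 500
sW <- rnorm(n)                    # a univariate baseline summary measure
sa <- rnorm(n, mean = sW)         # a continuous exposure summary measure
cuts <- quantile(sa, probs = seq(0, 1, length.out = 6))  # 5 bins (equal-mass style)
M <- length(cuts) - 1
bin <- findInterval(sa, cuts, rightmost.closed = TRUE, all.inside = TRUE)
# Fit one logistic regression per bin, only among units still "at risk"
# (i.e., units whose sa did not fall into any earlier bin):
hazard.fits <- vector("list", M)
for (j in seq_len(M - 1)) {
  at.risk <- which(bin >= j)
  dfj <- data.frame(Bj = as.integer(bin[at.risk] == j), sW = sW[at.risk])
  hazard.fits[[j]] <- glm(Bj ~ sW, family = binomial(), data = dfj)
}
# For the last bin the conditional probability is 1 by construction
# (a unit still at risk after bin M-1 must fall into bin M).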
For the summary measure sA, the above conditional hazard h_j(sW) corresponds to P(sA \in (i_j, i_{j+1}) | sA >= i_j, sW), i.e., the probability that sA falls in the interval (i_j, i_{j+1}), conditional on sW and on the fact that sA does not belong to any of the intervals before j.
Finally, for any given data point (sa, sw), evaluate the discretized conditional density P(sA = sa | sW = sw) by first evaluating the interval number k = B(sa) \in {1,...,M} for sa and then computing \prod_{j=1,...,k-1}(1 - h_j(sw)) * h_k(sw), which is equivalent to the joint conditional probability that sa belongs to the interval (i_k, i_{k+1}) and does not belong to any of the intervals 1 to k-1, conditional on sW.
The evaluation above utilizes a discretization of the fact that any continuous density f_X of a random variable X can be written as f_X(x) = S_X(x) * h_X(x), where S_X(x) := P(X >= x) is the survival function for X and h_X(x) := P(X = x | X >= x) is the (discretized) hazard function for X; it also uses the fact that the discretized survival function S_X(x) can be written as a product of the hazards for s < x: S_X(x) = \prod_{s < x}(1 - h_X(s)).
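Continuing the sketch started above, the discretized conditional density at a new point (sa0, sw0) could then be evaluated as follows (again an illustration only, not the package's internal code):

discr_dens <- function(sa0, sw0, hazard.fits, cuts) {
  M <- length(cuts) - 1
  k <- findInterval(sa0, cuts, rightmost.closed = TRUE, all.inside = TRUE)
  # predicted P(B_j = 1 | B_{j-1} = 0, sW = sw0) for j = 1,...,k (1 for the last bin):
  p <- sapply(seq_len(k), function(j) {
    if (j == M) return(1)
    predict(hazard.fits[[j]], newdata = data.frame(sW = sw0), type = "response")
  })
  prob.k <- prod(1 - p[seq_len(k - 1)]) * p[k]  # P(sA falls in bin k | sW = sw0)
  prob.k / (cuts[k + 1] - cuts[k])              # normalize by the bandwidth bw_k
}
discr_dens(sa0 = 0.5, sw0 = 0, hazard.fits, cuts)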
**********************************************************************
Three methods for defining bin (interval) cutoffs for a continuous one-dimensional summary measure sA[j]
**********************************************************************
There are three alternative methods for defining the bin cutoffs S = (i_1,...,i_M, i_{M+1}) for a continuous summary measure sA. The choice of method, along with other discretization parameters (e.g., the total number of bins), is controlled via the tmlenet_options() function. See the ?tmlenet_options argument bin.method for additional details.
Approach 1 (equal.len): equal-length intervals (the default).
*********************
The bins are defined by splitting the range of the observed sA values (sa_1,...,sa_N) into equal-length intervals. This is the default discretization method, selected by passing the argument bin.method = "equal.len" to the tmlenet_options function prior to calling tmlenet(). The intervals are defined by splitting the range of (sa_1,...,sa_N) into nbins equal-length intervals, where nbins is another argument of the tmlenet_options() function. When nbins = NA (the default setting), the actual value of nbins is computed at run time as the integer value (floor) of n/maxNperBin, where n is the total observed sample size and maxNperBin is another argument of tmlenet_options() with the default value 1000.
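For intuition, here is a minimal sketch of the equal-length cutoff computation described above; the helper name is made up and the package performs this internally:

equal.len.cutoffs <- function(sa, nbins = NA, maxNperBin = 1000) {
  if (is.na(nbins)) nbins <- max(1L, floor(length(sa) / maxNperBin))
  seq(from = min(sa), to = max(sa), length.out = nbins + 1)
}
# e.g., 5,000 observations with the default maxNperBin = 1000 give 5 bins:
# equal.len.cutoffs(rnorm(5000))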
Approach 2 (equal.mass): data-adaptive equal-mass intervals.
*********************
The intervals are defined by splitting the range of sA into non-equal-length, data-adaptive intervals chosen so that each interval contains approximately maxNperBin observations from (sa_j : j=1,...,N). This interval definition approach is selected by passing the argument bin.method = "equal.mass" to tmlenet_options() prior to calling tmlenet(). The method ensures that an approximately equal number of observations belongs to each interval, where that number is controlled by setting maxNperBin; the default is maxNperBin = 1000 observations per interval.
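Analogously, a minimal sketch of equal-mass cutoffs based on empirical quantiles (the helper name is made up; the package's internal implementation may differ):

equal.mass.cutoffs <- function(sa, maxNperBin = 1000) {
  nbins <- max(1L, floor(length(sa) / maxNperBin))
  unique(quantile(sa, probs = seq(0, 1, length.out = nbins + 1)))
}
# This binning method is selected prior to calling tmlenet() with:
# tmlenet_options(bin.method = "equal.mass", maxNperBin = 1000)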
Approach 3 (dhist): a combination of Approaches 1 and 2.
*********************
The data-adaptive approach dhist is a mix of Approaches 1 and 2; see Denby and Mallows, "Variations on the Histogram" (2009). This interval definition method is selected by passing the argument bin.method = "dhist" to tmlenet_options() prior to calling tmlenet().
See Also:

tmlenet-package for a general overview of the package; def.sW for defining the summary measures; eval.summaries for evaluation and validation of the summary measures; and df_netKmax2, df_netKmax6 and NetInd_mat_Kmax6 for examples of network datasets.
Examples:

#***************************************************************************************
data(df_netKmax6) # Load observed data
data(NetInd_mat_Kmax6) # Load the network ID matrix
Kmax <- 6 # Max number of friends in the network
#***************************************************************************************
#***************************************************************************************
# EXAMPLES OF INTERVENTION FUNCTIONS:
#***************************************************************************************
# Returns a function that will sample A=1 with probability x := P(A=1):
make_f.gstar <- function(x, ...) {
  force(x)  # evaluate x now so the returned function can use its value later
  f.A_x <- function(data, ...){
    rbinom(n = nrow(data), size = 1, prob = x[1])
  }
  return(f.A_x)
}
# Deterministic f_gstar setting every A=0:
f.A_0 <- make_f.gstar(x = 0)
# Deterministic f_gstar setting every A=1:
f.A_1 <- make_f.gstar(x = 1)
# Stochastic f_gstar that sets each A=1 with probability x=0.2 (~20% of the population):
f.A_.2 <- make_f.gstar(x = 0.2)
#***************************************************************************************
# DEFINING SUMMARY MEASURES:
#***************************************************************************************
def_sW <- def.sW(netW2 = W2[[1:Kmax]]) +
def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0=TRUE)
def_sA <- def.sA(sum.netAW2 = sum((1-A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0=TRUE) +
def.sA(netA = A[[0:Kmax]])
#***************************************************************************************
# EVALUATING SUMMARY MEASURES BASED ON INPUT DATA:
#***************************************************************************************
# A helper function that can pre-evaluate the summary measures on (O)bserved data,
# given one of two ways of specifying the network:
res <- eval.summaries(sW = def_sW, sA = def_sA, Kmax = 6, data = df_netKmax6,
NETIDmat = NetInd_mat_Kmax6, verbose = TRUE)
#***************************************************************************************
# Specifying the clever covariate regressions hform.g0 and hform.gstar:
# Left side can consist of any summary names defined by def.sA (as linear terms)
# Right side can consist of any summary names defined by def.sW (as linear terms) & 'nF'
#***************************************************************************************
hform.g01 <- "netA ~ netW2 + sum.netW3 + nF"
hform.gstar1 <- "netA ~ sum.netW3"
# alternatives:
hform.g02 <- "netA + sum.netAW2 ~ netW2 + sum.netW3 + nF"
hform.g03 <- "sum.netAW2 ~ netW2 + sum.netW3"
#***************************************************************************************
# Specifying the outcome regression Qform:
# Left side is ignored (with a warning if not equal to Ynode)
# Right side can be any summary names defined by def.sW, def.sA & 'nF'
#***************************************************************************************
Qform1 <- "Y ~ sum.netW3 + sum.netAW2"
Qform2 <- "Y ~ netA + netW2 + nF"
Qform3 <- "blah ~ netA + netW2 + nF"
#***************************************************************************************
# Estimate mean population outcome under deterministic intervention A=0 with 6 friends:
# Estimation with regression formulas.
#***************************************************************************************
# Note that Ynode is optional when Qform is specified;
options(tmlenet.verbose = FALSE) # set to TRUE to print status messages
res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", f_gstar1 = 0L,
Qform = "Y ~ sum.netW3 + sum.netAW2",
hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
hform.gstar = "netA ~ sum.netW3",
IDnode = "IDs", NETIDnode = "Net_str",
optPars = list(n_MCsims = 1))
res_K6_1$EY_gstar1$estimates
res_K6_1$EY_gstar1$vars
res_K6_1$EY_gstar1$CIs
#***************************************************************************************
# Run exactly the same estimators as above,
# but using the output "DatNet.ObsP0" of eval.summaries as input to tmlenet:
#***************************************************************************************
res_K6_1b <- tmlenet(DatNet.ObsP0 = res$DatNet.ObsP0,
Anode = "A", f_gstar1 = 0L,
Qform = "Y ~ sum.netW3 + sum.netAW2",
hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
hform.gstar = "netA ~ sum.netW3",
optPars = list(n_MCsims = 1))
res_K6_1b$EY_gstar1$estimates
res_K6_1b$EY_gstar1$vars
res_K6_1b$EY_gstar1$CIs
#***************************************************************************************
# Same as above but for covariate-based TMLE.
#***************************************************************************************
res_K6_2 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", f_gstar1 = 0L,
Qform = "Y ~ sum.netW3 + sum.netAW2",
hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
hform.gstar = "netA ~ sum.netW3",
IDnode = "IDs", NETIDnode = "Net_str",
optPars = list(runTMLE = "tmle.covariate", n_MCsims = 1))
res_K6_2$EY_gstar1$estimates
res_K6_2$EY_gstar1$vars
res_K6_2$EY_gstar1$CIs
#***************************************************************************************
# SPECIFYING THE NETWORK AS A MATRIX OF FRIEND ROW NUMBERS:
#***************************************************************************************
net_ind_obj <- simcausal::NetIndClass$new(nobs = nrow(df_netKmax6), Kmax = Kmax)
# generating the network matrix from input data:
NetInd_mat <- net_ind_obj$makeNetInd.fromIDs(Net_str = df_netKmax6[, "Net_str"],
IDs_str = df_netKmax6[, "IDs"])$NetInd
nF <- net_ind_obj$nF # number of friends
data(NetInd_mat_Kmax6)
all.equal(NetInd_mat, NetInd_mat_Kmax6) # TRUE
print(head(NetInd_mat))
print(head(nF))
print(all.equal(df_netKmax6[,"nFriends"], nF))
res_K6_net1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", f_gstar1 = f.A_0,
Qform = "Y ~ sum.netW3 + sum.netAW2",
hform.g0 = "netA ~ netW2 + sum.netW3 + nF",
hform.gstar = "netA ~ sum.netW3",
NETIDmat = NetInd_mat,
optPars = list(runTMLE = "tmle.intercept", n_MCsims = 1))
all.equal(res_K6_net1$EY_gstar1$estimates, res_K6_1$EY_gstar1$estimates)
all.equal(res_K6_net1$EY_gstar1$vars, res_K6_1$EY_gstar1$vars)
all.equal(res_K6_net1$EY_gstar1$CIs, res_K6_1$EY_gstar1$CIs)
#***************************************************************************************
# EQUIVALENT WAYS OF SPECIFYING INTERVENTIONS f_gstar1/f_gstar2.
# LOWERING THE DIMENSIONALITY OF THE SUMMARY MEASURES.
#***************************************************************************************
def_sW <- def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0=TRUE)
def_sA <- def.sA(sum.netAW2 = sum((1-A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0=TRUE)
# can define intervention by function f.A_0 that sets everyone's A to constant 0:
res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y", f_gstar1 = f.A_0,
NETIDmat = NetInd_mat)
res_K6_1$EY_gstar1$estimates
# equivalent way to define intervention f.A_0 is to just set f_gstar1 to 0:
res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y", f_gstar1 = 0L,
NETIDmat = NetInd_mat)
res_K6_1$EY_gstar1$estimates
# or set f_gstar1 to a vector of 0's of length nrow(data):
res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y", f_gstar1 = rep_len(0L, nrow(df_netKmax6)),
NETIDmat = NetInd_mat)
res_K6_1$EY_gstar1$estimates
#***************************************************************************************
# EXAMPLE WITH SIMULATED DATA FOR AT MOST 2 FRIENDS AND 1 BASELINE COVARIATE W1
#***************************************************************************************
data(df_netKmax2)
head(df_netKmax2)
Kmax <- 2
# Define the summary measures:
def_sW <- def.sW(W1[[0:Kmax]])
def_sA <- def.sA(A[[0:Kmax]])
# Define the network matrix:
net_ind_obj <- simcausal::NetIndClass$new(nobs = nrow(df_netKmax2), Kmax = Kmax)
NetInd_mat <- net_ind_obj$makeNetInd.fromIDs(Net_str = df_netKmax2[, "Net_str"],
IDs_str = df_netKmax2[, "IDs"])$NetInd
#***************************************************************************************
# Mean population outcome under stochastic intervention P(A=1)=0.2
#***************************************************************************************
# Evaluates the target parameter by sampling exposures A from f_gstar1;
# Setting optPars$n_MCsims=100 means that A will be sampled 100 times
# (for a total sample size of nrow(data)*n_MCsims under f_gstar1)
options(tmlenet.verbose = TRUE)
res_K2_1 <- tmlenet(data = df_netKmax2, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y", f_gstar1 = f.A_.2,
NETIDmat = NetInd_mat, optPars = list(n_MCsims = 100))
res_K2_1$EY_gstar1$estimates
res_K2_1$EY_gstar1$vars
res_K2_1$EY_gstar1$CIs
#***************************************************************************************
# Estimating the average treatment effect (ATE) for two static interventions:
# f_gstar1 sets everyone A=1 vs f_gstar2 sets everyone A=0;
#***************************************************************************************
res_K2_2 <- tmlenet(data = df_netKmax2, Kmax = Kmax, sW = def_sW, sA = def_sA,
Anode = "A", Ynode = "Y",
f_gstar1 = 1,
NETIDmat = NetInd_mat,
optPars = list(f_gstar2 = 0, n_MCsims = 1))
names(res_K2_2)
# Estimates under f_gstar1:
res_K2_2$EY_gstar1$estimates
res_K2_2$EY_gstar1$vars
res_K2_2$EY_gstar1$CIs
# Estimates under f_gstar2:
res_K2_2$EY_gstar2$estimates
res_K2_2$EY_gstar2$vars
res_K2_2$EY_gstar2$CIs
# ATE estimates for f_gstar1-f_gstar2:
res_K2_2$ATE$estimates
res_K2_2$ATE$vars
res_K2_2$ATE$CIs