def.sW: Define Summary Measures sA and sW
In tmlenet: Targeted Maximum Likelihood Estimation for Network Data


Define and store summary measures sW and sA that can later be processed inside the eval.summaries or tmlenet functions. def.sW and def.sA return an R6 object of class DefineSummariesClass, which stores the user-defined summary measure functions of the baseline covariates W and exposure A; these are evaluated later inside the environment of the input data frame. Calls to def.sW must be used for summary measures that are functions of the baseline covariates W only, while calls to def.sA must be used for summary measures that are functions of both the baseline covariates W and the exposure A. Each summary measure is specified as an evaluable R expression or as a string that can be parsed into one. Any variable name that exists as a named column in the input data frame can be used in these expressions. Separate calls to def.sW/def.sA can be combined into a single collection with the '+' function, e.g., def.sW(W1)+def.sW(W2). A special syntax is allowed inside these summary expressions:

'Var[[index]]' - will index the friend covariate values of the variable Var, e.g., 'Var[[1]]' will pull the covariate value of Var for the first friend, 'Var[[Kmax]]' of the last friend, and 'Var[[0]]' is equivalent to writing 'Var' itself (indexes itself).
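The indexing rule above can be pictured with a small base R sketch. It does not call tmlenet; the helper friend_values, the toy vector W, and the layout of net_ind (row i holds the row indices of unit i's friends, padded with NA) are assumptions made for illustration only.

```r
# Hypothetical toy data: 4 units, each with at most Kmax = 2 friends.
Kmax <- 2
W <- c(10, 20, 30, 40)

# Assumed layout: row i lists the row indices of unit i's friends, padded with NA.
net_ind <- matrix(c(2,  3,
                    1, NA,
                    4,  1,
                   NA, NA),
                  nrow = 4, ncol = Kmax, byrow = TRUE)

# Hypothetical helper mimicking what 'Var[[index]]' is described to return:
# index 0 gives the unit's own value, index j gives the value of friend j.
friend_values <- function(var, index, net_ind, var_name = "W") {
  cols <- lapply(index, function(j) {
    if (j == 0) var else var[net_ind[, j]]
  })
  out <- do.call(cbind, cols)
  colnames(out) <- ifelse(index == 0, var_name, paste0(var_name, "_netF", index))
  out
}

# One column for W itself and one column per friend slot; missing friends show up as NA.
friend_values(W, 0:Kmax, net_ind)
```

Units with fewer than Kmax friends end up with NA in the trailing friend columns, which is the situation the NA-replacement option is meant to address.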

A special argument named replaceNAw0 can also be passed to the def.sW and def.sA functions:

replaceNAw0 = TRUE - automatically replaces all the missing network covariate values (NA) with 0.

One can then test the evaluation of these summary measures by either passing the returned DefineSummariesClass object to function eval.summaries or by calling the internal method eval.nodeforms(data.df, netind_cl) on the result returned by def.sW or def.sA. Each separate argument to def.sW or def.sA represents a new summary measure. The user-specified argument name defines the name of the corresponding summary measure (where the summary measure represents the result of the evaluation of the corresponding R expression specified by the argument). When a particular argument is unnamed, the summary measure name will be generated automatically (see Details, Naming Conventions and Examples below).

def.sW(...)

def.sA(...)

## S3 method for class 'DefineSummariesClass'
sVar1 + sVar2

`...`	Named R expressions or character strings that specify the formula for creating the summary measures.
`sVar1`	An object returned by a call to `def.sW` or `def.sA` functions.
`sVar2`	An object returned by a call to `def.sW` or `def.sA` functions.

R6 object of class DefineSummariesClass which can be passed as an argument to eval.summaries and tmlenet functions.

The R expressions passed to these functions are evaluated later inside tmlenet or eval.summaries functions, using the environment of the input data frame, which is enclosed within the user-calling environment.
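The delayed evaluation described above follows a common base R pattern: capture the expressions unevaluated, then evaluate them with the data frame's columns looked up first and the calling environment second. The sketch below illustrates that pattern only; define_summaries and eval_summaries_sketch are hypothetical names and are not how tmlenet is implemented.

```r
# Hypothetical helper: capture unevaluated expressions and remember the caller's environment.
define_summaries <- function(...) {
  list(exprs = eval(substitute(alist(...))), env = parent.frame())
}

# Hypothetical helper: evaluate each stored expression, resolving names in the
# data frame first and falling back to the captured calling environment.
eval_summaries_sketch <- function(defs, data) {
  sapply(defs$exprs, function(e) eval(e, envir = data, enclos = defs$env))
}

mult <- 10                                   # lives in the calling environment, not in the data
toy <- data.frame(W1 = c(1, 2, 3), W2 = c(0, 1, 0))

defs <- define_summaries(W1 = W1, W1W2 = W1 * W2, scaled = W1 * mult)
eval_summaries_sketch(defs, toy)             # one column per named summary measure
```

Because nothing is evaluated at definition time, the same stored definitions can be applied to any data frame that has the referenced columns.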

Note that when observation i has only j-1 friends, the value of "W_netFj" for observation i is automatically set to NA. This can be undesirable in some circumstances, in which case one can automatically replace all such NAs with 0s by setting the argument replaceNAw0 = TRUE when calling def.sW or def.sA, e.g., def.sW(W[[1]], replaceNAw0 = TRUE).
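To see why the replacement matters for aggregated summaries, here is a base R sketch on a hypothetical friend-value matrix. It assumes that a sum over friend values is a row-wise total, and it mimics the described NA-to-0 replacement by hand rather than calling tmlenet.

```r
# Hypothetical friend values W_netF1, W_netF2 for 3 units:
# unit 1 has two friends, unit 2 has one friend, unit 3 has none.
net_W <- cbind(W_netF1 = c(1, 0, NA),
               W_netF2 = c(1, NA, NA))

# Without replacement, a single missing friend makes the whole row sum NA.
sum_raw <- rowSums(net_W)

# Replacing NA with 0 first turns the sum into a total over existing friends.
net_W0 <- net_W
net_W0[is.na(net_W0)] <- 0
sum_filled <- rowSums(net_W0)

cbind(sum_raw, sum_filled)
```

After replacement, a unit with no friends and a unit whose friends all have value 0 both get a sum of 0, so it can help to keep the number of friends as a separate summary measure.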

Naming conventions for summary measures with no user-supplied name (e.g., def.sW(W1)).

....................................

If only one unique variable name is used in the summary expression (only one parent), use the variable name itself to name the summary measure;
If there is more than 1 unique variable name (e.g., "W1+W2") in the summary expression, throw an exception (user must always supply summary measure names for such expressions).
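The two rules above can be sketched with base R's all.vars(), which lists the variable names used in an expression. The helper default_summary_name is hypothetical, and treating Kmax as a reserved symbol rather than a parent variable is an assumption of this sketch, not a documented detail of the package.

```r
# Hypothetical helper: derive a default name from an unevaluated summary expression.
# 'Kmax' is treated as a reserved symbol, not as a parent variable (an assumption).
default_summary_name <- function(expr, reserved = "Kmax") {
  parents <- setdiff(unique(all.vars(expr)), reserved)
  if (length(parents) != 1) {
    stop("automatic naming needs exactly one parent variable; please name this summary measure")
  }
  parents
}

default_summary_name(quote(W1))            # "W1"
default_summary_name(quote(W2[[0:Kmax]]))  # "W2"

# Two parents (W1 and W2): no default name can be chosen, so an error is raised.
res_err <- try(default_summary_name(quote(W1 + W2)), silent = TRUE)
inherits(res_err, "try-error")             # TRUE
```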

Naming conventions for the evaluation results of summary measures defined by def.sW & def.sA.

....................................

When a summary expression evaluates to a vector result, the vector is first converted to a one-column matrix, with the column name set equal to the summary measure name;
When the summary measure evaluates to a matrix result and the expression has only one unique variable name (one parent), the matrix column names are generated as follows: for the expressions such as "Var" or "Var[[0]]", the column names "Var" are assigned and for the expressions such as "Var[[j]]", the column names "Var_netFj" are assigned.
When the summary measure (e.g., named "SummName") evaluates to a matrix and either: 1) there is more than one unique variable name used inside the expression (e.g., "A + 2*W"), or 2) the resulting matrix has empty ("") column names, the column names are assigned according to the convention: "SummName.1", ..., "SummName.ncol", where "SummName" is replaced by the actual summary measure name and ncol is the number of columns in the resulting matrix.
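As a rough illustration of these column-naming rules, the following base R sketch maps a summary measure to its column names. The helper summary_colnames and its arguments are hypothetical; it takes the parent variable names and friend indices as inputs rather than parsing an expression, and it leaves out the empty-column-name case.

```r
# Hypothetical helper reproducing the column-naming rules listed above.
# parents: unique variable names in the expression; index: friend indices used (e.g. 0:3).
summary_colnames <- function(summ_name, parents, index, ncol) {
  # vector result: a single column named after the summary measure
  if (ncol == 1) return(summ_name)
  # matrix result with one parent: "Var" for index 0, "Var_netFj" for friend j
  if (length(parents) == 1) {
    return(ifelse(index == 0, parents, paste0(parents, "_netF", index)))
  }
  # matrix result with several parents: "SummName.1", ..., "SummName.ncol"
  paste0(summ_name, ".", seq_len(ncol))
}

summary_colnames("netW2", "W2", 0:3, 4)
# "W2" "W2_netF1" "W2_netF2" "W2_netF3"
summary_colnames("netW1W3", c("W1", "W3"), 1:3, 3)
# "netW1W3.1" "netW1W3.2" "netW1W3.3"
summary_colnames("sum.netW3", "W3", 1:3, 1)
# "sum.netW3"
```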

eval.summaries for evaluation and validation of the summary measures, tmlenet for estimation, DefineSummariesClass for details on how the summary measures are stored and evaluated.

#***************************************************************************************
# LOAD DATA, LOAD A NETWORK
#***************************************************************************************
data(df_netKmax6) # load observed data
head(df_netKmax6)
data(NetInd_mat_Kmax6)  # load the network ID matrix
netind_cl <- simcausal:::NetIndClass$new(nobs = nrow(df_netKmax6), Kmax = 6)
netind_cl$NetInd <- NetInd_mat_Kmax6
head(netind_cl$nF)

#***************************************************************************************
# Example. Equivalent ways of defining the same summary measures.
# Note that 'nF' summary measure is always added to def.sW summary measures.
# Same rules apply to def.sA function, except that 'nF' is not added.
#***************************************************************************************
def_sW <- def.sW(W1, W2, W3)
def_sW <- def.sW("W1", "W2", "W3")
def_sW <- def.sW(W1 = W1, W2 = W2, W3 = W3)
def_sW <- def.sW(W1 = W1[[0]], W2 = W2[[0]], W3 = W3[[0]]) # W1[[0]] just means W1
def_sW <- def.sW(W1 = "W1[[0]]", W2 = "W2[[0]]", W3 = "W3[[0]]")

# evaluate the sW summary measures defined last:
resmatW <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)
head(resmatW)

# define sA summary measures and evaluate:
def_sA <- def.sA(A, AW1 = A*W1)
resmatA <- def_sA$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)
head(resmatA)

#***************************************************************************************
# Summary measures based on network (friend) values of the variable (matrix result).
#***************************************************************************************
# W2[[0:Kmax]] means W2 itself (index 0) plus the W2 values of friends (W2_netFj), j = 1, ..., Kmax:
def_sW <- def.sW(netW2 = W2[[0:Kmax]], W3 = W3[[0]])
# evaluation result is a matrix:
resmat <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)
# The mapping from the summary measure names to actual evaluation column names:
def_sW$sVar.names.map

# Equivalent way to define the same summary measure is to use syntax '+'
# and omit the names of the two summary measures above
# (the names are assigned automatically as "W2" for the first matrix W2[[0:Kmax]]
# and "W3" for the second summary measure "W3[[0]]")
def_sW <- def.sW(W2[[0:Kmax]]) + def.sW(W3[[0]])
resmat2 <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)
head(resmat2)
# The mapping from the summary measure names to actual evaluation column names:
def_sW$sVar.names.map

#***************************************************************************************
# Define new summary measure as a sum of friend covariate values of W3:
#***************************************************************************************
# replaceNAw0 = TRUE sets all the missing values to 0
def_sW <- def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0 = TRUE)

# evaluation result:
resmat <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)

#***************************************************************************************
# More complex summary measures that involve more than one covariate:
#***************************************************************************************
# elementwise product of the friend values of W1 and W3 (one column per friend):
def_sW <- def.sW(netW1W3 = W1[[1:Kmax]]*W3[[1:Kmax]])

# evaluation result (matrix):
resmat <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)
# the mapping from the summary measure names to the matrix column names:
def_sW$sVar.names.map

#***************************************************************************************
# Vector results, complex summary measure (more than one unique variable name):
# NOTE: all complex summary measures must be named, otherwise an error is produced
#***************************************************************************************
# named expression:
def_sW <- def.sW(sum.netW2W3 = sum(W3[[1:Kmax]]*W2[[1:Kmax]]), replaceNAw0 = TRUE)
mat1a <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)

# the same unnamed expression (trying to run will result in error):
def_sW <- def.sW(sum(W3[[1:Kmax]]*W2[[1:Kmax]]), replaceNAw0 = TRUE)
## Not run: 
  mat1b <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)  

## End(Not run)

#***************************************************************************************
# Matrix result, complex summary measure (more than one unique variable name):
# NOTE: all complex summary measures must be named, otherwise an error is produced
#***************************************************************************************
# When more than one parent is present, the columns are named by convention:
# sVar.name%+%c(1:ncol)

# named expression:
def_sW <- def.sW(sum.netW2W3 = W3[[1:Kmax]]*W2[[1:Kmax]])
mat1a <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)

# the same unnamed expression (trying to run will result in error):
def_sW <- def.sW(W3[[1:Kmax]]*W2[[1:Kmax]])
## Not run: 
  mat1b <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)

## End(Not run)

#***************************************************************************************
# Iteratively building higher dimensional summary measures using '+' function:
#***************************************************************************************
def_sW <- def.sW(W1) +
          def.sW(netW2 = W2[[1:Kmax]]) +
          def.sW(sum.netW1W3 = sum(W1[[1:Kmax]]*W3[[1:Kmax]]), replaceNAw0 = TRUE)

# resulting matrix of summary measures:
resmat <- def_sW$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)
# the mapping from the summary measure names to the matrix column names:
def_sW$sVar.names.map

#***************************************************************************************
# Examples of summary measures defined by def.sA (functions of baseline and treatment)
#***************************************************************************************
def_sA <- def.sA(sum.netAW2net = sum((1-A[[1:Kmax]]) * W2[[1:Kmax]]),
                  replaceNAw0 = TRUE) +
          def.sA(netA = A[[0:Kmax]])

resmat <- def_sA$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)
def_sA$sVar.names.map

#***************************************************************************************
# More summary measures for sA
#***************************************************************************************
def_sA <- def.sA(netA = "A[[0:Kmax]]") +
          def.sA(sum.AW2 = sum((1-A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0 = TRUE)

resmat <- def_sA$eval.nodeforms(data.df = df_netKmax6, netind_cl = netind_cl)
def_sA$sVar.names.map

#***************************************************************************************
# Using eval.summaries to evaluate summary measures for both, def.sW and def.sA
# based on the (O)bserved data (data.frame) and network
#***************************************************************************************
def_sW <- def.sW(netW2 = W2[[1:Kmax]]) +
          def.sW(netW3_sum = sum(W3[[1:Kmax]]), replaceNAw0 = TRUE)
            
def_sA <- def.sA(sum.AW2 = sum((1-A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0 = TRUE) +
          def.sA(netA = A[[0:Kmax]])

data(df_netKmax6) # load observed data
data(NetInd_mat_Kmax6)  # load the network ID matrix
res <- eval.summaries(sW = def_sW, sA = def_sA, Kmax = 6, data = df_netKmax6,
                      NETIDmat = NetInd_mat_Kmax6, verbose = TRUE)

# Contents of the list returned by eval.summaries():
names(res)
# matrix of sW summary measures:
head(res$sW.matrix)
# matrix of sA summary measures:
head(res$sA.matrix)
# matrix of network IDs:
head(res$NETIDmat)
# Observed data (sW,sA) stored as "DatNet.sWsA" R6 class object:
res$DatNet.ObsP0
class(res$DatNet.ObsP0)