DatKeepClass: R6 class for Storing, Managing, Subsetting and Manipulating...


Description

DatKeepClass allows users to access the input data. The processed covariates from sVar.object are stored as a matrix in private$.mat.sVar. The class can subset, combine, normalize, discretize and binarize the covariates in (A, W, E). To discretize continuous and categorical variables, it can automatically detect or set each covariate's type (binary, categor, contin), detect or set the bin intervals, and construct the bin indicators. In addition, it provides methods for generating new exposures under an arbitrary user-specified intervention g^{*} through self$make.dat.sVar, and for replacing missing values with the user-specified gvars$misXreplace (defaults to 0). Pointers to a DatKeepClass object are passed on to GenericModel functions and used in $fit(), $predict() and $predictAeqa().
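The bin-interval detection mentioned above can be illustrated in base R. This is a sketch, not tmleCommunity's implementation: `detect_intervals` is a hypothetical helper, and the two rules it offers (equal-mass cut points taken from sample quantiles, or equal-length cut points over the observed range) are assumptions about the kind of choice an option such as bin_bymass controls.

```r
# Hypothetical helper (not part of tmleCommunity): detect bin cut points for a
# continuous variable, either by equal mass (quantiles) or by equal length.
detect_intervals <- function(x, nbins, bin_bymass = TRUE) {
  x <- x[!is.na(x)]
  if (bin_bymass) {
    # equal-mass bins: cut points at evenly spaced sample quantiles
    cuts <- quantile(x, probs = seq(0, 1, length.out = nbins + 1), names = FALSE)
  } else {
    # equal-length bins: evenly spaced cut points over the observed range
    cuts <- seq(min(x), max(x), length.out = nbins + 1)
  }
  unique(cuts)  # ties in the data can produce duplicate quantiles
}

set.seed(1)
x <- rnorm(1000)
cuts <- detect_intervals(x, nbins = 4)
# bin index of each observation; the maximum falls into the last bin
bin <- findInterval(x, cuts, rightmost.closed = TRUE)
table(bin)  # roughly 250 observations per bin
```

Equal-mass bins keep the number of observations per bin balanced, which matters when each bin later gets its own indicator and regression; equal-length bins are simpler but can leave sparse bins in the tails.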

Usage


Format

An R6Class generator object

Details

Methods

new(Odata, nodes, YnodeVals, det.Y, norm.c.sVars = FALSE, ...)

Instantiate a new instance of DatKeepClass, used for storing and manipulating the input data.

addYnode(YnodeVals, det.Y)

Add the protected Y node to a private field, and set all deterministic Y values to NA in the public field YnodeVals.
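The effect of addYnode on the two copies of the outcome can be mimicked with plain vectors. The sketch below uses toy data and ordinary variables in place of the object's public and private fields; the names `public.YnodeVals` and `noNA.Ynodevals` here are illustrative stand-ins, not the class internals.

```r
# Toy data (assumed): observed outcomes and a deterministic-outcome flag
YnodeVals <- c(0.2, 0.8, 1.0, 0.0, 0.5)
det.Y <- c(FALSE, FALSE, TRUE, TRUE, FALSE)

noNA.Ynodevals <- YnodeVals        # protected copy, kept without NAs
public.YnodeVals <- YnodeVals
public.YnodeVals[det.Y] <- NA      # deterministic Y values masked with NA

public.YnodeVals  # 0.2 0.8  NA  NA 0.5
noNA.Ynodevals    # 0.2 0.8 1.0 0.0 0.5
```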

addObsWeights(obs.wts)

Add observation weights to a public field.

evalsubst(subset_vars, subset_exprs = NULL)

...

get.dat.sVar(rowsubset = TRUE, covars)

Subset covariate design matrix for BinaryOutModel.

get.outvar(rowsubset = TRUE, var)

Subset the vector of outcome values for BinaryOutModel.

get.obsweights(rowsubset = TRUE)

Subset a vector of observation weights for BinaryOutModel.
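The three get.* accessors above share the same rowsubset idea: TRUE keeps every row, while a logical vector keeps only the selected rows. A base R sketch on an assumed toy design matrix, with ordinary indexing standing in for the class methods:

```r
# Toy design matrix standing in for the stored covariates (assumed data)
dat <- cbind(A  = c(1.5, 2.0, 0.3, 4.1),
             W1 = c(0, 1, 1, 0),
             W3 = c(2.2, 1.1, 0.4, 3.3))
Y <- c(0.1, 0.9, 0.4, 0.7)
wts <- c(1, 2, 1, 2)

# rowsubset = TRUE keeps every row
X.all <- dat[TRUE, c("A", "W3"), drop = FALSE]

# a logical rowsubset keeps only the selected rows
rowsubset <- dat[, "W1"] == 1
X.sub <- dat[rowsubset, c("A", "W3"), drop = FALSE]  # cf. get.dat.sVar(rowsubset, covars)
Y.sub <- Y[rowsubset]                                 # cf. get.outvar(rowsubset, var)
w.sub <- wts[rowsubset]                               # cf. get.obsweights(rowsubset)
```

Using drop = FALSE keeps a one-column selection as a matrix, so downstream model-fitting code always receives a design matrix.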

def.types.sVar(type.sVar = NULL)

Define the type of each variable in the input data: binary ("binary"), categorical ("categor") or continuous ("contin").
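Automatic type detection could work along the lines of the sketch below. The rule is an assumption for illustration, not the package's documented logic: values only in {0, 1} are "binary", integer-valued variables with at most `maxncats` distinct values are "categor", and everything else is "contin". Both `detect_type` and the `maxncats` cutoff are hypothetical.

```r
# Hypothetical rule: 'maxncats' is an assumed cutoff, not a documented default
detect_type <- function(x, maxncats = 10) {
  vals <- unique(x[!is.na(x)])
  if (all(vals %in% c(0, 1))) {
    "binary"
  } else if (all(vals == round(vals)) && length(vals) <= maxncats) {
    "categor"
  } else {
    "contin"
  }
}

dat <- data.frame(W1 = c(0, 1, 1, 0),
                  W2 = c(1, 3, 2, 3),
                  A  = c(0.4, 1.7, 2.2, 3.9))
types <- sapply(dat, detect_type)
types  # W1 "binary", W2 "categor", A "contin"
```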

set.sVar.type(name.sVar, new.type)

Assign a new class type to one variable that belongs to the input data.

get.sVar.type(name.sVar)

Return the class type of a variable.

is.sVar.cont(name.sVar)

Check if the variable is continuous.

is.sVar.cat(name.sVar)

Check if the variable is categorical.

is.sVar.bin(name.sVar)

Check if the variable is binary.

get.sVar(name.sVar)

Return a vector of the variable values.

set.sVar(name.sVar, new.sVarVal)

Assign a vector of new values to the specified variable.

bin.nms.sVar(name.sVar, nbins)

Define names of bin indicators for sVar.

detect.sVar.intrvls(name.sVar, nbins, bin_bymass, bin_bydhist, max_nperbin)

...

detect.cat.sVar.levels(name.sVar)

Detect the unique categories of a categorical sVar, returned in increasing order.

get.sVar.bw(name.sVar, intervals)

Get the vector of bin widths for the discretized continuous sVar.
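Given a set of cut points, bin widths follow from differencing them. The sketch below assumes intervals is a sorted vector of cut points and that a per-observation width is wanted, i.e. the width of the bin each value falls into; whether get.sVar.bw returns per-bin or per-observation widths is an assumption here.

```r
intervals <- c(0, 1, 3, 6)           # assumed cut points defining 3 bins
x <- c(0.5, 2.0, 5.5, 3.0, 6.0)      # toy values of a continuous sVar

# bin index of each value; the upper endpoint 6.0 is placed in the last bin
bin <- findInterval(x, intervals, rightmost.closed = TRUE)  # 1 2 3 3 3
bw.per.bin <- diff(intervals)                               # 1 2 3
bw <- bw.per.bin[bin]                # per-observation bin width: 1 2 3 3 3
```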

get.sVar.bwdiff(name.sVar, intervals)

Get the vector of bin-width differences for the discretized continuous sVar.

binirize.sVar(name.sVar, ...)

Create a matrix of bin indicators for a categorical or continuous sVar.
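For a categorical variable, the sorted unique levels (as returned by detect.cat.sVar.levels) give each observation a category index, and the indicators can be built from that index. For a continuous variable the index would instead come from findInterval() on the detected intervals. This sketch assumes plain one-hot coding; the package's own coding may differ, and the column-name scheme is hypothetical.

```r
x <- c(2, 5, 2, 7, 5)            # a categorical variable (toy data)
lvls <- sort(unique(x))          # levels in increasing order: 2 5 7
bin <- match(x, lvls)            # category index per observation: 1 2 1 3 2
nbins <- length(lvls)

# one indicator column per category (assumed one-hot coding)
bin.mat <- outer(bin, seq_len(nbins), FUN = "==") * 1L
colnames(bin.mat) <- paste0("x_B.", seq_len(nbins))  # hypothetical naming scheme
bin.mat
```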

norm.cont.sVars()

Normalize the continuous sVars (note that this process is memory-intensive).
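The kind of normalization is not stated here. The sketch below assumes min-max rescaling to [0, 1], the same transformation the Examples apply to the outcome Y, and applies it only to the continuous columns; `normalize` is a hypothetical helper.

```r
# Hypothetical helper: min-max rescaling to [0, 1] (assumed normalization)
normalize <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(x * 0)  # constant column: map to 0, keep NAs
  (x - rng[1]) / diff(rng)
}

dat <- data.frame(A = c(2, 4, 6, NA), W3 = c(10, 20, 15, 30), W1 = c(0, 1, 1, 0))
cont.vars <- c("A", "W3")            # only the continuous sVars are rescaled
dat[cont.vars] <- lapply(dat[cont.vars], normalize)
dat
```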

fixmiss_sVar()

Replace all missing (NA) values with the default value gvars$misXreplace (defaults to 0).
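In base R, replacing every NA in a covariate matrix is a single logical-index assignment. Here `misXreplace` is an ordinary variable standing in for gvars$misXreplace, and the matrix is toy data.

```r
misXreplace <- 0   # stands in for gvars$misXreplace (documented default 0)
mat <- matrix(c(1, NA, 3, NA, 5, 6), nrow = 2,
              dimnames = list(NULL, c("A", "W1", "W3")))
mat[is.na(mat)] <- misXreplace   # every NA cell receives the replacement value
mat
```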

make.dat.sVar(p = 1, f.g_fun = NULL, regform = NULL)

Generate new exposures under the arbitrary user-specified intervention f.g_fun and construct a data frame that combines all covariates, with the old exposures replaced by the new ones. With p > 1, p such datasets are stacked row-wise (see Example 1.4).
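The stacking behaviour of make.dat.sVar(p, f.g_fun) can be sketched in base R. `make_gstar_data` is a hypothetical stand-in, not the package code; it assumes a single exposure column named "A" and an f.g_fun with the same function(data, ...) signature as define_f.gstar in the Examples.

```r
# Hypothetical stand-in for make.dat.sVar(p, f.g_fun): not the package code
make_gstar_data <- function(dat, f.g_fun, p = 1, Anode = "A") {
  copies <- lapply(seq_len(p), function(i) {
    new.dat <- dat
    new.dat[, Anode] <- f.g_fun(dat)  # each call may draw new random A*
    new.dat
  })
  do.call(rbind, copies)              # p copies stacked row-wise
}

set.seed(12345)
dat <- data.frame(A = rnorm(5), W1 = rbinom(5, 1, 0.5))
# toy stochastic shift, same signature as define_f.gstar in the Examples
f.shift <- function(data, ...) {
  data[, "A"] - rnorm(NROW(data), mean = 0.3 * data[, "W1"], sd = 0.5)
}
gstar.dat <- make_gstar_data(dat, f.shift, p = 3)
dim(gstar.dat)  # 15 2
```

Because A* is drawn at random, each of the p stacked copies carries different exposure values while the covariates are repeated unchanged.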

Active Bindings

names.sVar

Return variable names of the input data.

names.c.sVar

Return continuous variable names of the input data.

ncols.sVar

Return the number of columns of the input data.

nobs

Return the number of observations of the input data.

dat.sVar

Return a data frame that stores the entire dataset (including all sVars).

dat.bin.sVar

Return the stored matrix of bin indicators for the currently binarized continuous sVar.

active.bin.sVar

Return the name(s) of the currently active binarized continuous sVar(s); the value changes whenever fit or predict is called.

emptydat.sVar

Wipe out dat.sVar.

emptydat.bin.sVar

Wipe out dat.bin.sVar.

noNA.Ynodevals

Return the observed Y without any missing values.

nodes

...

type.sVar

...

See Also

tmleCom_Options, tmleCommunity

Examples

#***************************************************************************************
# Example 1: storing, managing, subsetting and manipulating a dataset with continuous A
data(indSample.iid.cA.cY_list)
indSample.iid.cA.cY <- indSample.iid.cA.cY_list$indSample.iid.cA.cY
psi0.Y <- indSample.iid.cA.cY_list$psi0.Y  # 0.333676
# Assume that W2 has no effect on either A or Y, so there is no need to put it into nodes
nodes <- list(Ynode = "Y", Anodes = "A", WEnodes = c("W1", "W3", "W4"))
tmleCom_Options(nbins = 10, maxNperBin = nrow(indSample.iid.cA.cY))
#***************************************************************************************

#***************************************************************************************
# 1.1 Specifying the stochastic intervention of interest gstar
#***************************************************************************************
# We are interested in the effect of shifting the current treatment by delta(W1, W3, W4)
define_f.gstar <- function(data, ...) {
  shift.mu <- 0.3 * data[,"W1"] + 0.6 * data[,"W3"] - 0.14 * data[,"W4"]
  shift.val <- rnorm(n = NROW(data), mean = shift.mu, sd = 0.5)
  shifted.new.A <- data[, "A"] - shift.val
  return(shifted.new.A)
}

#***************************************************************************************
# 1.2 Creating an R6 object of DatKeepClass (to store the input data)
#***************************************************************************************
# Don't normalize continuous covariates (norm.c.sVars = FALSE)
OData_R6 <- DatKeepClass$new(Odata = subset(indSample.iid.cA.cY, select=-Y), 
                             nodes = nodes[c("Anodes", "WEnodes")], norm.c.sVars = FALSE)  
OData_R6$nodes <- nodes
# names of all variables that are in input data and specified in nodes
OData_R6$names.sVar  # "A"  "W1" "W3" "W4" 
# names of all continuous variables that are in input data and specified in nodes
OData_R6$names.c.sVar  # "A" "W3" "W4" 
# a sub dataframe of the input data, including all variables in nodes
head(OData_R6$dat.sVar) 
# the number of observations of the input data
OData_R6$nobs  # 10000
OData_R6$get.sVar.type("A")  # "contin"
OData_R6$get.sVar.type()  # Provide a list of types of all variables 

#***************************************************************************************
# 1.3 Manipulating the input data by adding observed outcomes and observation weights
#***************************************************************************************
# Rescale the observed outcome into [0, 1] (min-max scaling)
obsYvals <- indSample.iid.cA.cY[, nodes$Ynode]
ab <- range(obsYvals, na.rm=TRUE)
indSample.iid.cA.cY[, nodes$Ynode] <- (obsYvals-ab[1]) / diff(ab)

# Add YnodeVals (a vector of outcomes) to both the public and private fields
OData_R6$addYnode(YnodeVals = indSample.iid.cA.cY[, nodes$Ynode], det.Y = FALSE)
# the public field sets YnodeVals[det.Y == TRUE] to NA (may contain NAs)
head(OData_R6$YnodeVals)
# the private field protects YnodeVals from being set to NA (no NAs)
head(OData_R6$noNA.Ynodevals)

# Add a vector of observation (sampling) weights
OData_R6$addObsWeights(obs.wts = rep(c(1,2), 5000))  
# Assume all weights to be 1 (i.e., equally weighted)
OData_R6$addObsWeights(obs.wts = 1)  

#***************************************************************************************
# 1.4 Creating a new R6 object of DatKeepClass under the stochastic intervention g.star
# Generate new exposures under the user-specified intervention f.g_fun
#***************************************************************************************
OData.gstar_R6 <- DatKeepClass$new(Odata = indSample.iid.cA.cY, nodes = nodes)
# Create 1 new Odata and replace A under g0 in Odata with A* under g.star
set.seed(12345)
OData.gstar_R6$make.dat.sVar(p = 1, f.g_fun = define_f.gstar) 
dim(OData.gstar_R6$dat.sVar)  # 10000     4
# Create 3 new Odatas and replace A with A*
OData.gstar_R6$make.dat.sVar(p = 3, f.g_fun = define_f.gstar) 
dim(OData.gstar_R6$dat.sVar)  # 30000     4
# Since A* is stochastically generated, each p may produce different values of A*
head(OData.gstar_R6$dat.sVar[1:10000, ])
head(OData.gstar_R6$dat.sVar[10001:20000, ])

chizhangucb/tmleCommunity documentation built on April 3, 2018, 1:10 p.m.