BinaryOutModel: R6 class for modeling (fitting and predicting) for a single...


Description

BinaryOutModel stores and manages the (binarized/discretized) design matrix Xmat and the binary outcome Bin for the binary regression P(Bin | Xmat). Through self$estimator it can include different candidate estimators in the fitting and predicting library, such as data-adaptive Super Learner algorithms and parametric logistic regression. When one pooled regression is fit across multiple bins, it provides a method to convert the data from wide to long format on request (to gain computational efficiency).
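To make the wide-to-long idea concrete, here is a minimal base-R sketch of pooling one logistic regression across bins. It does not call tmleCommunity, and the simulated data, the number of bins `K` and the column names `bin_id` and `Bin` are hypothetical. It uses a common discrete-hazard layout: each observation contributes one row per bin it reaches, with Bin = 1 only in the bin where its value falls. Whether BinaryOutModel uses exactly this layout is an assumption.

```r
set.seed(1)
n <- 200
K <- 4                                   # number of bins (hypothetical choice)
W <- rnorm(n)
A <- W + rnorm(n)                        # continuous variable to be discretized
breaks <- quantile(A, probs = seq(0, 1, length.out = K + 1))
bin <- cut(A, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Wide -> long: observation i gets one row per bin k <= min(bin[i], K - 1);
# Bin = 1 only in the row of the bin where A_i falls (discrete-hazard layout).
# The last bin is dropped because its conditional hazard is 1 by construction.
long <- do.call(rbind, lapply(seq_len(n), function(i) {
  k <- seq_len(min(bin[i], K - 1))
  data.frame(id = i, bin_id = k, W = W[i], Bin = as.integer(k == bin[i]))
}))

# One pooled logistic regression for all bins, with the bin as a factor
pooled_fit <- glm(Bin ~ factor(bin_id) + W, family = binomial(), data = long)
coef(pooled_fit)
```

Fitting a single pooled model replaces K - 1 separate per-bin regressions, at the cost of a longer data set.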

Usage

BinaryOutModel

Format

An R6Class generator object

Details

Methods

new(reg)

Use reg (a RegressionClass object) to instantiate a new BinaryOutModel object for a single binary regression.

newdata(newdata, getoutvar = TRUE, ...)

Evaluate the subset expression and subset the data accordingly to construct X_mat, Yvals and wt_vals.

define.subset.idx(data)

Create a logical vector (the subset index) from subset_expr.
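A minimal base-R sketch of turning a subset expression into a logical index, assuming the expression is stored as a character string and evaluated against the columns of the data. `subset_expr_to_idx` is a hypothetical helper, not the package's implementation, and treating NA as "not in subset" is an assumption.

```r
# Hypothetical helper: evaluate a subset expression (string) on the data columns
subset_expr_to_idx <- function(subset_expr, data) {
  if (is.null(subset_expr)) return(rep_len(TRUE, nrow(data)))  # no subsetting
  idx <- eval(parse(text = subset_expr), envir = data)
  idx <- rep_len(as.logical(idx), nrow(data))  # recycle a scalar TRUE/FALSE
  idx[is.na(idx)] <- FALSE                     # assumption: NA means excluded
  idx
}

dat <- data.frame(W1 = c(0.2, -1, 3), A = c(1, 0, NA))
subset_expr_to_idx("!is.na(A) & W1 > 0", dat)
# [1]  TRUE FALSE FALSE
```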

fit(overwrite = FALSE, data, predict = FALSE, savespace = TRUE, ...)

Fit a binary regression. overwrite is logical: if FALSE (default), a previously fitted model cannot be overwritten by a new fit. savespace is logical: if TRUE (default), all internal data are wiped out after fitting, which saves memory when running many stacked regressions.
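The overwrite and savespace rules can be sketched in base R with an environment standing in for the R6 object. This is a hypothetical stand-in (`new_binary_model`, `fit_obj`, `Xmat` and `Y` are illustrative names), not the package code; it only mirrors the documented behaviour: refitting is refused unless overwrite = TRUE, and savespace = TRUE keeps the fitted model but drops the data.

```r
# Hypothetical base-R stand-in for the documented overwrite/savespace behaviour
new_binary_model <- function() {
  self <- new.env()
  self$fit_obj <- NULL
  self$Xmat <- NULL
  self$Y <- NULL
  self$fit <- function(Xmat, Y, overwrite = FALSE, savespace = TRUE) {
    if (!is.null(self$fit_obj) && !overwrite)
      stop("model already fitted; set overwrite = TRUE to refit")
    self$Xmat <- Xmat
    self$Y <- Y
    self$fit_obj <- glm.fit(x = cbind(Intercept = 1, Xmat), y = Y,
                            family = binomial())
    if (savespace) {  # keep only the fitted model, drop the data
      self$Xmat <- NULL
      self$Y <- NULL
    }
    invisible(self)
  }
  self
}

set.seed(2)
X <- matrix(rnorm(100), ncol = 2)
Y <- rbinom(50, 1, plogis(X[, 1]))
m <- new_binary_model()
m$fit(X, Y, savespace = FALSE)            # data are kept
refit <- tryCatch(m$fit(X, Y), error = function(e) "refused")
m$fit(X, Y, overwrite = TRUE, savespace = TRUE)  # refit, then wipe data
```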

copy.fit(bin.out.model)

Take a fitted BinaryOutModel object as input and copy its fit into the current object.

predict(newdata, savespace = TRUE, ...)

Predict the response P(A = 1|W = w, E = e).
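For a logistic regression, P(A = 1 | W = w, E = e) is the inverse logit of the linear predictor. A minimal base-R sketch with simulated data (the variables and coefficients are hypothetical) shows that `predict(..., type = "response")` and expit(X beta) agree.

```r
set.seed(3)
n <- 500
W <- rnorm(n)
E <- rbinom(n, 1, 0.5)
A <- rbinom(n, 1, plogis(-0.5 + W + 0.8 * E))
fit <- glm(A ~ W + E, family = binomial())

newdat <- data.frame(W = c(-1, 0, 1), E = c(0, 1, 1))
probA1 <- predict(fit, newdata = newdat, type = "response")

# Same quantity by hand: expit(X %*% beta)
X_new <- model.matrix(~ W + E, data = newdat)
probA1_manual <- plogis(drop(X_new %*% coef(fit)))
all.equal(unname(probA1), unname(probA1_manual))
```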

copy.predict(bin.out.model)

Take a BinaryOutModel object that contains predictions of P(A = 1 | W = w, E = e) and copy them into the current object.

predictAeqa(newdata, bw.j.sA_diff, savespace = TRUE, wipeProb = TRUE)

Predict the response P(A = a | W = w, E = e) for the observed A, W and E. wipeProb is a logical argument passed to self$wipe.alldat: if FALSE, the vectors probA1 and probAeqa are kept.
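For a binary A, the likelihood of the observed value follows directly from P(A = 1 | W, E): P(A = a) = p^a (1 - p)^(1 - a). A minimal sketch covering only the binary case (it ignores bw.j.sA_diff); `prob_A_eq_a` is a hypothetical helper name.

```r
# probA1: predicted P(A = 1 | W, E); A_obs: observed binary A (0/1)
prob_A_eq_a <- function(probA1, A_obs) {
  stopifnot(length(probA1) == length(A_obs), all(A_obs %in% c(0, 1)))
  A_obs * probA1 + (1 - A_obs) * (1 - probA1)  # p if a = 1, 1 - p if a = 0
}

prob_A_eq_a(c(0.2, 0.7, 0.9), c(1, 0, 1))
# [1] 0.2 0.3 0.9
```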

show()

Print regression formula, including outcome and predictor names.
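Going between outcome/predictor names and a regression formula can be done with base R alone. This sketch is illustrative only (the variable names mirror the example below) and does not reproduce how show() or define_regform work internally.

```r
outvar <- "Y"
predvars <- c("W1", "W2", "W3", "W4", "A")
Qform <- reformulate(predvars, response = outvar)
deparse(Qform)
# [1] "Y ~ W1 + W2 + W3 + W4 + A"

# And back: recover outcome and term names from a formula with an interaction
Qform.interact <- Y ~ W1 + W2*A + W3 + W4
all.vars(Qform.interact[[2]])                 # outcome name: "Y"
attr(terms(Qform.interact), "term.labels")    # main effects, then "W2:A"
```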

Active Bindings

wipe.alldat(wipeProb = TRUE)

...

getfit

...

getprobA1

...

getprobAeqa

...

emptydata

...

emptyY

...

emptyWeight

...

emptySubset_idx

...

getXmat

...

getY

...

getWeight

...

See Also

DatKeepClass, RegressionClass, tmleCom_Options

Examples

#***************************************************************************************
# Example 1: Estimate an outcome regression directly through BinaryOutModel
data(indSample.iid.bA.bY.rareJ2_list)
indSample.iid.bA.bY.rareJ2 <- indSample.iid.bA.bY.rareJ2_list$indSample.iid.bA.bY.rareJ2
N <- nrow(indSample.iid.bA.bY.rareJ2)
# Use speedglm to fit regressions (GLMs for medium-to-large datasets)
tmleCom_Options(Qestimator = "speedglm__glm", maxNperBin = N)
options(tmleCommunity.verbose = TRUE)  # Print status messages 
#***************************************************************************************

#***************************************************************************************
# 1.1 Specifying outcome and predictor variables for outcome mechanism
#***************************************************************************************
# Y depends on all its parent nodes (A, W1, W2, W3, W4) 
Qform.all <- Y ~ W1 + W2 + W3 + W4 + A
Q.sVars1 <- tmleCommunity:::define_regform(regform = Qform.all)

# An equivalent way to define Q.sVars: use Anodes.lst (outcomes) & Wnodes.lst (predictors)
# nodes can only contain one or more of Ynode, Anodes, WEnodes, communityID and Crossnodes
nodes <- list(Ynode = "Y", Anodes = "A", WEnodes = c("W1", "W2", "W3", "W4"))
Q.sVars2 <- tmleCommunity:::define_regform(regform = NULL, Anodes.lst = nodes$Ynode, 
                                           Wnodes.lst = nodes[c("Anodes", "WEnodes")])

# Interaction terms can also be included in the regression formula (correct Qform)
Qform.interact <- Y ~ W1 + W2*A + W3 + W4
Q.sVars3 <- tmleCommunity:::define_regform(regform = Qform.interact)

# Alternative way to define Qform.interact 
Qform.interact2 <- Y ~ W1 + W2 + W3 + W4 + A + W2:A
Q.sVars4 <- tmleCommunity:::define_regform(regform = Qform.interact2)

#***************************************************************************************
# 1.2 Fit and predict a regression model for outcome mechanism Qbar(A, W)
#***************************************************************************************
# Create a new object of DatKeepClass that can store and manipulate the input data
OData_R6 <- DatKeepClass$new(Odata = indSample.iid.bA.bY.rareJ2, 
                             nodes = nodes, norm.c.sVars = FALSE)
# Add a vector of observation (sampling) weights that encodes knowledge of rare outcome
OData_R6$addObsWeights(obs.wts = indSample.iid.bA.bY.rareJ2_list$obs.wt.J2)

# Create a new object of RegressionClass that defines regression models
# using misspecified Qform (without interaction term) 
Qreg <- RegressionClass$new(outvar = Q.sVars1$outvars, predvars = Q.sVars1$predvars, 
                            subset_vars = (!rep_len(FALSE, N)))

# Set savespace = FALSE to keep everything produced during fitting, including the model and data
m.Q.init <- BinaryOutModel$new(reg = Qreg)$fit(data = OData_R6, savespace = FALSE)
length(m.Q.init$getY)  # 3000, the outcomes haven't been erased since savespace = FALSE
head(m.Q.init$getXmat)  # the predictor matrix is kept since savespace = FALSE
m.Q.init$getfit$coef  # Coefficients of the fitted regression
m.Q.init$is.fitted  # TRUE

# Now fit the same regression model but set savespace to TRUE (only fitted model left)
# overwrite = TRUE is required here; otherwise refitting the already-fitted m.Q.init raises an error
m.Q.init <- m.Q.init$fit(overwrite = TRUE, data = OData_R6, savespace = TRUE)
all(is.null(m.Q.init$getXmat), is.null(m.Q.init$getY))  # TRUE, all wiped out

# Set savespace = TRUE to wipe out any traces of saved data in predict step
m.Q.init$predict(newdata = OData_R6, savespace = TRUE)
is.null(m.Q.init$getXmat)  # TRUE, the covariate matrix has been erased to save memory
mean(m.Q.init$getprobA1)  # 0.02175083, a poor estimate because Qform is misspecified

#***************************************************************************************
# 1.3 Same as above but using Super Learner (data-adaptive algorithms)
#***************************************************************************************
# Specifying the SuperLearner library in tmleCom_Options() 
library(SuperLearner)
tmleCom_Options(SL.library = c("SL.glm", "SL.randomForest"), maxNperBin = N)
# Instead of re-initializing a RegressionClass object, change the estimator directly in Qreg,
# so there is no need to redefine Qestimator in tmleCom_Options()
Qreg$estimator <- "SuperLearner"

set.seed(12345)
m.Q.init <- BinaryOutModel$new(reg = Qreg)$fit(data = OData_R6, savespace = TRUE)
m.Q.init$predict(newdata = OData_R6, savespace = TRUE)
mean(m.Q.init$getprobA1)

chizhangucb/tmleCommunity documentation built on April 3, 2018, 1:10 p.m.