This R6 class defines the fields and methods that control all the parameters for non-parametric modeling and estimation of the multivariate joint conditional probability model P(sA|sW) for summary measures (sA,sW).
Note that sA can be multivariate and any component sA[j] can be binary, categorical or continuous.
The joint probability P(sA|sW) = P(sA[1],...,sA[k]|sW) is first factorized as
P(sA[1]|sW) * P(sA[2]|sW, sA[1]) * ... * P(sA[k]|sW, sA[1],...,sA[k-1]),
where each of these conditional probability models is defined by a new instance of the GenericModel class (and a corresponding instance of the RegressionClass class).
If sA[j] is binary, the conditional probability P(sA[j]|sW,sA[1],...,sA[j-1]) is evaluated via a logistic regression model.
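A minimal base-R sketch of this chain of logistic regressions for two binary components, assuming a single covariate sW and fitting each factor with glm() and a binomial family. The simulated data and the helpers p_obs() and joint_prob() are hypothetical, not part of the package. The sketch checks that the product of the fitted conditionals sums to 1 over all four values of (sA[1], sA[2]).

```r
# Hypothetical simulated data: one covariate sW, two binary components of sA
set.seed(1)
n <- 500
sW <- rnorm(n)
sA1 <- rbinom(n, 1, plogis(0.5 * sW))
sA2 <- rbinom(n, 1, plogis(-0.3 + sW + sA1))
dat <- data.frame(sW = sW, sA1 = sA1, sA2 = sA2)

# One logistic regression per factor: P(sA[1] | sW) and P(sA[2] | sW, sA[1])
fit1 <- glm(sA1 ~ sW, family = binomial(), data = dat)
fit2 <- glm(sA2 ~ sW + sA1, family = binomial(), data = dat)

# Probability of a binary value y given the fitted P(y = 1)
p_obs <- function(p1, y) if (y == 1) p1 else 1 - p1

# Joint P(sA[1] = a1, sA[2] = a2 | sW = w) as the product of the two factors
joint_prob <- function(a1, a2, w) {
  p1 <- predict(fit1, newdata = data.frame(sW = w), type = "response")
  p2 <- predict(fit2, newdata = data.frame(sW = w, sA1 = a1), type = "response")
  unname(p_obs(p1, a1) * p_obs(p2, a2))
}

combos <- expand.grid(a1 = 0:1, a2 = 0:1)
probs <- mapply(joint_prob, combos$a1, combos$a2, MoreArgs = list(w = 0))
sum(probs)  # the four joint probabilities sum to 1
```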
When sA[j] is continuous (or categorical), its estimation is controlled by a new instance of the ContinModel class (or the CategorModel class), as well as an accompanying new instance of the RegressionClass class. The range of a continuous sA[j] is first partitioned into K bins, which define K bin indicators (B_1,...,B_K), with K new instances of the GenericModel class, each instance defining a single logistic regression model for one binary bin indicator outcome and predictors (sW, sA[1],...,sA[j-1]).
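The binning step can be sketched in base R as follows, assuming equal-length cutoffs over the observed range (only one possible choice; the class also supports equal-mass and dhist binning). The simulated data, K = 5 and the B_ column names are illustrative assumptions.

```r
set.seed(2)
sA <- rexp(200)   # hypothetical continuous component sA[j]
K <- 5

# Equal-length cutoffs over the observed range
cutoffs <- seq(min(sA), max(sA), length.out = K + 1)
# Integer bin membership (1..K); include.lowest keeps min(sA) in bin 1
bin_id <- cut(sA, breaks = cutoffs, include.lowest = TRUE, labels = FALSE)

# K binary bin indicators B_1,...,B_K, one column per logistic regression outcome
B <- sapply(seq_len(K), function(j) as.integer(bin_id == j))
colnames(B) <- paste0("B_", seq_len(K))
head(B)
```

Each row has exactly one indicator equal to 1, so each of the K columns can serve as the outcome of its own binary logistic regression.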
Thus, the first instances of the RegressionClass and GenericModel classes automatically spawn recursive calls to new instances of these classes until the entire tree of binary logistic regressions that defines the joint probability P(sA|sW) is built.
An R6Class generator object.
sep_predvars_sets
- Logical indicating the type of regression to run. If FALSE (default), fit the joint P(outvar|predvars) using the same predictors in predvars (a vector of names) for all nodes in outvar; if TRUE, use separate predictor sets in predvars (must be a named list of character vectors) for fitting each node in outvar.
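The two accepted shapes of predvars can be illustrated with plain R objects. predictors_for() is a hypothetical helper (not a package function) that shows how the predictor set for one node would be looked up under each setting of sep_predvars_sets.

```r
outvar <- c("sA1", "sA2")

# sep_predvars_sets = FALSE: one character vector shared by all nodes
predvars_shared <- c("sW1", "sW2")

# sep_predvars_sets = TRUE: named list with one predictor set per node of outvar
predvars_sep <- list(sA1 = c("sW1"), sA2 = c("sW1", "sW2", "sA1"))

# Hypothetical helper: predictors used when fitting the regression for `node`
predictors_for <- function(node, predvars, sep_predvars_sets, outvar) {
  if (sep_predvars_sets) {
    # list names must correspond to the node names in outvar
    stopifnot(is.list(predvars), setequal(names(predvars), outvar))
    predvars[[node]]
  } else {
    predvars
  }
}

predictors_for("sA2", predvars_shared, FALSE, outvar)  # "sW1" "sW2"
predictors_for("sA2", predvars_sep, TRUE, outvar)      # "sW1" "sW2" "sA1"
```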
outvar.class
- Character vector indicating the class of each outcome variable: bin / cont / cat.
outvar
- Character vector of regression outcome variable names.
predvars
- Either a pooled character vector of all predictors (sW) or a vector of regression-specific predictor names. When sep_predvars_sets=TRUE, this must be a named list of predictor names, with the list names corresponding to each node name in outvar and each list item being a vector specifying the regression predictors for a specific outcome in outvar.
reg_hazard
- Logical, if TRUE, the joint probability model P(outvar | predvars) is factorized as \prod_j P(outvar[j] | predvars) over each outcome j in outvar (for fitting hazards).
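For binned outcomes, a common reading of such a hazard factorization is the discrete-time hazard identity: if h_j = P(B_j = 1 | B_1 = ... = B_{j-1} = 0, predvars), then the probability of falling in bin k is h_k * prod_{j<k} (1 - h_j). The sketch below assumes this is the quantity reg_hazard targets and uses made-up hazard values.

```r
# Hypothetical fitted hazards for K = 4 bins; the last hazard is 1 because
# every observation must fall in some bin
h <- c(0.2, 0.25, 0.5, 1)

# Probability of not having fallen into any earlier bin: prod_{j < k} (1 - h_j)
surv_before <- cumprod(c(1, 1 - h[-length(h)]))
# P(bin k) = h_k * prod_{j < k} (1 - h_j)
bin_prob <- h * surv_before
bin_prob       # 0.2 0.2 0.3 0.3
sum(bin_prob)  # 1
```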
subset_vars
- Subset variables (later evaluated to a logical vector based on the non-missing (!is.na()) values of these variables).
subset_exprs
- Subset expressions (later evaluated to a logical vector in the environment of the data).
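A base-R sketch of how the two subsetting fields could be turned into one logical row selector. It assumes subset_vars keeps rows where all listed variables are non-missing, subset_exprs is a character string evaluated with the data as the environment, and the two conditions are combined with AND; the combination rule and the data are assumptions.

```r
dat <- data.frame(sW = c(1, NA, 3, 4), sA = c(0, 1, NA, 1))
subset_vars <- c("sW", "sA")
subset_exprs <- "sW > 1"

# subset_vars: TRUE where none of these variables is missing
keep_vars <- complete.cases(dat[, subset_vars, drop = FALSE])
# subset_exprs: evaluate the expression in the environment of the data
keep_expr <- eval(parse(text = subset_exprs), envir = dat)
# Combine both conditions; an NA from the expression is treated as FALSE
keep <- keep_vars & !is.na(keep_expr) & keep_expr
keep  # FALSE FALSE FALSE TRUE
```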
ReplMisVal0
- Logical, if TRUE all gvars$misval among predictors are replaced with gvars$misXreplace (0).
nbins
- Integer number of bins used for a continuous outvar; the intervals are defined inside ContinModel$new() and then saved in this field.
bin_nms
- Character vector of column names for bin indicators.
useglm
- Logical, if TRUE then fit the logistic regression model using glm.fit; if FALSE use speedglm.wfit.
regressions (requires registering a back-end cluster prior to calling the fit/predict functions).
bin_bymass
- Logical, for continuous outvar, create bin cutoffs based on equal mass distribution.
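To contrast the two cutoff rules, this sketch computes equal-length cutoffs with seq() and equal-mass cutoffs with quantile() on simulated skewed data. It is an illustration under these assumptions, not the package's internal binning routine.

```r
set.seed(3)
sA <- rexp(1000)   # skewed continuous outcome
nbins <- 4

# Equal-length cutoffs: bins of equal width over the observed range
cut_len <- seq(min(sA), max(sA), length.out = nbins + 1)
# Equal-mass cutoffs: empirical quantiles, so each bin holds about n / nbins points
cut_mass <- quantile(sA, probs = seq(0, 1, length.out = nbins + 1), names = FALSE)

counts_len  <- table(cut(sA, breaks = cut_len,  include.lowest = TRUE))
counts_mass <- table(cut(sA, breaks = cut_mass, include.lowest = TRUE))
counts_len   # most points fall in the leftmost bin
counts_mass  # 250 points per bin
```

With skewed data the equal-length bins are very unbalanced, which is the motivation for basing cutoffs on the distribution's mass instead.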
bin_bydhist
- Logical, if TRUE, use the dhist approach for bin definitions. See Denby and Mallows, "Variations on the Histogram" (2009), for more.
max_nperbin
- Integer, maximum number of observations allowed per one bin.
pool_cont
- Logical, pool binned continuous outvar observations across bins and fit only one regression model across all bins (adding bin_ID as an extra covariate).
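A sketch of pooling under stated assumptions: equal-mass bins, simulated data, and bin_ID entered as a factor (the field description only says bin_ID is added as a covariate). The K bin-indicator outcomes are stacked into one long data set, and a single logistic regression is fitted across all bins.

```r
set.seed(4)
n <- 300
K <- 4
sW <- rnorm(n)
sA <- rexp(n, rate = exp(0.3 * sW))   # hypothetical continuous outcome

# Equal-mass bins and integer bin membership
cutoffs <- quantile(sA, probs = seq(0, 1, length.out = K + 1), names = FALSE)
bin_id <- cut(sA, breaks = cutoffs, include.lowest = TRUE, labels = FALSE)

# Long format: one row per (observation, bin); outcome B is the bin indicator
long <- do.call(rbind, lapply(seq_len(K), function(j)
  data.frame(bin_ID = j, sW = sW, B = as.integer(bin_id == j))))

# One pooled logistic regression across all bins, with bin_ID as extra covariate
pooled_fit <- glm(B ~ sW + factor(bin_ID), family = binomial(), data = long)
coef(pooled_fit)
```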
outvars_to_pool
- Character vector of names of the binned continuous outvars; should match bin_nms.
intrvls.width
- Named numeric vector of bin widths (bw_j : j=1,...,M) for each bin in self$intrvls. When sA is not continuous, intrvls.width is set to 1. When sA is continuous and intrvls.width is not supplied, the intervals are determined inside ContinModel$new() and are assigned to this variable as a list, with names(intrvls.width) <- reg$bin_nms. Can be queried by BinaryOutcomeModel$predictAeqa() as intrvls.width[outvar].
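The relationship between bin cutoffs and bin widths, and one typical use of the widths, can be sketched as follows: dividing a bin probability by its width gives a piecewise-constant density. The cutoffs, bin names and bin probabilities are made up, and this page does not confirm how the package itself uses the widths.

```r
intrvls <- c(0, 0.5, 1.5, 3)                              # hypothetical cutoffs
bin_nms <- paste0("sA_B_", seq_len(length(intrvls) - 1))  # hypothetical names
intrvls.width <- setNames(diff(intrvls), bin_nms)
intrvls.width  # 0.5 1.0 1.5

# Hypothetical predicted probabilities of falling in each bin
p_bin <- setNames(c(0.3, 0.5, 0.2), bin_nms)
# Piecewise-constant density inside each bin: P(bin j) / bw_j
dens <- p_bin / intrvls.width[names(p_bin)]
sum(dens * intrvls.width)  # integrates back to 1
```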
intrvls
- Numeric vector of cutoffs defining the bins, or a named list of numeric intervals when length(self$outvar) > 1.
cat.levels
- Numeric vector of all unique values in the categorical outcome variable. Set by the CategorModel constructor.
new(sep_predvars_sets = FALSE,
outvar.class = gvars$sVartypes$bin,
outvar, predvars, subset_vars, subset_exprs, intrvls,
ReplMisVal0 = TRUE,
useglm = getopt("useglm"),
nbins = getopt("nbins"),
bin_bymass = getopt("bin.method"),
bin_bydhist = getopt("bin.method"),
max_nperbin = getopt("maxNperBin"),
pool_cont = getopt("poolContinVar"))
Uses the arguments to instantiate an object of R6 class and define the future regression model.
ChangeManyToOneRegresssion(k_i, reg)
Take a clone of a parent RegressionClass (reg) for length(self$outvar) regressions and set self to a single univariate k_i regression for outcome self$outvar[[k_i]].
ChangeOneToManyRegresssions(regs_list)
Take the clone of a parent RegressionClass for a univariate (continuous outvar) regression and set self to length(regs_list) bin indicator outcome regressions.
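The effect of these two methods on a regression specification can be pictured with plain R lists standing in for RegressionClass objects. This is a hypothetical sketch: many_to_one() and one_to_many() are illustrative helpers, R's copy-on-modify semantics play the role of clone(), and the package's methods operate on R6 objects with different arguments (for example, regs_list).

```r
# Plain-list stand-in for a multivariate regression specification (hypothetical)
reg <- list(outvar = c("sA1", "sA2"), outvar.class = c("bin", "cont"),
            predvars = c("sW1", "sW2"))

# Many-to-one: keep only the k_i-th outcome (modifying the copy leaves reg intact)
many_to_one <- function(reg, k_i) {
  reg$outvar <- reg$outvar[[k_i]]
  reg$outvar.class <- reg$outvar.class[[k_i]]
  reg
}

# One-to-many: one binary specification per bin indicator of a continuous outcome
one_to_many <- function(reg, bin_nms) {
  lapply(bin_nms, function(b) {
    reg$outvar <- b
    reg$outvar.class <- "bin"
    reg
  })
}

reg2 <- many_to_one(reg, 2)                                     # continuous sA2
bin_regs <- one_to_many(reg2, paste0(reg2$outvar, "_B_", 1:3))  # 3 bin regressions
vapply(bin_regs, function(r) r$outvar, character(1))
```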
resetS3class()
...
S3class
...
get.reg
...