BBC_dichotom | R Documentation |
Multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors.
BBC_dichotom(formula, data, ...)
optimism_dichotom(fom, X, data, R = 100L, ...)
coef_dichotom(fom, X., data)
formula | formula, e.g., |
data | data.frame |
... | additional parameters, currently not in use |
fom | formula, e.g., |
X | numeric matrix of |
R | positive integer scalar, number of bootstrap replicates |
X. | logical matrix |
Function BBC_dichotom obtains a multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors. Specifically,
1. Obtain the dichotomizing rules $\mathbf{\mathcal{D}}$ of predictors $x_1,\cdots,x_k$ based on response $y$ (via m_rpartD). The multivariable regression (with additional predictors $z$, if any) on the dichotomized predictors $\left(\tilde{x}_1,\cdots,\tilde{x}_k\right) = \mathcal{D}\left(x_1,\cdots,x_k\right)$ (via helper function coef_dichotom) is the apparent performance.
2. Obtain the bootstrap-based optimism from $R$ copies of bootstrap samples (via helper function optimism_dichotom). The median of the bootstrap-based optimism over the $R$ bootstrap copies is the optimism-correction of the dichotomized predictors $\tilde{x}_1,\cdots,\tilde{x}_k$.
3. Subtract the optimism-correction (in Step 2) from the apparent performance estimates (in Step 1), only for $\tilde{x}_1,\cdots,\tilde{x}_k$. The apparent performance estimates for the additional predictors $z$'s, if any, are not modified. Neither the variance-covariance (vcov) estimates nor the other regression diagnostics, e.g., residuals, logLikelihood, etc., of the apparent performance are modified for now. This coefficient-only, partially-modified regression model is the optimism-corrected performance.
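The correction in Steps 2 and 3 can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the coefficient vector `apparent`, the matrix `optimism` and all numbers in them are hypothetical stand-ins for the apparent model's coefficients and for the matrix returned by optimism_dichotom.

```r
# Hypothetical apparent coefficients: two dichotomized predictors
# and one additional predictor z (made-up values, for illustration only)
apparent <- c(x1_tilde = 0.80, x2_tilde = 0.50, z = 0.30)

# Hypothetical R x k optimism matrix (R = 5 bootstrap copies, k = 2)
optimism <- matrix(
  c(0.10, 0.05,
    0.20, 0.00,
    0.15, 0.10,
    0.05, 0.02,
    0.12, 0.08),
  ncol = 2L, byrow = TRUE,
  dimnames = list(NULL, c('x1_tilde', 'x2_tilde'))
)

# Step 2: the optimism-correction is the column-wise median over bootstrap copies
correction <- apply(optimism, MARGIN = 2L, FUN = median)

# Step 3: subtract the correction only from the dichotomized predictors;
# the coefficient of the additional predictor z is left unmodified
corrected <- apparent
corrected[names(correction)] <- apparent[names(correction)] - correction
print(corrected)
```

Only the coefficients change in this sketch; as described above, the variance-covariance estimates and other diagnostics of the apparent model would stay as they are.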
Function BBC_dichotom returns a coxph, glm or lm regression model, with attributes,
attr(,'optimism')
the returned object from optimism_dichotom
attr(,'apparent_cutoff')
a double vector, cutoff thresholds for the $k$ predictors in the apparent model
Helper function optimism_dichotom computes the bootstrap-based optimism of the dichotomized predictors. Specifically,
1. $R$ copies of bootstrap samples are generated. In the $j$-th bootstrap sample,
   - obtain the dichotomizing rules $\mathbf{\mathcal{D}}^{(j)}$ of predictors $x_1^{(j)},\cdots,x_k^{(j)}$ based on response $y^{(j)}$ (via m_rpartD);
   - the multivariable regression (with additional predictors $z^{(j)}$, if any) coefficient estimates $\mathbf{\hat{\beta}}^{(j)} = \left(\hat{\beta}_1^{(j)},\cdots,\hat{\beta}_k^{(j)}\right)^t$ of the dichotomized predictors $\left(\tilde{x}_1^{(j)},\cdots,\tilde{x}_k^{(j)}\right) = \mathcal{D}^{(j)}\left(x_1^{(j)},\cdots,x_k^{(j)}\right)$ (via coef_dichotom) are the bootstrap performance estimate.
2. Dichotomize $x_1,\cdots,x_k$ in the entire data using each of the bootstrap rules $\mathcal{D}^{(1)},\cdots,\mathcal{D}^{(R)}$.
3. The multivariable regression (with additional predictors $z$, if any) coefficient estimates $\mathbf{\hat{\beta}}^{[j]} = \left(\hat{\beta}_1^{[j]},\cdots,\hat{\beta}_k^{[j]}\right)^t$ of the dichotomized predictors $\left(\tilde{x}_1^{[j]},\cdots,\tilde{x}_k^{[j]}\right) = \mathcal{D}^{(j)}\left(x_1,\cdots,x_k\right)$ (via coef_dichotom) are the test performance estimate.
4. The difference between the bootstrap and test performance estimates, an $R\times k$ matrix of $\left(\mathbf{\hat{\beta}}^{(1)},\cdots,\mathbf{\hat{\beta}}^{(R)}\right)$ minus another $R\times k$ matrix of $\left(\mathbf{\hat{\beta}}^{[1]},\cdots,\mathbf{\hat{\beta}}^{[R]}\right)$, is the bootstrap-based optimism.
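The four steps above can be sketched with base R on simulated data. This is a rough illustration, not the package's code: `find_cutoff` is a hypothetical stand-in for m_rpartD (it uses a simple mean-difference search rather than a recursive-partitioning rule), `coef_dich` is a hypothetical stand-in for coef_dichotom restricted to the lm case, and the data are simulated.

```r
set.seed(1)
n <- 200L
# Simulated data (assumption): numeric predictors x1, x2 and additional predictor z
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), z = rnorm(n))
dat$y <- 1 * (dat$x1 > 0.3) + 0.5 * (dat$x2 > -0.2) + 0.4 * dat$z + rnorm(n)
xs <- c('x1', 'x2')

# Hypothetical stand-in for m_rpartD: pick the candidate cutoff with the
# largest absolute difference in mean response between the two groups
find_cutoff <- function(x, y) {
  cand <- quantile(x, probs = seq(0.2, 0.8, by = 0.05))
  gap <- vapply(cand, function(ct) abs(mean(y[x > ct]) - mean(y[x <= ct])), numeric(1L))
  unname(cand[which.max(gap)])
}

# Hypothetical stand-in for coef_dichotom (lm case only): dichotomize x's at
# the given cutoffs, fit y ~ z + x1 + x2, return coefficients of the x's
coef_dich <- function(d, cutoff) {
  d[xs] <- Map(function(x, ct) x > ct, d[xs], cutoff)
  coef(lm(y ~ z + x1 + x2, data = d))[c('x1TRUE', 'x2TRUE')]
}

R <- 50L
boot_coef <- test_coef <- matrix(NA_real_, nrow = R, ncol = length(xs),
                                 dimnames = list(NULL, xs))
for (j in seq_len(R)) {
  b <- dat[sample.int(n, replace = TRUE), ]  # j-th bootstrap sample
  cutoff_j <- vapply(xs, function(v) find_cutoff(b[[v]], b$y), numeric(1L))
  boot_coef[j, ] <- coef_dich(b, cutoff_j)   # bootstrap performance estimate
  test_coef[j, ] <- coef_dich(dat, cutoff_j) # test performance: rule j on entire data
}

# Bootstrap-based optimism: R x k matrix, bootstrap minus test
optimism <- boot_coef - test_coef
dim(optimism)
```

The key design point is that each bootstrap rule is learned once in the bootstrap sample and then reused unchanged on the entire data, so the difference isolates how much the rule flatters its own sample.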
Helper function coef_dichotom fits a multivariable Cox proportional hazards (coxph) model for Surv response, logistic (glm) regression model for logical response, or linear (lm) regression model for gaussian response, with the dichotomized predictors $\tilde{x}_1,\cdots,\tilde{x}_k$ as well as the additional predictors $z$'s.
It is almost inevitable to have duplicates among the dichotomized predictors $\tilde{x}_1,\cdots,\tilde{x}_k$. In such a case, the multivariable model is fitted using the unique $\tilde{x}$'s.
Helper function optimism_dichotom returns an $R\times k$ double matrix of bootstrap-based optimism, with attributes
attr(,'cutoff')
an $R\times k$ double matrix, the $R$ copies of bootstrap cutoff thresholds for the $k$ predictors. See attribute 'cutoff' of function m_rpartD.
Helper function coef_dichotom returns a double vector of the regression coefficients of dichotomized predictors $\tilde{x}$'s, with attributes
attr(,'model')
the coxph, glm or lm regression model
In the case of duplicated $\tilde{x}$'s, the regression coefficients of the unique $\tilde{x}$'s are duplicated for those duplicates in $\tilde{x}$'s.
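The handling of duplicated dichotomized predictors can be sketched in base R as below. This is an illustration under assumptions, not the code of coef_dichotom: the logical matrix `Xd`, its column names and the simulated response are hypothetical, and only the lm case is shown.

```r
set.seed(2)
n <- 100L
z <- rnorm(n)
x1 <- rnorm(n)
# Hypothetical logical matrix of dichotomized predictors;
# columns 'a' and 'c' are identical by construction
Xd <- cbind(a = x1 > 0, b = rnorm(n) > 0.5, c = x1 > 0)
y <- 1 + 0.7 * Xd[, 'a'] - 0.4 * Xd[, 'b'] + 0.3 * z + rnorm(n)

# One string key per column, so that identical columns share a key
key <- apply(Xd, MARGIN = 2L, FUN = paste, collapse = '')
Xu <- Xd[, !duplicated(key), drop = FALSE]  # unique columns only

# Fit the multivariable model with the unique dichotomized predictors
d <- data.frame(y = y, z = z, Xu)
fit <- lm(y ~ ., data = d)
cf_unique <- coef(fit)[paste0(colnames(Xu), 'TRUE')]

# Duplicate the coefficients of the unique columns back to all k columns
idx <- match(key, unique(key))
cf <- setNames(cf_unique[idx], colnames(Xd))
print(cf)
```

Fitting on the unique columns avoids a rank-deficient design matrix, while copying the coefficients back keeps the returned vector aligned with all $k$ predictors.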
Ewout W. Steyerberg (2009) Clinical Prediction Models. doi:10.1007/978-0-387-77244-8
Frank E. Harrell Jr., Kerry L. Lee, Daniel B. Mark (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4
library(survival)
data(flchain, package = 'survival') # see more details from ?survival::flchain
# convert the mgus indicator to logical
head(flchain2 <- within.data.frame(flchain, expr = {
  mgus = as.logical(mgus)
}))
# keep positive follow-up times only
dim(flchain3 <- subset(flchain2, futime > 0)) # required by ?rpart::rpart
# restrict to deaths in the 'Circulatory' chapter
dim(flchain_Circulatory <- subset(flchain3, chapter == 'Circulatory'))
# additional predictors: age, sex, mgus; predictors to dichotomize: kappa, lambda
m1 = BBC_dichotom(Surv(futime, death) ~ age + sex + mgus ~ kappa + lambda,
  data = flchain_Circulatory, R = 1e2L)
summary(m1)
matrixStats::colMedians(BBC_cutoff(m1)) # median bootstrap cutoff
attr(m1, 'apparent_cutoff') # cutoff thresholds in the apparent model