BBC_dichotom: Bootstrap-based Optimism Correction for Dichotomization


BBC_dichotom    R Documentation

Bootstrap-based Optimism Correction for Dichotomization

Description

Fits a multivariable regression model with bootstrap-based optimism correction of the coefficients of the dichotomized predictors.

Usage

BBC_dichotom(formula, data, ...)

optimism_dichotom(fom, X, data, R = 100L, ...)

coef_dichotom(fom, X., data)

Arguments

formula

formula with three parts, e.g., y~z~x or y~1~x. Response y may be a double, logical or Surv object. The predictors x's to be dichotomized may be one or more numeric vectors and/or one numeric matrix. Additional predictors z's, if any, may be of any type.
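In base R, `~` is left-associative, so a three-part formula such as y ~ z ~ x parses as (y ~ z) ~ x. The sketch below illustrates this base-R behaviour by splitting such a formula into a helper formula in the role of `fom` (response and z's) and the names of the x's. It is an illustration of formula parsing only, not necessarily how BBC_dichotom processes its formula internally; the variable names are taken from the Examples.

```r
# A three-part formula: response ~ additional predictors ~ predictors to dichotomize
f <- y ~ age + sex ~ kappa + lambda

# Because `~` is left-associative, f is the call `~`(y ~ age + sex, kappa + lambda)
lhs <- f[[2L]]  # unevaluated call y ~ age + sex, playing the role of `fom`
rhs <- f[[3L]]  # unevaluated call kappa + lambda, the x's to be dichotomized

fom <- eval(lhs)          # evaluate the call to obtain a proper formula object
x_vars <- all.vars(rhs)   # names of the predictors to be dichotomized

print(fom)
print(x_vars)
```

For y ~ 1 ~ x, the same split yields the helper formula y ~ 1, i.e., no additional predictors z's.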

data

data.frame

...

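additional parameters. For BBC_dichotom, these include the number of bootstrap replicates R (see Examples), which is passed on to optimism_dichotom; otherwise currently not in use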

fom

formula for the helper functions, e.g., y~z or y~1, containing the response y and the additional predictors z's, if any

X

numeric matrix of k columns, numeric predictors x_1,\cdots,x_k to be dichotomized

R

positive integer scalar, number of bootstrap replicates R, default 100L

X.

logical matrix \tilde{X} of k columns, dichotomized predictors \tilde{x}_1,\cdots,\tilde{x}_k

Details

Function BBC_dichotom obtains a multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors. Specifically,

  1. Obtain the dichotomizing rules \mathbf{\mathcal{D}} of the predictors x_1,\cdots,x_k based on the response y (via m_rpartD). The coefficient estimates of the multivariable regression (with the additional predictors z, if any) on the dichotomized predictors \left(\tilde{x}_1,\cdots,\tilde{x}_k\right) = \mathcal{D}\left(x_1,\cdots,x_k\right) (via helper function coef_dichotom) are the apparent performance.

  2. Obtain the bootstrap-based optimism from R bootstrap samples (via helper function optimism_dichotom). The median of the bootstrap-based optimism over the R bootstrap copies, taken separately for each of \tilde{x}_1,\cdots,\tilde{x}_k, is the optimism correction.

  3. Subtract the optimism correction (Step 2) from the apparent performance estimates (Step 1), for \tilde{x}_1,\cdots,\tilde{x}_k only. The apparent estimates of the additional predictors z's, if any, are not modified. Currently, neither the variance-covariance (vcov) estimates nor the other regression diagnostics (e.g., residuals, log-likelihood) of the apparent model are modified. This coefficient-only, partially modified regression model is the optimism-corrected performance.
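The arithmetic of Steps 2 and 3 can be sketched in base R as follows. The coefficient values and the optimism matrix are made-up toy numbers, not output of BBC_dichotom; the column median is taken with `apply(..., median)`, whereas the package itself may use a different median routine.

```r
# Toy apparent coefficients (made-up): one additional predictor z (age)
# and two dichotomized predictors x~ (kappa, lambda)
apparent <- c(age = 0.05, kappa = 0.80, lambda = 0.50)

# Toy R x k optimism matrix (made-up), R = 5 bootstrap copies, k = 2 predictors
optimism <- rbind(c(0.10, 0.05),
                  c(0.30, 0.00),
                  c(0.20, 0.10),
                  c(0.25, 0.15),
                  c(0.15, 0.02))
colnames(optimism) <- c("kappa", "lambda")

# Step 2: median over the R copies, separately for each dichotomized predictor
correction <- apply(optimism, MARGIN = 2L, FUN = median)

# Step 3: subtract the correction only for the x~'s; z's stay at their apparent values
xt <- colnames(optimism)
corrected <- apparent
corrected[xt] <- apparent[xt] - correction[xt]
corrected  # age unchanged; kappa and lambda shifted down by their median optimism
```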

Value

Function BBC_dichotom returns a coxph, glm or lm regression model, with attributes,

attr(,'optimism')

the object returned by optimism_dichotom

attr(,'apparent_cutoff')

a double vector, cutoff thresholds for the k predictors in the apparent model

Details on Helper Functions

Bootstrap-Based Optimism

Helper function optimism_dichotom computes the bootstrap-based optimism of the dichotomized predictors. Specifically,

  1. R copies of bootstrap samples are generated. In the j-th bootstrap sample,

    1. obtain the dichotomizing rules \mathbf{\mathcal{D}}^{(j)} of predictors x_1^{(j)},\cdots,x_k^{(j)} based on response y^{(j)} (via m_rpartD)

    2. the coefficient estimates \mathbf{\hat{\beta}}^{(j)} = \left(\hat{\beta}_1^{(j)},\cdots,\hat{\beta}_k^{(j)}\right)^t of the multivariable regression (with the additional predictors z^{(j)}, if any) on the dichotomized predictors \left(\tilde{x}_1^{(j)},\cdots,\tilde{x}_k^{(j)}\right) = \mathcal{D}^{(j)}\left(x_1^{(j)},\cdots,x_k^{(j)}\right) (via coef_dichotom) are the bootstrap performance estimates.

  2. Dichotomize x_1,\cdots,x_k in the entire data using each of the bootstrap rules \mathcal{D}^{(1)},\cdots,\mathcal{D}^{(R)}. The coefficient estimates \mathbf{\hat{\beta}}^{[j]} = \left(\hat{\beta}_1^{[j]},\cdots,\hat{\beta}_k^{[j]}\right)^t of the multivariable regression (with the additional predictors z, if any) on the dichotomized predictors \left(\tilde{x}_1^{[j]},\cdots,\tilde{x}_k^{[j]}\right) = \mathcal{D}^{(j)}\left(x_1,\cdots,x_k\right) (via coef_dichotom) are the test performance estimates.

  3. The difference between the bootstrap and the test performance estimates, i.e., the R\times k matrix \left(\mathbf{\hat{\beta}}^{(1)},\cdots,\mathbf{\hat{\beta}}^{(R)}\right)^t minus the R\times k matrix \left(\mathbf{\hat{\beta}}^{[1]},\cdots,\mathbf{\hat{\beta}}^{[R]}\right)^t, is the bootstrap-based optimism.
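The three steps of optimism_dichotom can be sketched in base R for a gaussian response. This is a minimal sketch under several assumptions, none of which come from the package: `best_cutoff()` is a hypothetical stand-in for m_rpartD that picks, per predictor, the cutoff minimizing the within-group sum of squares (in the spirit of a single regression-tree split); the dichotomized predictor is assumed to be x > cutoff; only lm is used; and the additional predictors z's are omitted. All helper names are hypothetical.

```r
# Hypothetical stand-in for m_rpartD (NOT the package's rule): for one numeric
# predictor x, choose the cutoff minimizing the within-group sum of squares of y
best_cutoff <- function(x, y) {
  cand <- quantile(x, probs = seq(0.1, 0.9, by = 0.05), names = FALSE)
  sse <- vapply(cand, function(cc) {
    g <- x > cc
    sum((y - ave(y, g))^2)  # residual sum of squares around the two group means
  }, FUN.VALUE = numeric(1L))
  cand[which.min(sse)]
}

# Rules D: one cutoff per column of X, learned from (X, y)
dichotom_rule <- function(X, y) apply(X, MARGIN = 2L, FUN = best_cutoff, y = y)

# Apply rules: x~_i = (x_i > cutoff_i), a logical matrix (direction is an assumption)
apply_rule <- function(X, cutoff) X > rep(cutoff, each = nrow(X))

# Linear-regression coefficients of the dichotomized predictors (intercept dropped)
coef_sketch <- function(y, Xd) unname(lm.fit(cbind(1, Xd * 1), y)$coefficients[-1L])

optimism_sketch <- function(y, X, R = 100L) {
  out <- matrix(NA_real_, nrow = R, ncol = ncol(X))
  for (j in seq_len(R)) {
    id <- sample.int(length(y), replace = TRUE)          # j-th bootstrap sample
    yj <- y[id]
    Xj <- X[id, , drop = FALSE]
    cut_j <- dichotom_rule(Xj, yj)                        # bootstrap rules D^(j)
    beta_boot <- coef_sketch(yj, apply_rule(Xj, cut_j))   # bootstrap performance
    beta_test <- coef_sketch(y, apply_rule(X, cut_j))     # test performance, entire data
    out[j, ] <- beta_boot - beta_test                     # j-th row of the optimism
  }
  out
}

set.seed(42)
n <- 200L
X <- matrix(rnorm(n * 3L), nrow = n, ncol = 3L)
y <- drop(X %*% c(1, 0.5, 0)) + rnorm(n)
opt <- optimism_sketch(y, X, R = 50L)
dim(opt)                # 50 rows (bootstrap copies) by 3 columns (predictors)
apply(opt, 2L, median)  # per-predictor optimism correction
```

Because the cutoffs are chosen using y, the bootstrap coefficients tend to exceed their test counterparts; this is the optimism the correction removes.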

Multivariable Regression Coefficient Estimates of Dichotomized Predictors \tilde{x}'s

Helper function coef_dichotom fits a multivariable Cox proportional hazards (coxph) model for a Surv response, a logistic (glm) regression model for a logical response, or a linear (lm) regression model for a gaussian (double) response, with the dichotomized predictors \tilde{x}_1,\cdots,\tilde{x}_k as well as the additional predictors z's, if any.
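The choice of model by response type can be sketched as a simple dispatch on the class of y. The helper `fit_by_response()` is hypothetical and is not the package's coef_dichotom; for brevity it omits the additional predictors z's and converts the logical matrix of dichotomized predictors to 0/1 before fitting.

```r
library(survival)  # Surv() and coxph()

# Hypothetical helper: pick coxph, logistic glm or lm from the class of the response
fit_by_response <- function(y, Xd) {
  Xn <- Xd * 1  # logical matrix of x~'s -> numeric 0/1 matrix
  if (inherits(y, "Surv")) {
    coxph(y ~ Xn)                      # Surv response
  } else if (is.logical(y)) {
    glm(y ~ Xn, family = binomial())   # logical response
  } else {
    lm(y ~ Xn)                         # gaussian (double) response
  }
}

set.seed(1)
n <- 100L
Xd <- matrix(runif(n * 2L) > 0.5, ncol = 2L)  # two simulated dichotomized predictors
fit_lm <- fit_by_response(rnorm(n), Xd)
fit_glm <- fit_by_response(runif(n) > 0.5, Xd)
fit_cox <- fit_by_response(Surv(rexp(n), runif(n) > 0.3), Xd)
```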

Duplicated columns among the dichotomized predictors \tilde{x}_1,\cdots,\tilde{x}_k are almost inevitable. In such cases, the multivariable model is fitted using only the unique \tilde{x}'s.

Returns of Helper Functions

Of helper function optimism_dichotom

Helper function optimism_dichotom returns an R\times k double matrix of bootstrap-based optimism, with attribute

attr(,'cutoff')

an R\times k double matrix, the R copies of bootstrap cutoff thresholds for the k predictors. See attribute 'cutoff' of function m_rpartD

Of helper function coef_dichotom

Helper function coef_dichotom returns a double vector of the regression coefficients of the dichotomized predictors \tilde{x}'s, with attribute

attr(,'model')

the coxph, glm or lm regression model

When some \tilde{x}'s are duplicated, each duplicate is assigned the regression coefficient of its unique counterpart, so that every \tilde{x} has a coefficient.
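The handling of duplicated \tilde{x}'s (fit on the unique columns, then copy each coefficient back to its duplicates) can be sketched in base R. The helper `coef_unique_sketch()` is hypothetical, uses a linear model only, and identifies duplicated columns by comparing their 0/1 patterns; without this step, lm.fit would return NA for a duplicated column.

```r
# Hypothetical helper: regress on the unique x~ columns, then map coefficients back
coef_unique_sketch <- function(y, Xd) {
  key <- apply(Xd * 1L, MARGIN = 2L, FUN = paste, collapse = "")  # 0/1 pattern per column
  keep <- !duplicated(key)                # first occurrence of each distinct column
  Xu <- Xd[, keep, drop = FALSE] * 1      # unique columns as a numeric 0/1 matrix
  beta_u <- unname(lm.fit(cbind(1, Xu), y)$coefficients[-1L])  # drop the intercept
  beta_u[match(key, key[keep])]           # every column, incl. duplicates, gets a coefficient
}

set.seed(7)
n <- 80L
a <- rnorm(n) > 0
b <- rnorm(n) > 0
Xd <- cbind(a, b, a)                      # third column duplicates the first
y <- 1 + 2 * a - b + rnorm(n, sd = 0.1)
beta <- coef_unique_sketch(y, Xd)         # length 3; beta[1] and beta[3] are identical
```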

References

For helper function optimism_dichotom

Ewout W. Steyerberg (2009) Clinical Prediction Models. doi:10.1007/978-0-387-77244-8

Frank E. Harrell Jr., Kerry L. Lee, Daniel B. Mark (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4

Examples

library(Qindex)   # BBC_dichotom()
library(survival) # Surv()
data(flchain, package = 'survival') # see more details from ?survival::flchain
head(flchain2 <- within.data.frame(flchain, expr = {
  mgus = as.logical(mgus)
}))
dim(flchain3 <- subset(flchain2, futime > 0)) # required by ?rpart::rpart
dim(flchain_Circulatory <- subset(flchain3, chapter == 'Circulatory'))

# response ~ additional predictors z's ~ predictors x's to be dichotomized
m1 = BBC_dichotom(Surv(futime, death) ~ age + sex + mgus ~ kappa + lambda, 
 data = flchain_Circulatory, R = 1e2L)
summary(m1)
matrixStats::colMedians(BBC_cutoff(m1)) # median bootstrap cutoff of each predictor (requires matrixStats)
attr(m1, 'apparent_cutoff')


Qindex documentation built on April 4, 2025, 2:14 a.m.