optimSplit_dichotom: Optimal Dichotomizing Predictors via Repeated Sample Splits

View source: R/optimSplit_dichotom.R

optimSplit_dichotom    R Documentation

Description

Identifies the optimal dichotomizing predictors using repeated training-test sample splits.

Usage

optimSplit_dichotom(
  formula,
  data,
  include = quote(p1 > 0.15 & p1 < 0.85),
  top = 1L,
  nsplit,
  ...
)

split_dichotom(y, x, id, ...)

splits_dichotom(y, x, ids = rSplit(y, ...), ...)

## S3 method for class 'splits_dichotom'
quantile(x, probs = 0.5, ...)

Arguments

formula, y, x

formula, e.g., y~X or y~x1+x2. Response y may be double, logical, or Surv. Candidate numeric predictors x's may be specified as the columns of one matrix column, e.g., y~X, or as several vector columns, e.g., y~x1+x2. In the helper functions, x is a numeric vector.

data

data.frame

include

(optional) language, inclusion criteria. Default (p1>.15 & p1<.85) specifies a user-desired range of p_1 for the candidate dichotomizing predictors. See explanation of p_1 in section Returns of Helper Functions.

top

positive integer scalar, number of optimal dichotomizing predictors, default 1L

nsplit, ...

additional parameters for function rSplit

id

logical vector for helper function split_dichotom, indices of training (TRUE) and test (FALSE) subjects

ids

(optional) list of logical vectors for helper function splits_dichotom, multiple copies of indices of repeated training-test sample splits.

probs

double scalar for helper function quantile.splits_dichotom, see quantile

Details

Function optimSplit_dichotom identifies the optimal dichotomizing predictors via repeated sample splits. Specifically,

  1. Generate multiple, i.e., repeated, training-test sample splits (via rSplit)

  2. For each candidate predictor x_i, find the median split-dichotomized regression model, i.e., the model whose regression coefficient estimate is the median across the repeated sample splits; see section Details on Helper Functions

  3. Limit the selection of the candidate predictors x's to a user-desired range of p_1 of the split-dichotomized regression models, see explanations of p_1 in section Returns of Helper Functions

  4. Rank the candidate predictors x's in decreasing order of the absolute value of the regression coefficient estimate of their median split-dichotomized regression models. The predictors at the top of this ranking are the optimal dichotomizing predictors.
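Steps 3 and 4 above can be illustrated with a minimal base-R sketch. It does not call optimSplit_dichotom; the data frame cand, its values, and the names keep, ranked and best are made up for illustration, and only the default include expression is taken from the Usage section.

```r
# Made-up summaries of the median split-dichotomized model of five candidates
cand <- data.frame(
  predictor = c("x1", "x2", "x3", "x4", "x5"),
  p1   = c(0.50, 0.10, 0.40, 0.90, 0.30),   # hypothetical p_1 values
  coef = c(0.8, 2.5, -1.2, 1.9, 0.3),       # hypothetical coefficient estimates
  stringsAsFactors = FALSE
)

include <- quote(p1 > 0.15 & p1 < 0.85)     # default inclusion criterion
top <- 2L

# Step 3: evaluate the quoted criterion with p1 looked up in 'cand'
keep <- eval(include, envir = cand)
kept <- cand[keep, ]

# Step 4: rank the remaining candidates by decreasing absolute coefficient
ranked <- kept[order(abs(kept$coef), decreasing = TRUE), ]
best <- head(ranked$predictor, top)
```

In this toy input, x2 and x4 have the largest coefficients in absolute value but are excluded because their p_1 falls outside the requested range.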

Value

Function optimSplit_dichotom returns an object of class 'optimSplit_dichotom', which is a list of dichotomizing functions, with the input formula and data as additional attributes.

Details on Helper Functions

Split-Dichotomized Regression Model

Helper function split_dichotom fits a univariable regression model on the test set with a dichotomized predictor, using a dichotomizing rule determined by recursive partitioning of the training set. Specifically, given a training-test sample split,

  1. find the dichotomizing rule \mathcal{D} of the predictor x_0 given the response y_0 in the training set (via rpartD);

  2. fit a univariable regression model of the response y_1 with the dichotomized predictor \mathcal{D}(x_1) in the test set.

Currently, Cox proportional hazards regression (coxph) for a Surv response, logistic regression (glm) for a logical response, and linear regression (lm) for a Gaussian response are supported.
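The two steps can be sketched in base R for a Gaussian response. This is an illustration under stated assumptions, not the package's implementation: find_cut is a hypothetical stand-in for rpartD that searches for the single cut point minimizing the residual sum of squares, the data are simulated, and p_1 is estimated here as the proportion of test subjects with \mathcal{D}(x_1)=1.

```r
set.seed(1)
n <- 200L
x <- rnorm(n)
y <- 2 * (x > 0.3) + rnorm(n, sd = 0.5)     # simulated Gaussian response
id <- rep(c(TRUE, FALSE), length.out = n)   # TRUE = training, FALSE = test

# Hypothetical stand-in for rpartD: best single cut point by residual sum of squares
find_cut <- function(y, x) {
  ux <- sort(unique(x))
  cuts <- (ux[-1L] + ux[-length(ux)]) / 2   # midpoints between adjacent values
  rss <- vapply(cuts, function(cc) {
    hi <- x > cc
    sum((y[hi] - mean(y[hi]))^2) + sum((y[!hi] - mean(y[!hi]))^2)
  }, numeric(1))
  cuts[which.min(rss)]
}

# Step 1: dichotomizing rule from the training set (y_0, x_0)
cut0 <- find_cut(y[id], x[id])
rule <- function(x) x > cut0

# Step 2: univariable regression of y_1 on D(x_1) in the test set
y1 <- y[!id]
d1 <- rule(x[!id])
fit <- lm(y1 ~ d1)

p1 <- mean(d1)                  # assumed estimate of Pr(D(x_1) = 1)
cf <- unname(coef(fit)[2L])     # coefficient of the dichotomized predictor
```

Because the rule is learned on the training subjects only, the coefficient estimated on the test subjects is not tuned to the same observations that selected the cut point.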

Split-Dichotomized Regression Models based on Repeated Training-Test Sample Splits

Helper function splits_dichotom fits multiple split-dichotomized regression models (split_dichotom) of the response y on the predictor x, one for each copy of the repeated training-test sample splits.
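A minimal sketch of this loop in base R follows. The list ids is a hypothetical stand-in for the output of rSplit (20 random splits with 80 of 100 subjects in training, both numbers chosen arbitrarily), and one_split is a simplified stand-in for split_dichotom that uses the training-set median as the cut point instead of a recursive-partitioning rule.

```r
set.seed(2)
n <- 100L
x <- rnorm(n)
y <- 1.5 * (x > 0) + rnorm(n)               # simulated Gaussian response

# Hypothetical stand-in for rSplit: a list of logical vectors, TRUE = training
ids <- replicate(20L, {
  id <- logical(n)
  id[sample.int(n, size = 80L)] <- TRUE
  id
}, simplify = FALSE)

# Simplified single-split fit; the training-set median replaces the rpartD rule
one_split <- function(y, x, id) {
  cut0 <- median(x[id])
  y1 <- y[!id]
  d1 <- x[!id] > cut0
  fit <- lm(y1 ~ d1)
  attr(fit, "p1") <- mean(d1)
  attr(fit, "coef") <- unname(coef(fit)[2L])
  fit
}

# One fitted model per training-test split
fits <- lapply(ids, function(id) one_split(y, x, id))
cf <- vapply(fits, function(f) attr(f, "coef"), numeric(1))
```

The vector cf then holds one coefficient estimate per split, which is the input to the quantile step.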

Quantile of Split-Dichotomized Regression Models

Helper function quantile.splits_dichotom is the S3 method of the generic function quantile for splits_dichotom objects. Specifically,

  1. collect the univariable regression coefficient estimate from each one of the split-dichotomized regression models;

  2. find the nearest-even (i.e., type = 3) quantile of the coefficients from Step 1. By default, we use the median (i.e., probs = .5);

  3. return the split-dichotomized regression model whose coefficient estimate equals the quantile selected in Step 2.
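The reason a type = 3 quantile suits Step 3 is that it always returns one of the observed values, so a matching model can be looked up exactly. A small sketch with made-up coefficients (the vector cf and the placeholder list models are hypothetical, not package output):

```r
# Made-up coefficient estimates from six split-dichotomized models (Step 1)
cf <- c(0.82, 1.10, 0.95, 1.31, 0.67, 1.02)
models <- lapply(cf, function(b) list(coef = b))   # placeholders for fitted models

# Step 2: type = 3 picks an order statistic, never an interpolated value
q <- quantile(cf, probs = 0.5, type = 3, names = FALSE)

# Step 3: return the model whose coefficient equals the selected quantile
pick <- which(cf == q)[1L]
median_model <- models[[pick]]
```

With an interpolating quantile type (such as the default type = 7), q could fall between two coefficients and no model would match it exactly.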

Returns of Helper Functions

Helper function split_dichotom returns a split-dichotomized regression model, which is either a Cox proportional hazards (coxph), a logistic (glm), or a linear (lm) regression model, with additional attributes

attr(,'rule')

function, dichotomizing rule \mathcal{D} based on the training set

attr(,'text')

character scalar, human-friendly description of \mathcal{D}

attr(,'p1')

double scalar, p_1 = \text{Pr}(\mathcal{D}(x_1)=1)

attr(,'coef')

double scalar, univariable regression coefficient estimate of y_1\sim\mathcal{D}(x_1)

Helper function splits_dichotom returns a list of split-dichotomized regression models (split_dichotom).

Helper function quantile.splits_dichotom returns a split-dichotomized regression model (split_dichotom).

Examples

# see ?`Qindex-package`

Qindex documentation built on April 4, 2025, 2:14 a.m.