library(knitr) opts_chunk$set( collapse = TRUE, comment = "#>" ) options(rmarkdown.html_vignette.check_title = FALSE)
This vignette of package maxEff
(Github,
RPubs) documents three methods of selecting one from many numeric
predictors for a regression model, to ensure that the additional predictor has the maximum effect size. The three methods are implemented in functions add_num()
, add_dummy()
and add_dummy_partition()
.
Examples in this vignette require that the search
path has
library(maxEff) library(groupedHyperframe) library(survival) library(rpart)
Users should remove parameter mc.cores = 1L
from all examples and use the default option, which engages all CPU cores on the current host for macOS. The authors are forced to have mc.cores = 1L
in this vignette in order to pass CRAN
's submission check.
c( '', 'Forward pipe operator', '`?base::pipeOp` introduced in `R` 4.1.0', '`abs`', 'Absolute value', '`base::abs`', '`coxph`', 'Cox model', '`survival::coxph`', '`CRAN`, `R`', 'The Comprehensive R Archive Network', 'https://cran.r-project.org', '`factor`', 'Factor, or categorical variable', '`base::factor`', '`function`', '`R` function', '``base::`function` ``', '`groupedHyperframe`', 'Grouped hyper data frame', ' `groupedHyperframe::as.groupedHyperframe`', '`head`', 'First parts of an object', '`utils::head`; `utils:::head.default`', '`hypercolumns`, `hyperframe`', '(Hyper columns of) hyper data frame', '`spatstat.geom::hyperframe`', '`labels`', 'Labels from object', '`base::labels`; `maxEff::labels.node1`', '`levels`', 'Levels of a `factor`', '`base::levels`', '`listof`', 'List of objects', '`stats::listof`', '`logistic`', 'Logistic regression model', '`stats::glm(., family = binomial(\'logit\'))`', '`matrix`', 'Matrix', '`base::matrix`', '`partition`', 'Stratified partition', '`maxEff::statusPartition`, `caret::createDataPartition`', '`PFS`', 'Progression/recurrence free survival', 'https://en.wikipedia.org/wiki/Progression-free_survival', '`predict`', 'Model prediction', '`stats::predict`; `maxEff::predict.add_num`; `maxEff::predict.add_dummy`', '`quantile`', 'Quantile', '`stats::quantile`', '`rpart`', 'Recursive partitioning and regression trees', '`rpart::rpart`', '`S3`, `generic`, `methods`', '`S3` object oriented system', '`base::UseMethod`; `utils::methods`; `utils::getS3method`; https://adv-r.hadley.nz/s3.html', '`sort_by`', 'Sort an object by some criterion', '`base::sort_by`; `maxEff::sort_by.add_`', '`subset`', 'Subsets of object by conditions', '`base::subset`; `maxEff::subset.add_dummy`', '`Surv`', 'Survival object', '`survival::Surv`', '`update`', 'Update and re-fit a model call', '`stats::update`' ) |> matrix(nrow = 3L, dimnames = list(c('Term / Abbreviation', 'Description', 'Reference'), NULL)) |> t.default() |> as.data.frame.matrix() |> kable()
This work is supported by NCI R01CA222847 (I. Chervoneva, T. Zhan, and H. Rui) and R01CA253977 (H. Rui and I. Chervoneva).
node1()
: Dichotomization via First Node of Recursive PartitioningConsider the following recursive partitioning and regression example,
data(cu.summary, package = 'rpart') (r = rpart(Price ~ Mileage, data = cu.summary, cp = .Machine$double.eps, maxdepth = 1L))
Function node1()
creates a dichotomizing rule, i.e., a function
, based on the first node of partitioning,
(foo = r |> node1())
The dichotomizing rule foo
converts a numeric
object to a logical
object.
set.seed(125); rnorm(6, mean = 24.5) |> foo()
Developers may obtain the numeric
cutoff value of foo
, or a brief text of to describe foo
, for further analysis.
foo |> get_cutoff() foo |> labels()
statusPartition()
: Stratified Partition by Survival StatusFunction statusPartition()
performs stratified partition based on the survival status of a Surv
response, to avoid the situation that Cox model is degenerate if all subjects are censored. This is a deviation from the very popular function caret::createDataPartition()
, which stratifies Surv
response by the quantile
s of the survival time.
y = (survival::capacitor) |> with(expr = Surv(time, status)) set.seed(15); id = y |> statusPartition(times = 1L, p = .5) table(y[id[[1L]], 2L]) / table(y[,2L]) # balanced by status set.seed(15); id0 = y |> caret::createDataPartition(times = 1L, p = .5) table(y[id0[[1L]], 2L]) / table(y[,2L]) # *not* balanced by status
Data set Ki67
in package groupedHyperframe
is a groupedHyperframe
.
data(Ki67, package = 'groupedHyperframe') Ki67
s = Ki67 |> aggregate_quantile(by = ~ patientID, probs = seq.int(from = .01, to = .99, by = .01))
Candidate of additional predictors are stored in a numeric
-hypercolumn
logKi67.quantile
.
Users are encouraged to learn more about groupedHyperframe
class and function aggregate_quantile()
from package groupedHyperframe
vignette.
Partition into a training (80%) and test (20%) set.
set.seed(234); id = sample.int(n = nrow(s), size = nrow(s)*.8) |> sort.int() s0 = s[id, , drop = FALSE] # training set s1 = s[-id, , drop = FALSE] # test set
Let's consider a starting model of endpoint PFS
with predictor Tstage
on the training set s0
.
summary(m <- coxph(PFS ~ Tstage, data = s0))
numeric
Predictor with Maximum Effect SizeFunction add_num()
treats each additional predictor as a numeric
variable, and update
s the starting model with each additional predictor. Function add_num()
returns an 'add_num'
object, which is a listof
objects with an internal class 'add_num_'
.
The S3
generic sort_by()
sorts the 'add_num'
object by the abs
olute value of the regression coefficient (i.e., effect size) of the additioanal numeric
predictor.
The S3
generic head()
chooses the top n
element from the object returned from the previous step.
set.seed(14837); m1 = m |> add_num(x = ~ logKi67.quantile, mc.cores = 1L) |> sort_by(y = abs(effsize)) |> head(n = 2L) m1
The S3 generic predict()
uses model m1
on the test set s1
.
m1[1L] |> predict(newdata = s1)
logical
Predictor with Maximum Effect Sizeadd_dummy()
: Naive PracticeFunction add_dummy()
partitions each additional numeric
predictor into a logical
variable using function node1()
, and update
s the starting model by adding in each of the dichotomized logical
predictor. Function add_dummy()
returns an 'add_dummy'
object, which is a listof
node1
objects.
The S3
generic subset()
subsets the the 'add_dummy'
object by the balance of partition of the additional predictor.
The S3
generic sort_by()
sorts the 'add_dummy'
object by the abs
olute value of regression coefficient (i.e., effect size) of the additional logical
predictor.
The S3
generic head()
chooses the top n
element from the object returned from the previous step.
set.seed(14837); m2 = m |> add_dummy(x = ~ logKi67.quantile, mc.cores = 1L) |> subset(subset = p1 > .05 & p1 < .95) |> sort_by(y = abs(effsize)) |> head(n = 2L) m2
The S3 generic predict()
uses model m2
on the test set s1
.
m2[1L] |> predict(newdata = s1)
add_dummy_partition()
: via Repeated PartitionsFunction add_dummy_partition()
partitions each additional numeric
predictor into a logical
variable in the following steps.
node1()
) on the training set. Apply this dichotomizing rule on the test set and obtain the estimated regression coefficient (i.e., effect size) of the additional logical
predictor.logical
predictor.Function add_dummy_partition()
also returns an 'add_dummy'
object.
set.seed(14837); m3 = m |> add_dummy_partition(~ logKi67.quantile, times = 20L, mc.cores = 1L) |> subset(subset = p1 > .15 & p1 < .85) |> sort_by(y = abs(effsize), decreasing = TRUE) |> head(n = 2L) m3
m3[1L] |> predict(newdata = s1)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.