rpartD | R Documentation |
Dichotomize one or more predictors of a Surv, a logical, or a double response, using recursive partitioning and regression trees (rpart).
rpartD(
y,
x,
check_degeneracy = TRUE,
cp = .Machine$double.eps,
maxdepth = 2L,
...
)
m_rpartD(y, X, check_degeneracy = TRUE, ...)
y | a Surv object, a logical vector, or a double vector, the response |
x | numeric vector, one predictor |
check_degeneracy | logical scalar, whether to allow the dichotomized value to be all- |
cp | double scalar, complexity parameter, see rpart.control. Default .Machine$double.eps |
maxdepth | positive integer scalar, maximum depth of any node, see rpart.control. Default 2L |
... | additional parameters of rpart and/or rpart.control |
X | numeric matrix, a set of predictors. Each column of |
Function rpartD dichotomizes one predictor in the following steps:
1. Recursive partitioning and regression tree (rpart) analysis is performed for the response y and the predictor x.
2. The labels.rpart of the first node of the rpart tree is taken as the dichotomizing rule of the double predictor x. The term dichotomizing rule denotes the combination of an inequality sign (>, >=, < or <=) and a double cutoff threshold a.
3. The dichotomizing rule from Step 2 is further processed, such that
   < a is regarded as >= a;
   <= a is regarded as > a;
   > a and >= a are kept as is.
   This step ensures that the rule always reads as greater than, or greater than or equal to, the threshold a.
4. A warning message is produced if the dichotomizing rule, applied to a new double predictor newx, creates an all-TRUE or all-FALSE result. We do not make the algorithm stop, as most regression models in R are capable of handling an all-TRUE or all-FALSE predictor by returning an NA_real_ regression coefficient estimate.
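The rule normalization and degeneracy warning described for rpartD can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name make_rule and its arguments are hypothetical, and the sign and cutoff are supplied by hand rather than read from an rpart tree.

```r
# Hypothetical helper (not part of the package): build a dichotomizing
# function from an inequality sign and a cutoff threshold.
make_rule <- function(sign, cutoff) {
  stopifnot(sign %in% c('>', '>=', '<', '<='), is.numeric(cutoff), length(cutoff) == 1L)
  # Flip 'less than' rules into 'greater than' form:
  # '<' becomes '>=', '<=' becomes '>'; '>' and '>=' are kept as is.
  sign <- switch(sign, '<' = '>=', '<=' = '>', sign)
  function(newx) {
    ret <- if (sign == '>') newx > cutoff else newx >= cutoff
    # Warn, but do not stop, when the result is degenerate
    if (all(ret, na.rm = TRUE) || !any(ret, na.rm = TRUE)) {
      warning('dichotomized value is all-TRUE or all-FALSE')
    }
    attr(ret, 'cutoff') <- cutoff
    ret
  }
}

rule <- make_rule('<', cutoff = 24.5)  # stored internally as '>= 24.5'
out <- rule(c(20, 24.5, 30))
```

Here out is a logical vector marking the values at or above 24.5, with the threshold stored in attr(out, 'cutoff').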
Function m_rpartD dichotomizes each predictor X[,i] based on the response y using function rpartD.
When the multiple dichotomizing rules are applied to a new set of predictors newX, a warning message is produced if at least one of the dichotomized predictors is all-TRUE or all-FALSE.
We do not check whether two or more of the dichotomized predictors are identical to each other. We take care of this situation in helper function coef_dichotom.
Function rpartD returns a function with a double vector parameter newx. The returned value of rpartD(y,x)(newx) is a logical vector with attributes
attr(,'cutoff') | double scalar, the cutoff value for newx |
Function m_rpartD returns a function with a double matrix parameter newX. The argument for newX must have the same number of columns and the same column names as the input matrix X. The returned value of m_rpartD(y,X)(newX) is a logical matrix with attributes
attr(,'cutoff') | named double vector, the cutoff values for each predictor in newX |
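The shape of the matrix return value can be illustrated with a small base R sketch. The function apply_rules is hypothetical and not part of the package; the cutoffs are made up by hand, and for simplicity every rule is assumed to be of the >= form.

```r
# Hypothetical illustration: `cutoff` is a named numeric vector,
# one threshold per predictor column.
apply_rules <- function(cutoff) {
  function(newX) {
    # newX must carry the same columns, under the same names
    stopifnot(is.matrix(newX), identical(colnames(newX), names(cutoff)))
    # Compare each column against its own cutoff (all rules assumed '>=')
    ret <- sweep(newX, MARGIN = 2L, STATS = cutoff, FUN = '>=')
    # Warn if at least one dichotomized column is all-TRUE or all-FALSE
    degenerate <- apply(ret, 2L, function(z) all(z) || !any(z))
    if (any(degenerate)) {
      warning('degenerate column(s): ', paste(names(cutoff)[degenerate], collapse = ', '))
    }
    attr(ret, 'cutoff') <- cutoff
    ret
  }
}

newX <- cbind(age = c(50, 70), g2 = c(10, 20))
outX <- apply_rules(c(age = 60, g2 = 15))(newX)
```

The result outX is a logical matrix with the same column names as newX, and attr(outX, 'cutoff') holds the named vector of thresholds.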
In the future, integer and factor predictors will be supported.
## Dichotomize Single Predictor
data(cu.summary, package = 'rpart') # see more details from ?rpart::cu.summary
with(cu.summary, rpartD(y = Price, x = Mileage, check_degeneracy = FALSE))
(foo = with(cu.summary, rpartD(y = Price, x = Mileage)))
foo(rnorm(10, mean = 24.5))
## Dichotomize Multiple Predictors
library(survival)
data(stagec, package = 'rpart') # see more details from ?rpart::stagec
nrow(stagec) # 146
(foo = with(stagec[1:100,], m_rpartD(y = Surv(pgtime, pgstat), X = cbind(age, g2, gleason))))
foo(as.matrix(stagec[-(1:100), c('age', 'g2', 'gleason')]))