| rpartD | R Documentation |
Dichotomize one or more predictors of a Surv, a logical, or a double response, using recursive partitioning and regression trees (rpart).
rpartD(
y,
x,
check_degeneracy = TRUE,
cp = .Machine$double.eps,
maxdepth = 2L,
...
)
m_rpartD(y, X, check_degeneracy = TRUE, ...)
| y | a Surv object, a logical vector, or a double vector, the response |
| x | numeric vector, one predictor |
| check_degeneracy | logical scalar, whether to allow the dichotomized value to be all-TRUE or all-FALSE |
| cp | double scalar, complexity parameter, see rpart.control. Default .Machine$double.eps |
| maxdepth | positive integer scalar, maximum depth of any node, see rpart.control. Default 2L |
| ... | additional parameters of rpart and/or rpart.control |
| X | numeric matrix, a set of predictors. Each column of X is one predictor |
Function rpartD dichotomizes one predictor in the following steps:

1. A recursive partitioning and regression tree (rpart) analysis is performed for the response y and the predictor x.
2. The labels.rpart of the first node of the rpart tree is taken as the dichotomizing rule of the double predictor x. The term dichotomizing rule indicates the combination of an inequality sign (>, >=, < or <=) and a double cutoff threshold a.
3. The dichotomizing rule from Step 2 is further processed, such that
   - < a is regarded as >= a
   - <= a is regarded as > a
   - > a and >= a are kept as is.
   This step is necessary for a narrative of greater than, or greater than or equal to, the threshold a.
4. A warning message is produced if the dichotomizing rule, applied to a new double predictor newx, creates an all-TRUE or all-FALSE result. The algorithm does not stop, as most regression models in R are capable of handling an all-TRUE or all-FALSE predictor by returning an NA_real_ regression coefficient estimate.
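The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper name sketch_rpartD is hypothetical, the cutoff is read from the first row of the fitted tree's splits matrix rather than parsed from labels.rpart, and the rule is always phrased as "greater than or equal to the cutoff", on the assumption that rpart reports continuous splits with < and >= only.

```r
library(rpart)

# Hypothetical helper, NOT the package's rpartD(): a rough sketch of Steps 1-4.
sketch_rpartD <- function(y, x, cp = .Machine$double.eps, maxdepth = 2L) {
  # Step 1: fit a tree of the response on the single predictor;
  # cp and maxdepth are passed on to rpart.control() via '...'
  fit <- rpart(y ~ x, cp = cp, maxdepth = maxdepth)
  if (is.null(fit$splits)) stop("rpart found no split for this predictor")
  # Step 2: the primary split of the root node is the first row of fit$splits;
  # its 'index' column holds the cutoff for a continuous predictor
  cutoff <- unname(fit$splits[1L, "index"])
  # Step 3 (assumption): always express the rule as newx >= cutoff
  function(newx) {
    ret <- (newx >= cutoff)
    # Step 4: warn, but do not stop, when the result is degenerate
    if (all(ret, na.rm = TRUE) || !any(ret, na.rm = TRUE)) {
      warning("dichotomized value is all-TRUE or all-FALSE")
    }
    attr(ret, "cutoff") <- cutoff
    ret
  }
}

# Toy data (made up for illustration): the response jumps between x = 20 and x = 21
x <- 1:40
y <- rep(c(0, 10), each = 20)
rule <- sketch_rpartD(y, x)
rule(c(3, 25, 38))
```

Returning a closure mirrors the documented behaviour that rpartD(y, x) is itself a function of newx, so the rule learned on one data set can be applied unchanged to another.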
Function m_rpartD dichotomizes each predictor X[,i] based on the response y
using function rpartD.
When the multiple dichotomizing rules are applied to a new set of predictors newX,
a warning message is produced
if at least one of the dichotomized predictors is all-TRUE or all-FALSE.
We do not check whether two or more of the dichotomized predictors are identical to each other; this situation is handled in helper function coef_dichotom.
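The column-wise behaviour described above could look roughly like the following. Again, this is a sketch under assumptions rather than the package's code: the names root_cutoff and sketch_m_rpartD are hypothetical, every column is assumed to yield at least one split, and each rule is assumed to take the form "greater than or equal to the cutoff".

```r
library(rpart)

# Hypothetical helper: cutoff of the root split for one predictor
# (assumes rpart finds at least one split)
root_cutoff <- function(y, x) {
  fit <- rpart(y ~ x, cp = .Machine$double.eps, maxdepth = 2L)
  unname(fit$splits[1L, "index"])
}

# Hypothetical sketch, NOT the package's m_rpartD()
sketch_m_rpartD <- function(y, X) {
  # one cutoff per column of X, named by column
  cutoffs <- vapply(colnames(X), function(j) root_cutoff(y, X[, j]), numeric(1))
  function(newX) {
    # newX must carry the same columns, in the same order, as X
    stopifnot(identical(colnames(newX), names(cutoffs)))
    # compare every column of newX against its own cutoff
    ret <- sweep(newX, 2L, cutoffs, FUN = `>=`)
    degenerate <- apply(ret, 2L, function(z) all(z) || !any(z))
    if (any(degenerate)) {
      warning("at least one dichotomized predictor is all-TRUE or all-FALSE")
    }
    attr(ret, "cutoff") <- cutoffs
    ret
  }
}

# Toy data (made up for illustration)
y <- rep(c(0, 10), each = 20)
X <- cbind(a = 1:40, b = 2 * (1:40))
rules <- sketch_m_rpartD(y, X)
rules(cbind(a = c(5, 30), b = c(10, 50)))
```

As in the documented function, identical dichotomized columns are not detected here.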
Function rpartD returns a function,
with a double vector parameter newx.
The returned value of rpartD(y,x)(newx) is a
logical vector
with attributes
attr(,'cutoff'): double scalar, the cutoff value for newx
Function m_rpartD returns a function,
with a double matrix parameter newX.
The argument for newX must have
the same number of columns and the same column names as
the input matrix X.
The returned value of m_rpartD(y,X)(newX) is a
logical matrix
with attributes
attr(,'cutoff'): named double vector,
the cutoff values for each predictor in newX
In the future, integer and factor predictors will be supported.
## Dichotomize Single Predictor
data(cu.summary, package = 'rpart') # see more details from ?rpart::cu.summary
with(cu.summary, rpartD(y = Price, x = Mileage, check_degeneracy = FALSE))
(foo = with(cu.summary, rpartD(y = Price, x = Mileage)))
foo(rnorm(10, mean = 24.5))
## Dichotomize Multiple Predictors
library(survival)
data(stagec, package = 'rpart') # see more details from ?rpart::stagec
nrow(stagec) # 146
(foo = with(stagec[1:100,], m_rpartD(y = Surv(pgtime, pgstat), X = cbind(age, g2, gleason))))
foo(as.matrix(stagec[-(1:100), c('age', 'g2', 'gleason')]))