node1: Dichotomize via 1st Node of Recursive Partitioning

Description

Dichotomize one or more predictors of a Surv, a logical, or a double response, using recursive partitioning and regression trees (rpart).

Usage

node1(x, check_degeneracy = TRUE, ...)

Arguments

x

an rpart object

check_degeneracy

logical scalar, whether to check if the dichotomized value is all-FALSE or all-TRUE (i.e., degenerate) for any one of the predictors. Default TRUE, which produces a warning message when degeneracy is detected.

...

additional parameters of rpart and/or rpart.control

Details

Function node1() dichotomizes one predictor in the following steps:

  1. A recursive partitioning and regression tree (rpart) analysis is performed for the response y and the predictor x.

  2. The label (see labels.rpart) of the first node of the rpart tree is taken as the dichotomizing rule of the double predictor x. The term dichotomizing rule refers to the combination of an inequality sign (>, >=, < or <=) and a double cutoff threshold a.

  3. The dichotomizing rule from Step 2 is further processed, such that

    • < a is regarded as >= a

    • <= a is regarded as > a

    • > a and >= a are kept as is.

    This step ensures that the rule can always be described as greater than, or greater than or equal to, the threshold a.

  4. A warning message is produced if the dichotomizing rule, applied to a new double predictor newx, creates an all-TRUE or all-FALSE result. The algorithm does not stop in this case, because most regression models in R can handle an all-TRUE or all-FALSE predictor by returning an NA_real_ regression coefficient estimate.
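
The four steps above can be sketched with rpart and base R alone. This is a minimal illustration, not the maxEff implementation: the helpers flip_rule() and apply_rule() are hypothetical names introduced here, and reading the cutoff from the splits matrix of the fitted tree (rather than from the printed label) is an assumption made to avoid rounding.

```r
library(rpart)

# Hypothetical helper (not part of maxEff): Step 3, express any rule
# as "greater than" or "greater than or equal to" the threshold a
flip_rule <- function(sign, a) {
  switch(sign,
    '<'  = function(newx) newx >= a,
    '<=' = function(newx) newx > a,
    '>'  = function(newx) newx > a,
    '>=' = function(newx) newx >= a,
    stop('unsupported inequality sign: ', sign))
}

# Hypothetical helper (not part of maxEff): Step 4, warn on degeneracy
apply_rule <- function(rule, newx, check_degeneracy = TRUE) {
  ret <- rule(newx)
  if (check_degeneracy && (all(ret, na.rm = TRUE) || !any(ret, na.rm = TRUE))) {
    warning('dichotomized value is all-TRUE or all-FALSE')
  }
  ret
}

# Step 1: a depth-1 tree, so that only the first node is split
fit <- rpart(Price ~ Mileage, data = cu.summary,
             control = rpart.control(maxdepth = 1L))

# Step 2: label of the left child of the first node
lab <- labels(fit)[2L]
sign <- regmatches(lab, regexpr('[<>]=?', lab))  # inequality sign
a <- unname(fit$splits[1L, 'index'])             # unrounded cutoff

# Steps 3 and 4: build the rule and apply it to new values
rule <- flip_rule(sign, a)
apply_rule(rule, newx = range(cu.summary$Mileage, na.rm = TRUE))
```

Calling apply_rule() with values that all fall on one side of the cutoff returns the degenerate logical vector together with a warning, rather than stopping.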

Value

Function node1() returns an object of class 'node1', which is a function with one parameter newx taking a double vector.
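
To illustrate what a function carrying a class attribute looks like in R, here is a small sketch. The constructor make_rule() and the class name 'node1_sketch' are hypothetical and only mimic the shape of the return value described above; the internals of the actual 'node1' object may differ.

```r
# Hypothetical constructor: a closure over the cutoff a, tagged with a class
make_rule <- function(a) {
  force(a)  # evaluate the cutoff now, so the closure captures its value
  f <- function(newx) newx >= a
  class(f) <- c('node1_sketch', 'function')
  f
}

foo_sketch <- make_rule(24.5)
foo_sketch(c(23, 24.5, 26))           # still callable like any function
inherits(foo_sketch, 'node1_sketch')  # S3 methods can dispatch on the class
environment(foo_sketch)$a             # the cutoff lives in the closure environment
```

Because the object remains an ordinary function, it can be used directly on the right-hand side of a pipe, as the example rnorm(6L, mean = 24.5) |> foo() does.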

Note

In the future, integer and factor predictors will be supported.

Function rpart is quite slow.

Examples

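library(rpart)
library(maxEff)  # provides node1()
# Fit a shallow tree; the outer parentheses print the result
(r = rpart(Price ~ Mileage, data = cu.summary, control = rpart.control(maxdepth = 2L)))
# Extract the dichotomizing rule from the first node of the tree
(foo = r |> node1())
get_cutoff(foo)
labels(foo)
# Apply the rule to a new double vector centered near the cutoff
rnorm(6L, mean = 24.5) |> foo()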

maxEff documentation built on April 12, 2025, 2:11 a.m.