rpart_utils: 'rpart' utilities

rpart_utils    R Documentation

rpart utilities

Description

Utilities for the rpart package:

rpart_parent returns all ancestors of node, i.e., the path from the root (node 1) down to node, including node itself.

rpart_subset and rpart_subset2 (the latter defined in the Examples) return the subset of the data used to fit an rpart tree that falls in a given intermediate or terminal node.

rpart_nodes returns the terminal node label for each observation in the original data frame used to fit tree.

Usage

rpart_parent(node = 1L)

rpart_subset(tree, node = 1L)

rpart_nodes(tree, node_labels = FALSE, droplevels = TRUE)

Arguments

node

an integer node number, using rpart's numbering (the root is node 1); rpart_subset also accepts a vector of node numbers

tree

an object returned from rpart

node_labels

either a logical (the default is FALSE) or a vector of labels whose length equals the number of terminal nodes or the total number of nodes

droplevels

logical; if TRUE, only labels of nodes to which at least one observation is assigned are kept as factor levels (i.e., only terminal node labels are used)

Value

rpart_parent returns a vector representing the path from the root to node.
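Because rpart numbers the children of node k as 2k and 2k + 1, the path from the root to any node can be recovered by repeated integer division. The following is a minimal sketch under that assumption; rpart_parent_sketch is a hypothetical name and not the rawr implementation. It returns the node itself as the last element, which is consistent with the sibling-path check in the Examples.

```r
## Hypothetical re-implementation for illustration only (not the rawr source).
## rpart numbers nodes so that node k has children 2k and 2k + 1,
## hence the parent of any node k > 1 is k %/% 2.
rpart_parent_sketch <- function(node = 1L) {
  node <- as.integer(node)
  stopifnot(length(node) == 1L, !is.na(node), node >= 1L)
  path <- node
  while (node > 1L) {
    node <- node %/% 2L    # step up to the parent node
    path <- c(node, path)  # prepend so that the root comes first
  }
  path
}

rpart_parent_sketch(116)  # 1 3 7 14 29 58 116
rpart_parent_sketch(29)   # 1 3 7 14 29
```

Siblings such as 28 and 29 differ only in the last element, since both halve to node 14.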

rpart_subset returns the data frame of observations in node. For any tree, the possibilities
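A minimal sketch of the subsetting idea, assuming no rows were dropped by na.action when the tree was fit and that the data are passed explicitly (the documented rpart_subset takes only tree). It relies on the where component of an rpart object, which gives the row of tree$frame holding each observation's terminal node, and keeps the observations whose terminal node descends from any requested node. in_subtree and rpart_subset_sketch are hypothetical names.

```r
library(rpart)

## Hypothetical helper: TRUE if terminal node `leaf` lies in the subtree
## rooted at `node` (rpart numbering: the parent of k is k %/% 2).
in_subtree <- function(leaf, node) {
  while (leaf > node) leaf <- leaf %/% 2L
  leaf == node
}

## Hypothetical sketch, not the rawr implementation: `data` must be the
## data frame the tree was fit on, with no rows removed by na.action.
rpart_subset_sketch <- function(tree, data, node = 1L) {
  ## node number of the terminal node each observation falls into
  leaf <- as.integer(rownames(tree$frame))[tree$where]
  ## keep observations whose terminal node is under any requested node
  keep <- vapply(leaf, function(l) {
    any(vapply(node, in_subtree, logical(1), leaf = l))
  }, logical(1))
  data[keep, , drop = FALSE]
}

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, minsplit = 5)

## the terminal nodes together should account for every row
leaves <- as.integer(rownames(fit$frame)[fit$frame$var == "<leaf>"])
sum(sapply(leaves, function(n) nrow(rpart_subset_sketch(fit, kyphosis, n))))
nrow(kyphosis)
```

Since several node numbers can be passed at once, selecting both children of the root (2:3) also returns every row.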

rpart_nodes returns a factor with one value per observation in the data used to fit tree
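The where component of an rpart object is enough to build a factor of this kind. This sketch assumes, without checking the rawr source, that node numbers serve as the labels; the droplevels() step illustrates the effect described for the droplevels argument, since only terminal nodes ever receive observations. All object names are hypothetical.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, minsplit = 5)

## fit$where gives, for each observation, the row of fit$frame for the
## terminal node it falls into; the frame's row names are the node numbers
leaf <- as.integer(rownames(fit$frame))[fit$where]

## one level per node in the tree; intermediate nodes get no observations
all_nodes <- as.integer(rownames(fit$frame))
nodes_all <- factor(leaf, levels = all_nodes)

## dropping unused levels leaves only the terminal nodes as levels
nodes_leaf <- droplevels(nodes_all)

table(nodes_leaf)
```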

See Also

https://stackoverflow.com/q/36086990/2994949

https://stackoverflow.com/q/36748531/2994949

Examples

rpart_parent(116)
rpart_parent(29)

## Not run: 
library('rpart')
fit <- rpart(Kyphosis ~ Age + Number + Start, kyphosis, minsplit = 5)

## sibling nodes (28 and 29 are both children of node 14) share the same path up to their parent
identical(
  head(rpart_parent(28), -1L),
  head(rpart_parent(29), -1L)
)

## the terminal nodes together should contain every row of the original data
nodes <- as.integer(rownames(fit$frame[fit$frame$var %in% '<leaf>', ]))
sum(sapply(nodes, function(x) nrow(rpart_subset(fit, x))))
nrow(kyphosis)

## number of observations in every node (intermediate and terminal)
nodes <- as.integer(rownames(fit$frame))
sapply(nodes, function(x) nrow(rpart_subset(fit, x)))


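rpart_subset2 <- function(tree, node = 1L) {
  ## requires the partykit package
  stopifnot(requireNamespace('partykit', quietly = TRUE))
  ptree <- partykit::as.party(tree)
  ## re-evaluate the data used to fit the tree in the calling environment
  ptree$data <- model.frame(eval(tree$call$data, parent.frame(1L)))
  ## alternative: retain transformed variables but drop those not in formula
  ## http://stackoverflow.com/a/36816883/2994949
  # ptree$data <- model.frame(tree)
  ## keep only the original columns, dropping any appended by data_party()
  partykit::data_party(ptree, node)[, seq_along(ptree$data)]
}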

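## node numbering differs: rpart numbers the children of node k as 2k and 2k + 1,
## whereas party numbers nodes consecutively in depth-first order, so rpart
## node 4 corresponds to party node 3 here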
dim(rpart_subset(fit, 4))
dim(rpart_subset2(fit, 3))


rpart_nodes(fit)
rpart_nodes(fit, TRUE)

## labels may be given per terminal node (10 here) or per node (19 here)
table(rpart_nodes(fit, letters[1:10]),
      rpart_nodes(fit, letters[1:19]))

## subsetting by node id(s) returns only the observations in the selected
## node(s) or their descendants; selecting every terminal node recovers the full data
identical(kyphosis, rpart_subset(fit, unique(rpart_nodes(fit))))

kyphosis$node <- rpart_nodes(fit)
rpart_subset(fit, 14:15)

## End(Not run)
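Because the rows of an rpart frame are stored in depth-first order, the party node id corresponding to an rpart node can plausibly be taken as that node's row position in tree$frame. The sketch below assumes that as.party() numbers nodes in this same depth-first order; party_id is a hypothetical helper, and partykit itself is not called.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, minsplit = 5)

## rpart node numbers in the row order of fit$frame
## (a depth-first, left-first traversal of the tree)
rpart_ids <- as.integer(rownames(fit$frame))

## hypothetical helper: assumed depth-first party id of an rpart node,
## i.e., its position in that traversal
party_id <- function(rpart_node) match(rpart_node, rpart_ids)

party_id(4)  # 3: node 2 is internal, so its left child is visited next
```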


raredd/rawr documentation built on April 29, 2024, 10:29 a.m.