Inner and Terminal Nodes

Description

A class for representing inner and terminal nodes in trees and functions for data partitioning.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
partynode(id, split = NULL, kids = NULL, surrogates = NULL, 
    info = NULL)
kidids_node(node, data, vmatch = 1:ncol(data), 
    obs = NULL, perm = NULL)
fitted_node(node, data, vmatch = 1:ncol(data), 
    obs = 1:nrow(data), perm = NULL)
id_node(node)
split_node(node)
surrogates_node(node)
kids_node(node)
info_node(node)
formatinfo_node(node, FUN = NULL, default = "", prefix = NULL, ...)

Arguments

id

integer, a unique identifier for a node.

split

an object of class partysplit.

kids

a list of partynode objects.

surrogates

a list of partysplit objects.

info

additional information.

node

an object of class partynode.

data

a list or data.frame.

vmatch

a permutation of the variable numbers in data.

obs

a logical or integer vector indicating a subset of the observations in data.

perm

a vector of integers specifying the variables to be permuted prior before splitting (i.e., for computing permutation variable importances). The default NULL doesn't alter the data.

FUN

function for formatting the info, for default see below.

default

a character used if the info in node is NULL.

prefix

an optional prefix to be added to the returned character.

...

further arguments passed to capture.output.

Details

A node represents both inner and terminal nodes in a tree structure. Each node has a unique identifier id. A node consisting only of such an identifier (and possibly additional information in info) is a terminal node.

Inner nodes consist of a primary split (an object of class partysplit) and at least two kids (daughter nodes). Kid nodes are objects of class partynode itself, so the tree structure is defined recursively. In addition, a list of partysplit objects offering surrogate splits can be supplied. Like partysplit objects, partynode objects aren't connected to the actual data.

Function kidids_node() determines how the observations in data[obs,] are partitioned into the kid nodes and returns the number of the list element in list kids each observations belongs to (and not it's identifier). This is done by evaluating split (and possibly all surrogate splits) on data using kidids_split.

Function fitted_node() performs all splits recursively and returns the identifier id of the terminal node each observation in data[obs,] belongs to. Arguments vmatch, obs and perm are passed to kidids_split.

Function formatinfo_node() extracts the the info from node and formats it to a character vector using the following strategy: If is.null(info), the default is returned. Otherwise, FUN is applied for formatting. The default function uses as.character for atomic objects and applies capture.output to print(info) for other objects. Optionally, a prefix can be added to the computed character string.

All other functions are accessor functions for extracting information from objects of class partynode.

Value

The constructor partynode() returns an object of class partynode:

id

a unique integer identifier for a node.

split

an object of class partysplit.

kids

a list of partynode objects.

surrogates

a list of partysplit objects.

info

additional information.

kidids_split() returns an integer vector describing the partition of the observations into kid nodes by their position in list kids.

fitted_node() returns the node identifiers (id) of the terminal nodes each observation belongs to.

References

Hothorn T, Zeileis A (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16, 3905–3909.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
data("iris", package = "datasets")

## a stump defined by a binary split in Sepal.Length
stump <- partynode(id = 1L, 
    split = partysplit(which(names(iris) == "Sepal.Length"),
	breaks = 5),
    kids = lapply(2:3, partynode))

## textual representation
print(stump, data = iris)  

## list element number and node id of the two terminal nodes
table(kidids_node(stump, iris), 
    fitted_node(stump, data = iris))

## assign terminal nodes with probability 0.5
## to observations with missing `Sepal.Length'
iris_NA <- iris
iris_NA[sample(1:nrow(iris), 50), "Sepal.Length"] <- NA
table(fitted_node(stump, data = iris_NA, 
    obs = !complete.cases(iris_NA)))

## a stump defined by a primary split in `Sepal.Length'
## and a surrogate split in `Sepal.Width' which
## determines terminal nodes for observations with
## missing `Sepal.Length'
stump <- partynode(id = 1L, 
    split = partysplit(which(names(iris) == "Sepal.Length"),
	breaks = 5),
    kids = lapply(2:3, partynode),
    surrogates = list(partysplit(
	which(names(iris) == "Sepal.Width"), breaks = 3)))
f <- fitted_node(stump, data = iris_NA, 
    obs = !complete.cases(iris_NA))
tapply(iris_NA$Sepal.Width[!complete.cases(iris_NA)], f, range)