Class "BinaryTree"

Description

A class for representing binary trees.

Objects from the Class

Objects can be created by calls of the form new("BinaryTree", ...). The most important slot is tree, a (recursive) list with the following elements:

nodeID

an integer giving the number of the node, starting with 1 in the root node.

weights

the case weights (of the learning sample) corresponding to this node.

criterion

a list with test statistics and p-values for each partial hypothesis.

terminal

a logical specifying if this is a terminal node.

psplit

primary split: a list with the elements variableID (the number of the input variable being split), ordered (a logical indicating whether the input variable is ordered), splitpoint (the cutpoint or the set of levels sent to the left), and splitstatistics (the process of standardized two-sample statistics on which the split point estimation is based). The logical toleft determines whether observations go left or right down the tree. For nominal splits, the element table is a vector whose entries are greater than zero if the corresponding level is available in the corresponding node.

ssplits

a list of surrogate splits, each with the same elements as psplit.

prediction

the prediction of the node: the mean for numeric responses and the conditional class probabilities for nominal or ordered responses. For censored responses, this is the mean of the logrank scores, which is not useful as a prediction on its own.

left

a list representing the left daughter node.

right

a list representing the right daughter node.

Please note that this data structure may be subject to change in future releases of the package.
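
To make the recursive layout concrete, the following minimal sketch builds a toy list by hand that mimics some of the elements described above, collects the terminal node IDs, and sends single observations down the tree. The toy tree, the helper names make_node, terminal_ids and route, and the rule that values less than or equal to splitpoint go left are illustrative assumptions, not objects or functions from the package; real trees are created by ctree and carry more elements.

```r
# Hypothetical helper: build a node with a subset of the elements listed above.
make_node <- function(nodeID, terminal, prediction,
                      psplit = NULL, left = NULL, right = NULL) {
  list(nodeID = nodeID, terminal = terminal, prediction = prediction,
       psplit = psplit, left = left, right = right)
}

# Toy tree: the root (1) splits on input variable 1 at 82;
# nodes 2 and 3 are terminal. All numbers are made up.
toy <- make_node(
  nodeID = 1L, terminal = FALSE, prediction = 42.1,
  psplit = list(variableID = 1L, ordered = TRUE, splitpoint = 82),
  left  = make_node(2L, TRUE, 26.5),
  right = make_node(3L, TRUE, 75.4)
)

# Recursively collect the IDs of all terminal nodes.
terminal_ids <- function(node) {
  if (node$terminal) return(node$nodeID)
  c(terminal_ids(node$left), terminal_ids(node$right))
}

# Send one observation (a numeric vector of inputs) down the tree.
# Assumes ordered splits only and that x <= splitpoint goes left.
route <- function(node, x) {
  while (!node$terminal) {
    go_left <- x[node$psplit$variableID] <= node$psplit$splitpoint
    node <- if (go_left) node$left else node$right
  }
  node$nodeID
}

ids       <- terminal_ids(toy)   # terminal node IDs of the toy tree
leaf_cool <- route(toy, c(70))   # observation with input value 70
leaf_warm <- route(toy, c(90))   # observation with input value 90
```

In this sketch, ids contains the node numbers 2 and 3, and the two observations end up in nodes 2 and 3, respectively. Surrogate splits, nominal splits and the toleft flag are deliberately left out to keep the example short.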

Slots

data:

an object of class "ModelEnv".

responses:

an object of class "VariableFrame" storing the values of the response variable(s).

cond_distr_response:

a function computing the conditional distribution of the response.

predict_response:

a function for computing predictions.

tree:

a recursive list representing the tree. See above.

where:

an integer vector of length n (the number of observations in the learning sample) giving, for each observation, the number of the terminal node it belongs to.

prediction_weights:

a function for extracting weights from terminal nodes.

get_where:

a function for determining the numbers of the terminal nodes that observations fall into.

update:

a function for updating weights.

Extends

Class "BinaryTreePartition", directly.

Methods

response(object, ...):

extract the response variables the tree was fitted to.

treeresponse(object, newdata = NULL, ...):

compute statistics for the conditional distribution of the response as modelled by the tree. For regression problems, this is just the mean. For nominal or ordered responses, estimated conditional class probabilities are returned. Kaplan-Meier curves are computed for censored responses. Note that a list with one element for each observation is returned.

Predict(object, newdata = NULL, ...):

compute predictions.

weights(object, newdata = NULL, ...):

extract the weight vector of the terminal node that each observation belongs to, either for the learning sample (newdata = NULL) or for new observations.

where(object, newdata = NULL, ...):

extract the number of the terminal node that each observation belongs to, either for the learning sample (newdata = NULL) or for new observations.

nodes(object, where, ...):

extract the nodes with the given numbers (where).

plot(x, ...):

a plot method for BinaryTree objects, see plot.BinaryTree.

print(x, ...):

a print method for BinaryTree objects.
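
The relationship between the results of where, weights and the predictions for a numeric response can be illustrated without a fitted tree. The sketch below uses made-up responses, case weights and node assignments (all hypothetical) and assumes that the weight vector attached to an observation is the case-weight vector restricted to the observations in its terminal node; the prediction is then the weighted mean of the response.

```r
# Hypothetical learning sample: responses, case weights, terminal node IDs.
y       <- c(10, 12, 30, 34, 50)
case_w  <- c(1, 1, 2, 1, 1)
node_id <- c(2L, 2L, 3L, 3L, 3L)   # plays the role of where(object)

# For each observation, keep the case weights of the observations
# sharing its terminal node and set all others to zero.
obs_weights <- lapply(node_id, function(k) case_w * (node_id == k))

# Weighted mean of the response per observation, as for numeric responses.
pred <- sapply(obs_weights, function(w) weighted.mean(y, w))
```

Here the observations in node 2 get the prediction 11 and those in node 3 get 36, so observations in the same terminal node share both their weight vector and their prediction.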

Examples

  library("party")
  set.seed(290875)

  airq <- subset(airquality, !is.na(Ozone))
  airct <- ctree(Ozone ~ ., data = airq,   
                 controls = ctree_control(maxsurrogate = 3))

  ### distribution of responses in the terminal nodes
  plot(airq$Ozone ~ as.factor(where(airct)))

  ### get all terminal nodes from the tree
  nodes(airct, unique(where(airct)))

  ### extract weights and compute predictions
  pmean <- sapply(weights(airct), function(w) weighted.mean(airq$Ozone, w))

  ### the same as
  drop(Predict(airct))

  ### or
  unlist(treeresponse(airct))

  ### don't use the mean but the median as prediction in each terminal node
  pmedian <- sapply(weights(airct), function(w)
                 median(airq$Ozone[rep(seq_len(nrow(airq)), w)]))

  plot(airq$Ozone, pmean, col = "red")
  points(airq$Ozone, pmedian, col = "blue")