tree: Fit a Classification or Regression Tree


Description

A tree is grown by binary recursive partitioning using the response in the specified formula and choosing splits from the terms of the right-hand-side.

Usage

tree(formula, data, weights, subset,
     na.action = na.pass, control = tree.control(nobs, ...),
     method = "recursive.partition",
     split = c("deviance", "gini"),
     model = FALSE, x = FALSE, y = TRUE, wts = TRUE, ...)

Arguments

formula

A formula expression. The left-hand-side (response) should be either a numerical vector when a regression tree will be fitted or a factor when a classification tree is produced. The right-hand-side should be a series of numeric or factor variables separated by +; there should be no interaction terms. Both . and - are allowed: regression trees can have offset terms.

data

A data frame in which to preferentially interpret formula, weights and subset.

weights

Vector of non-negative observational weights; fractional weights are allowed.

subset

An expression specifying the subset of cases to be used.

na.action

A function to filter missing data from the model frame. The default is na.pass (to do nothing) as tree handles missing values (by dropping them down the tree as far as possible).

control

A list as returned by tree.control.

method

A character string giving the method to use. The default is "recursive.partition"; the only other useful value is "model.frame".

split

The splitting criterion to use: "deviance" (the default) or "gini".

model

If this argument is itself a model frame, then the formula and data arguments are ignored, and model is used to define the model. If the argument is logical and true, the model frame is stored as component model in the result.

x

logical. If true, the matrix of variables for each case is returned.

y

logical. If true, the response variable is returned.

wts

logical. If true, the weights are returned.

...

Additional arguments that are passed to tree.control. Normally used for mincut, minsize or mindev.
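As a sketch of how the control-related arguments fit together (assuming the tree package is installed; the iris data and the specific mincut/minsize/mindev values below are illustrative choices, not defaults):

```r
## Tuning tree growth via arguments forwarded to tree.control().
library(tree)

## Grow a deliberately shallow tree: a larger mindev stops splitting
## earlier, and mincut/minsize raise the minimum node sizes.
ir.small <- tree(Species ~ ., data = iris,
                 mincut = 10, minsize = 20, mindev = 0.05)

## The same settings can instead be passed explicitly via 'control':
ctl <- tree.control(nobs = nrow(iris),
                    mincut = 10, minsize = 20, mindev = 0.05)
ir.small2 <- tree(Species ~ ., data = iris, control = ctl)

summary(ir.small)
```

Note that tree.control needs nobs (the number of observations in the fit) when called directly, which tree fills in automatically when the arguments are passed through ... instead.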

Details

A tree is grown by binary recursive partitioning using the response in the specified formula and choosing splits from the terms of the right-hand-side. Numeric variables are divided into X < a and X > a; the levels of an unordered factor are divided into two non-empty groups. The split which maximizes the reduction in impurity is chosen, the data set is split, and the process repeated. Splitting continues until the terminal nodes are too small or too few to be split.

Tree growth is limited to a depth of 31 by the use of integers to label nodes.

Factor predictor variables can have up to 32 levels. This limit is imposed for ease of labelling, but since their use in a classification tree with three or more levels in a response involves a search over 2^(k-1) - 1 groupings for k levels, the practical limit is much less.
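The size of that search can be checked directly in base R (a small illustration of the 2^(k-1) - 1 count quoted above, not part of the package itself):

```r
## Number of ways to split a k-level unordered factor into two
## non-empty groups: 2^(k - 1) - 1.
n.groupings <- function(k) 2^(k - 1) - 1

n.groupings(2)   # 1 candidate split
n.groupings(10)  # 511
n.groupings(32)  # 2147483647 -- already infeasible to search exhaustively
```

This is why the practical limit on factor levels in a classification tree is far below the nominal 32.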

Value

The value is an object of class "tree" which has components

frame

A data frame with a row for each node, and row.names giving the node numbers. The columns include var, the variable used at the split (or "<leaf>" for a terminal node), n, the (weighted) number of cases reaching that node, dev the deviance of the node, yval, the fitted value at the node (the mean for regression trees, a majority class for classification trees) and split, a two-column matrix of the labels for the left and right splits at the node. Classification trees also have yprob, a matrix of fitted probabilities for each response level.

where

An integer vector giving the row number of the frame detailing the node to which each case is assigned.

terms

The terms of the formula.

call

The matched call to tree.

model

If model = TRUE, the model frame.

x

If x = TRUE, the model matrix.

y

If y = TRUE, the response.

wts

If wts = TRUE, the weights.

and attributes xlevels and, for classification trees, ylevels.

A tree with no splits is of class "singlenode" which inherits from class "tree".
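A minimal sketch of poking at these components (assuming the tree package is installed; the iris fit mirrors the Examples below):

```r
## Inspecting the components of a fitted "tree" object.
library(tree)

ir.tr <- tree(Species ~ ., data = iris)

## 'frame' has one row per node; row names are the node numbers.
ir.tr$frame[, c("var", "n", "dev", "yval")]

## 'where' maps each case to a row of 'frame' (its terminal node).
table(ir.tr$where)

## Attributes recording the factor levels:
attr(ir.tr, "ylevels")
```

The node numbers in frame follow the usual binary labelling: node i has children 2i and 2i + 1, which is also why growth is capped at depth 31.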

Author(s)

B. D. Ripley

References

Breiman L., Friedman J. H., Olshen R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge. Chapter 7.

See Also

tree.control, prune.tree, predict.tree, snip.tree
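A sketch of how the functions listed above combine in a typical workflow (assuming the tree package is installed; the choices best = 3 and nodes = 7 are illustrative):

```r
library(tree)

ir.tr <- tree(Species ~ ., data = iris)

## Cost-complexity pruning to a given number of terminal nodes:
ir.pruned <- prune.tree(ir.tr, best = 3)

## Cross-validation to help choose the size; prune.misclass uses the
## misclassification rate rather than the deviance for classification:
set.seed(1)
ir.cv <- cv.tree(ir.tr, FUN = prune.misclass)
ir.cv$size  # candidate tree sizes
ir.cv$dev   # corresponding cross-validated error

## snip.tree removes chosen subtrees by node number:
ir.snipped <- snip.tree(ir.tr, nodes = 7)
```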

Examples

data(cpus, package="MASS")
cpus.ltr <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, cpus)
cpus.ltr
summary(cpus.ltr)
plot(cpus.ltr);  text(cpus.ltr)

ir.tr <- tree(Species ~., iris)
ir.tr
summary(ir.tr)

Example output

node), split, n, deviance, yval
      * denotes terminal node

 1) root 209 43.12000 1.753  
   2) cach < 27 143 11.79000 1.525  
     4) mmax < 6100 78  3.89400 1.375  
       8) mmax < 1750 12  0.78430 1.089 *
       9) mmax > 1750 66  1.94900 1.427 *
     5) mmax > 6100 65  4.04500 1.704  
      10) syct < 360 58  2.50100 1.756  
        20) chmin < 5.5 46  1.22600 1.699 *
        21) chmin > 5.5 12  0.55070 1.974 *
      11) syct > 360 7  0.12910 1.280 *
   3) cach > 27 66  7.64300 2.249  
     6) mmax < 28000 41  2.34100 2.062  
      12) cach < 96.5 34  1.59200 2.008  
        24) mmax < 11240 14  0.42460 1.827 *
        25) mmax > 11240 20  0.38340 2.135 *
      13) cach > 96.5 7  0.17170 2.324 *
     7) mmax > 28000 25  1.52300 2.555  
      14) cach < 56 7  0.06929 2.268 *
      15) cach > 56 18  0.65350 2.667 *

Regression tree:
tree(formula = log10(perf) ~ syct + mmin + mmax + cach + chmin + 
    chmax, data = cpus)
Variables actually used in tree construction:
[1] "cach"  "mmax"  "syct"  "chmin"
Number of terminal nodes:  10 
Residual mean deviance:  0.03187 = 6.342 / 199 
Distribution of residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.4945000 -0.1191000  0.0003571  0.0000000  0.1141000  0.4680000 

node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 150 329.600 setosa ( 0.33333 0.33333 0.33333 )  
   2) Petal.Length < 2.45 50   0.000 setosa ( 1.00000 0.00000 0.00000 ) *
   3) Petal.Length > 2.45 100 138.600 versicolor ( 0.00000 0.50000 0.50000 )  
     6) Petal.Width < 1.75 54  33.320 versicolor ( 0.00000 0.90741 0.09259 )  
      12) Petal.Length < 4.95 48   9.721 versicolor ( 0.00000 0.97917 0.02083 )  
        24) Sepal.Length < 5.15 5   5.004 versicolor ( 0.00000 0.80000 0.20000 ) *
        25) Sepal.Length > 5.15 43   0.000 versicolor ( 0.00000 1.00000 0.00000 ) *
      13) Petal.Length > 4.95 6   7.638 virginica ( 0.00000 0.33333 0.66667 ) *
     7) Petal.Width > 1.75 46   9.635 virginica ( 0.00000 0.02174 0.97826 )  
      14) Petal.Length < 4.95 6   5.407 virginica ( 0.00000 0.16667 0.83333 ) *
      15) Petal.Length > 4.95 40   0.000 virginica ( 0.00000 0.00000 1.00000 ) *

Classification tree:
tree(formula = Species ~ ., data = iris)
Variables actually used in tree construction:
[1] "Petal.Length" "Petal.Width"  "Sepal.Length"
Number of terminal nodes:  6 
Residual mean deviance:  0.1253 = 18.05 / 144 
Misclassification error rate: 0.02667 = 4 / 150 
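The fitted trees above can also be used for prediction; a short sketch via predict.tree (assuming the tree and MASS packages are installed, as in the Examples):

```r
library(tree)
data(cpus, package = "MASS")

cpus.ltr <- tree(log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax,
                 data = cpus)
ir.tr <- tree(Species ~ ., data = iris)

## Regression tree: fitted values on the log10(perf) scale.
head(predict(cpus.ltr))

## Classification tree: per-class probabilities by default,
## or hard class labels with type = "class".
head(predict(ir.tr))
table(predicted = predict(ir.tr, type = "class"),
      actual    = iris$Species)
```

The confusion table reproduces the 4/150 misclassification rate reported in the summary above.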

tree documentation built on May 2, 2019, 9:24 a.m.
