Classification trees for ordinal responses

Description

This function allows the user to build classification trees for ordinal responses within the CART framework. The trees are grown using the Generalized Gini impurity function, where the misclassification costs are given by the absolute or squared differences in scores assigned to the categories of the response. Pruning is based on the total misclassification rate or on the total misclassification cost.
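The costs in the description above depend only on the scores given to the categories. The following base-R sketch builds the two cost matrices from a vector of category scores. The helper cost_matrix() is hypothetical and is not part of rpartScore; it only mirrors the definition stated above.

```r
# Hypothetical helper (not part of rpartScore): misclassification cost matrix
# built from the scores assigned to the ordered categories.
cost_matrix <- function(scores, type = c("abs", "quad")) {
  type <- match.arg(type)
  d <- outer(scores, scores, "-")  # d[j, k] = s_j - s_k
  if (type == "abs") abs(d) else d^2
}

scores <- c(0, 1, 2, 3)            # equally spaced scores for four categories
C.abs  <- cost_matrix(scores, "abs")
C.quad <- cost_matrix(scores, "quad")
C.abs[1, 4]    # cost of confusing the two extreme categories: 3
C.quad[1, 4]   # the same confusion under squared differences: 9
```

Squared differences penalize confusions between distant categories much more heavily than absolute differences, while confusions between adjacent categories cost the same (1) under both choices when the scores are equally spaced.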

Usage

rpartScore(formula, data, weights, subset, na.action = na.rpart, 
 	split = "abs", prune = "mc", 
 	model = FALSE, x = FALSE, y = TRUE, 
 	control, ...)

Arguments

formula

a formula, as in the lm function.

data

an optional data frame in which to interpret the variables named in the formula.

weights

optional case weights.

subset

optional expression saying that only a subset of the rows of the data should be used in the fit.

na.action

The default action, na.rpart, deletes all observations for which the response y is missing, but keeps those in which one or more predictors are missing.

split

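the splitting function: one of "abs" (absolute differences in scores, the default) or "quad" (squared differences in scores).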

prune

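the performance measure used for pruning: one of "mc" (total misclassification cost, the default) or "mr" (total misclassification rate).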

model

if logical: keep a copy of the model frame in the result? If the input value for model is a model frame (likely from an earlier call to the rpart or rpartScore function), then this frame is used rather than constructing new data.

x

keep a copy of the x matrix in the result.

y

keep a copy of the dependent variable in the result. If missing and model is supplied this defaults to FALSE.

control

a list of options that control details of the rpart algorithm; see rpart.control.

...

arguments to rpart.control may also be specified in the call to rpartScore. They are checked against the list of valid arguments.

Details

This function is used in almost the same way as the rpart function.
It is assumed that a set of (not necessarily linear) numerical scores has been assigned to the ordered categories of the response.
The main difference from the rpart function is that the method argument is replaced by two arguments, split and prune.
The argument split controls the splitting function used to grow the classification tree: the misclassification costs in the generalized Gini impurity function are set equal to the absolute ("abs", the default) or to the squared ("quad") differences in scores.
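Following Breiman et al. (1984), the generalized Gini impurity of a node t can be written as i(t) = sum over j and k of C(j, k) p(j | t) p(k | t), where p(j | t) is the proportion of cases of category j in t and C(j, k) is the misclassification cost between categories j and k. The base-R sketch below computes this quantity and the impurity decrease of a candidate binary split. The helpers gg_impurity() and split_gain() are hypothetical, and the scaling used internally by rpartScore may differ.

```r
# Hypothetical helper: generalized Gini impurity of the cases y in one node,
# with costs equal to absolute or squared differences in scores.
gg_impurity <- function(y, scores, type = c("abs", "quad")) {
  type <- match.arg(type)
  # within-node class proportions p(j | t), keeping empty categories
  p <- tabulate(match(y, scores), nbins = length(scores)) / length(y)
  d <- outer(scores, scores, "-")
  C <- if (type == "abs") abs(d) else d^2
  sum(C * outer(p, p))
}

# Hypothetical helper: impurity decrease of a split that sends the cases
# flagged TRUE in `left` to the left child and the rest to the right child.
split_gain <- function(y, left, scores, type = "abs") {
  pL <- mean(left)
  gg_impurity(y, scores, type) -
    pL * gg_impurity(y[left], scores, type) -
    (1 - pL) * gg_impurity(y[!left], scores, type)
}

y <- c(0, 0, 1, 3, 3, 3)             # observed scores in a node
gg_impurity(y, 0:3, "abs")           # 13/9
gg_impurity(y, 0:3, "quad")          # 34/9
split_gain(y, y <= 1, 0:3, "abs")    # 11/9: right child is pure
```

With squared differences the impurity equals twice the within-node variance of the scores, so the corresponding impurity decrease is proportional to the variance reduction used by regression trees.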
The argument prune selects the prediction performance measure used to prune the classification tree, and can take two values: "mr" (total misclassification rate) or "mc" (total misclassification cost, the default).
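To see how the two pruning measures differ, the following sketch evaluates both on observed scores y and predicted scores yhat. The helpers misclass_rate() and misclass_cost() are hypothetical. Both are expressed per observation, and the cost is assumed here to use the same absolute or squared score differences as the splitting function. rpartScore's own internal scaling may differ.

```r
# Hypothetical helpers: the two pruning criteria, expressed per observation.
misclass_rate <- function(y, yhat) mean(y != yhat)

misclass_cost <- function(y, yhat, type = c("abs", "quad")) {
  type <- match.arg(type)
  d <- y - yhat
  mean(if (type == "abs") abs(d) else d^2)
}

y    <- c(0, 1, 2, 3, 3)   # observed scores
yhat <- c(0, 2, 2, 1, 3)   # predicted scores
misclass_rate(y, yhat)          # 0.4: two of five cases misclassified
misclass_cost(y, yhat, "abs")   # (1 + 2) / 5 = 0.6
misclass_cost(y, yhat, "quad")  # (1 + 4) / 5 = 1
```

The rate counts the error of predicting 1 for a case in category 3 the same as any other error, whereas the cost charges it twice (absolute) or four times (squared) as much as an error between adjacent categories.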

Value

An object of class rpart, a superset of class tree.

Author(s)

Giuliano Galimberti, Gabriele Soffritti, Matteo Di Maso

References

Breiman L., Friedman J.H., Olshen R.A., Stone C.J. 1984 Classification and Regression Trees. Wadsworth International.

Galimberti G., Soffritti G., Di Maso M. 2012 Classification Trees for Ordinal Responses in R: The rpartScore Package. Journal of Statistical Software, 47(10), 1-25. URL http://www.jstatsoft.org/v47/i10/.

Piccarreta R. 2008 Classification Trees for Ordinal Variables. Computational Statistics, 23, 407-427.

Therneau T.M., Atkinson E.J. 1997 An Introduction to Recursive Partitioning Using rpart Routines. Technical Report 61, Section of Biostatistics, Mayo Clinic, Rochester. URL http://www.mayo.edu/hsr/techrpt/61.pdf.

See Also

rpart, rpart.control, rpart.object, summary.rpart, print.rpart

Examples

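library(rpart)        # provides plotcp() and prune()
library(rpartScore)

data("birthwt", package = "MASS")

# Ordinal response: birth-weight class (bwt in grams), scored from
# 0 (more than 3500 g) to 3 (2500 g or less)
birthwt$Category.s <- ifelse(birthwt$bwt <= 2500, 3,
  ifelse(birthwt$bwt <= 3000, 2,
  ifelse(birthwt$bwt <= 3500, 1, 0)))

# Default settings: absolute-difference costs for splitting,
# total misclassification cost for pruning
T.abs.mc <- rpartScore(Category.s ~ age + lwt + race + smoke +
  ptl + ht + ui + ftv, data = birthwt)

# Cross-validated error against the complexity parameter
plotcp(T.abs.mc)

# Prune at a chosen complexity parameter and draw the resulting tree
T.abs.mc.pruned <- prune(T.abs.mc, cp = 0.02)
plot(T.abs.mc.pruned)
text(T.abs.mc.pruned)

# Absolute-difference costs, pruning on the total misclassification rate
T.abs.mr <- rpartScore(Category.s ~ age + lwt + race + smoke +
  ptl + ht + ui + ftv, data = birthwt, prune = "mr")

# Squared-difference costs, pruning on the total misclassification cost
T.quad.mc <- rpartScore(Category.s ~ age + lwt + race + smoke +
  ptl + ht + ui + ftv, split = "quad", data = birthwt)

# Squared-difference costs, pruning on the total misclassification rate
T.quad.mr <- rpartScore(Category.s ~ age + lwt + race + smoke + ptl + ht +
  ui + ftv, split = "quad", prune = "mr", data = birthwt)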