Model-based Recursive Partitioning

Description

MOB is an algorithm for model-based recursive partitioning yielding a tree with fitted models associated with each terminal node.

Usage

mob(formula, weights, data = list(), na.action = na.omit, model = glinearModel,
  control = mob_control(), ...)

## S3 method for class 'mob'
predict(object, newdata = NULL, type = c("response", "node"), ...)
## S3 method for class 'mob'
summary(object, node = NULL, ...)
## S3 method for class 'mob'
coef(object, node = NULL, ...)
## S3 method for class 'mob'
sctest(x, node = NULL, ...)

Arguments

formula

A symbolic description of the model to be fit. This should be of the form y ~ x1 + ... + xk | z1 + ... + zl, where the variables before the | are passed to the model and the variables after the | are used for partitioning.
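Because | binds more loosely than + in R's grammar, the right-hand side of such a formula parses as a call to | whose two arguments are the regressors and the partitioning variables. The sketch below uses a hypothetical helper, split_mob_formula (not part of party), to pull the two parts apart. It only illustrates the formula structure and is not claimed to mirror how mob processes its formula internally.

```r
# Hypothetical helper (not part of party): split a two-part formula
# y ~ x1 + ... + xk | z1 + ... + zl into the model formula and the
# names of the partitioning variables.
split_mob_formula <- function(f) {
  rhs <- f[[3]]
  # `|` has lower precedence than `+`, so the right-hand side parses as
  # `|`(x1 + ... + xk, z1 + ... + zl)
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|")))
    stop("formula must have the form y ~ x1 + ... | z1 + ...")
  model <- eval(call("~", f[[2]], rhs[[2]]))  # rebuild y ~ x1 + ... + xk
  environment(model) <- environment(f)
  list(model = model, partition = all.vars(rhs[[3]]))
}

parts <- split_mob_formula(medv ~ lstat + rm | zn + indus + chas)
parts$model      # medv ~ lstat + rm
parts$partition  # "zn" "indus" "chas"
```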

weights

An optional vector of weights to be used in the fitting process. Only non-negative integer-valued weights are allowed (default: all weights equal to 1).
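Read as case weights (an interpretation on our part; this page only states that non-negative integers are allowed), an integer weight of k counts an observation k times and a weight of 0 leaves it out. The base-R sketch below uses lm() as a stand-in for the node model and checks that fitting with such weights gives the same coefficients as fitting on correspondingly replicated rows.

```r
# Case-weight reading of integer weights, illustrated with lm()
# (lm is only a stand-in; mob fits whatever StatModel is passed as `model`).
set.seed(42)
d <- data.frame(x = runif(20))
d$y <- 2 + 3 * d$x + rnorm(20, sd = 0.1)
w <- sample(0:3, 20, replace = TRUE)  # non-negative integer weights

# Fit with the weights ...
fit_w <- lm(y ~ x, data = d, weights = w)
# ... and on data where row i is repeated w[i] times (weight 0 drops it)
fit_rep <- lm(y ~ x, data = d[rep(seq_len(nrow(d)), times = w), ])

all.equal(coef(fit_w), coef(fit_rep))
```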

data

A data frame containing the variables in the model.

na.action

A function which indicates what should happen when the data contain NAs, defaulting to na.omit.

model

A model of class "StatModel". See details for requirements.

control

A list with control parameters as returned by mob_control.

...

Additional arguments passed to the fit call for the model.

object, x

A fitted mob object.

newdata

A data frame with new inputs; by default, the learning data are used.

type

A character string specifying whether to predict the response (using the predict method of the fitted model) or the ID of the associated terminal node.

node

A vector of node IDs for which the corresponding method should be applied.

Details

Model-based partitioning fits a model tree using the following algorithm:

  1. Fit a model (default: the generalized linear model glinearModel, an object of class "StatModel") with formula y ~ x1 + ... + xk to the observations in the current node.

  2. Assess the stability of the model parameters with respect to each of the partitioning variables z1, ..., zl. If there is significant overall instability, choose the variable z associated with the smallest p-value for partitioning; otherwise stop. Performing the parameter instability (fluctuation) tests requires an estfun method and a weights method for the fitted model.

  3. Search for the locally optimal split in z by minimizing the objective function of the model. Typically, this is the deviance or the negative log-likelihood; the objective function can be specified in mob_control.

  4. Re-fit the model in both child nodes (using the reweight method) and repeat from step 2.
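Steps 1 to 3 can be sketched in base R for a single node. This is a minimal illustration under simplifying assumptions, not party's implementation: the node model is an ordinary linear model fitted with lm(), the partitioning variables are numeric, the instability statistic is a double-max statistic of the decorrelated cumulative score sums ordered by z, and its p-value comes from a permutation test (a swapped-in technique; mob itself relies on asymptotic tests). The helper names fluct_stat, perm_pvalue and best_split, the Bonferroni check at 5%, and the minimum child size are all hypothetical choices.

```r
# Step 2 helper: double-max fluctuation statistic. Score contributions of a
# linear model (residual times regressors) are decorrelated, ordered by z,
# cumulated and scaled; large excursions indicate parameter instability.
fluct_stat <- function(fit, z) {
  psi <- residuals(fit) * model.matrix(fit)  # per-observation scores (up to scale)
  n <- nrow(psi)
  e <- eigen(crossprod(psi) / n, symmetric = TRUE)
  J_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values), ncol(psi)) %*% t(e$vectors)
  W <- apply(psi[order(z), , drop = FALSE] %*% J_inv_sqrt, 2, cumsum) / sqrt(n)
  max(abs(W))
}

# Permutation p-value: reshuffling z destroys any link between z and the scores
perm_pvalue <- function(fit, z, B = 199) {
  obs <- fluct_stat(fit, z)
  null <- replicate(B, fluct_stat(fit, sample(z)))
  (1 + sum(null >= obs)) / (B + 1)
}

# Step 3: exhaustive search for the cutpoint in z that minimizes the summed
# residual sum of squares (the deviance of lm) of the two child models.
best_split <- function(formula, data, zname, minsize = 10) {
  z <- data[[zname]]
  best <- list(cut = NA, objective = Inf)
  for (cut in sort(unique(z))) {
    left <- z <= cut
    if (sum(left) < minsize || sum(!left) < minsize) next
    obj <- deviance(lm(formula, data = data[left, ])) +
           deviance(lm(formula, data = data[!left, ]))
    if (obj < best$objective) best <- list(cut = cut, objective = obj)
  }
  best
}

# Simulated node: the slope of y on x flips sign at z1 = 0.5; z2 is pure noise.
set.seed(1)
n <- 200
d <- data.frame(x = runif(n), z1 = runif(n), z2 = runif(n))
d$y <- ifelse(d$z1 > 0.5, 1 + 3 * d$x, 1 - 3 * d$x) + rnorm(n, sd = 0.3)

fit <- lm(y ~ x, data = d)                                           # step 1
pvals <- sapply(c("z1", "z2"), function(v) perm_pvalue(fit, d[[v]])) # step 2
if (min(pvals) * length(pvals) < 0.05) {  # Bonferroni-adjusted overall check
  zsel <- names(which.min(pvals))         # variable with the smallest p-value
  chosen <- best_split(y ~ x, d, zsel)    # step 3
  print(chosen)
}
```

Refitting lm() on d[d$z1 <= chosen$cut, ] and on the remaining rows and repeating the procedure would correspond to step 4; mob itself refits the children through reweight rather than by subsetting the data.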

More details on the conceptual design of the algorithm can be found in Zeileis, Hothorn, Hornik (2008) and some illustrations are provided in vignette("MOB").

For the fitted MOB tree, several standard methods are inherited if they are available for the fitted models, such as print, predict, residuals, logLik, deviance, weights, coef and summary. By default, the latter four return the result (deviance, weights, coefficients, summary) for all terminal nodes, but they take a node argument that can be set to any node ID. The sctest method extracts the results of the parameter stability tests (also known as structural change tests) for any given node, by default for all nodes. Some examples are given below.

Value

An object of class mob inheriting from BinaryTree-class. Every node of the tree is additionally associated with a fitted model.

References

Achim Zeileis, Torsten Hothorn, and Kurt Hornik (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514.

See Also

plot.mob, mob_control

Examples

library("party")
set.seed(290875)

if(require("mlbench")) {

## recursive partitioning of a linear regression model
## load data
data("BostonHousing", package = "mlbench")
## and transform variables appropriately (for a linear regression)
BostonHousing$lstat <- log(BostonHousing$lstat)
BostonHousing$rm <- BostonHousing$rm^2
## as well as partitioning variables (for fluctuation testing)
BostonHousing$chas <- factor(BostonHousing$chas, levels = 0:1, 
                             labels = c("no", "yes"))
BostonHousing$rad <- factor(BostonHousing$rad, ordered = TRUE)

## partition the linear regression model medv ~ lstat + rm
## with respect to all remaining variables:
fmBH <- mob(medv ~ lstat + rm | zn + indus + chas + nox + age + 
                                dis + rad + tax + crim + b + ptratio,
  control = mob_control(minsplit = 40), data = BostonHousing, 
  model = linearModel)

## print the resulting tree
fmBH
## or better visualize it
plot(fmBH)

## extract coefficients in all terminal nodes
coef(fmBH)
## look at full summary, e.g., for node 7
summary(fmBH, node = 7)
## results of parameter stability tests for that node
sctest(fmBH, node = 7)
## -> no further significant instabilities (at 5% level)

## compute mean squared error (on training data)
mean((BostonHousing$medv - fitted(fmBH))^2)
mean(residuals(fmBH)^2)
deviance(fmBH)/sum(weights(fmBH))

## evaluate logLik and AIC
logLik(fmBH)
AIC(fmBH)
## (Note that this penalizes estimation of error variances, which
## were treated as nuisance parameters in the fitting process.)


## recursive partitioning of a logistic regression model
## load data
data("PimaIndiansDiabetes", package = "mlbench")
## partition logistic regression diabetes ~ glucose 
## with respect to all remaining variables
fmPID <- mob(diabetes ~ glucose | pregnant + pressure + triceps + 
                                  insulin + mass + pedigree + age,
  data = PimaIndiansDiabetes, model = glinearModel, 
  family = binomial())

## fitted model
coef(fmPID)
plot(fmPID)
plot(fmPID, tp_args = list(cdplot = TRUE))
}
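The note on AIC in the examples concerns how the error variance is counted. For a single linear model in base R, logLik() treats the error variance as an estimated parameter, so its df attribute is the number of coefficients plus one, and AIC() uses that df in its penalty. The sketch below shows this on the built-in cars data; it does not show how mob aggregates degrees of freedom over terminal nodes.

```r
# For a single lm() fit, logLik() counts the error variance as one extra
# estimated parameter on top of the regression coefficients.
fit <- lm(dist ~ speed, data = cars)
ll <- logLik(fit)
attr(ll, "df")     # 3: intercept, slope, and error variance
length(coef(fit))  # 2
# AIC = -2 * logLik + 2 * df, so the variance parameter is penalized too
all.equal(AIC(fit), -2 * as.numeric(ll) + 2 * attr(ll, "df"))
```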