Model-based Recursive Partitioning
Description
MOB is an algorithm for model-based recursive partitioning yielding a tree with fitted models associated with each terminal node.
Usage
mob(formula, weights, data = list(), na.action = na.omit, model = glinearModel,
  control = mob_control(), ...)
## S3 method for class 'mob'
predict(object, newdata = NULL, type = c("response", "node"), ...)
## S3 method for class 'mob'
summary(object, node = NULL, ...)
## S3 method for class 'mob'
coef(object, node = NULL, ...)
## S3 method for class 'mob'
sctest(x, node = NULL, ...)

Arguments
formula 
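A symbolic description of the model to be fit. This
should be of type y ~ x1 + ... + xk | z1 + ... + zl, where the variables before the | are passed to the model and the variables after the | are used for partitioning.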
weights 
An optional vector of weights to be used in the fitting process. Only non-negative integer-valued weights are allowed (default = 1).
data 
A data frame containing the variables in the model. 
na.action 
A function which indicates what should happen when the data
contain NAs.
model 
A model of class "StatModel". See Details for requirements.
control 
A list with control parameters as returned by mob_control.
... 
Additional arguments passed to the fit call for the model.
object, x 
A fitted mob object.
newdata 
A data frame with new inputs. By default, the learning data is used.
type 
A character string specifying whether the response should be
predicted (inherited from the predict method for the model) or the ID of the associated terminal node.
node 
A vector of node IDs for which the corresponding method should be applied. 
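The two values of the type argument can be compared on a small simulated data set. The following is only a sketch: the variables y, x and z and the object names fm, newd, nodes and resp are made up for illustration, and the code assumes that the party package (which provides mob and makes linearModel available) is installed.

```r
## Sketch only: simulated data with hypothetical variables y, x, z
if(require("party")) {
  set.seed(1)
  n <- 200
  d <- data.frame(x = rnorm(n), z = runif(n))
  ## the relationship between y and x changes at z = 0.5
  d$y <- ifelse(d$z < 0.5, 1 + 2 * d$x, -1 - 2 * d$x) + rnorm(n, sd = 0.5)

  ## partition the linear regression y ~ x with respect to z
  fm <- mob(y ~ x | z, data = d, model = linearModel)

  newd <- d[1:5, ]
  ## IDs of the terminal nodes the new observations fall into
  nodes <- predict(fm, newdata = newd, type = "node")
  ## predictions from the models fitted in those terminal nodes
  resp <- predict(fm, newdata = newd, type = "response")
  print(nodes)
  print(resp)
}
```

With type = "node" the result identifies the terminal node for each row of newdata, while type = "response" delegates to the model fitted in that node.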
Details
Model-based partitioning fits a model tree using the following algorithm:

1. Fit a model (default: a generalized linear model "StatModel" with formula y ~ x1 + ... + xk) for the observations in the current node.
2. Assess the stability of the model parameters with respect to each of the partitioning variables z1, ..., zl. If there is some overall instability, choose the variable z associated with the smallest p value for partitioning, otherwise stop. For performing the parameter instability fluctuation test, an estfun method and a weights method are needed.
3. Search for the locally optimal split in z by minimizing the objective function of the model. Typically, this will be something like deviance or the negative logLik and can be specified in mob_control.
4. Refit the model in both children, using reweight, and repeat from step 2.
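To give a rough idea of what the parameter instability assessment in step 2 looks at, the following base R sketch computes a supLM-type statistic from the empirical estimating functions (scores) of a linear model, ordered by a numeric partitioning variable. It is an illustration under assumptions, not the code used by mob: the helper sup_lm_stat, its trim argument and the simulated variables y, x, z and u are hypothetical, and no p values are computed.

```r
## Sketch only: score-based fluctuation statistic for a linear model
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n), z = runif(n), u = runif(n))
## the coefficients change at z = 0.5, while u is pure noise
d$y <- ifelse(d$z < 0.5, 1 + 2 * d$x, -1 - 2 * d$x) + rnorm(n, sd = 0.5)

## hypothetical helper: supLM-type statistic of fit with respect to z
sup_lm_stat <- function(fit, z, trim = 0.1) {
  ## empirical estimating functions of least squares: residual * regressors
  psi <- residuals(fit) * model.matrix(fit)
  n <- nrow(psi)
  ## decorrelate the scores with the inverse square root of their covariance
  e <- eigen(crossprod(psi) / n, symmetric = TRUE)
  Jinvsqrt <- e$vectors %*%
    diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
  ## cumulative score process along the ordering given by z
  ord <- order(z)
  W <- apply(psi[ord, , drop = FALSE] %*% Jinvsqrt, 2, cumsum) / sqrt(n)
  tt <- seq_len(n) / n
  keep <- tt >= trim & tt <= 1 - trim
  ## largest scaled squared deviation of the process from zero
  max(rowSums(W[keep, , drop = FALSE]^2) / (tt[keep] * (1 - tt[keep])))
}

fit <- lm(y ~ x, data = d)
stat_z <- sup_lm_stat(fit, d$z)
stat_u <- sup_lm_stat(fit, d$u)
c(z = stat_z, u = stat_u)
```

A large value for z relative to u reflects that the scores of the pooled model drift systematically when the observations are ordered by z, which is the kind of instability that leads to a split.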
More details on the conceptual design of the algorithm can be found in
Zeileis, Hothorn, Hornik (2008) and some illustrations are provided in
vignette("MOB").
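The split search and refitting steps of the algorithm (steps 3 and 4) can be mimicked in base R by an exhaustive search over candidate split points. This is a minimal sketch under assumptions rather than the implementation in mob: the helper best_split, its minsplit argument and the simulated variables y, x and z are hypothetical, lm stands in for the model, and the residual sum of squares returned by deviance serves as the objective function.

```r
## Sketch only: exhaustive search for the best binary split in z
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n), z = runif(n))
d$y <- ifelse(d$z < 0.5, 1 + 2 * d$x, -1 - 2 * d$x) + rnorm(n, sd = 0.5)

## hypothetical helper: minimize the summed deviance of the two child models
best_split <- function(formula, data, zname, minsplit = 20) {
  z <- data[[zname]]
  cand <- sort(unique(z))
  cand <- cand[-length(cand)]  # splitting at the maximum leaves one child empty
  obj <- vapply(cand, function(s) {
    left <- z <= s
    ## skip splits that leave too few observations in a child
    if(sum(left) < minsplit || sum(!left) < minsplit) return(Inf)
    deviance(lm(formula, data = data[left, ])) +
      deviance(lm(formula, data = data[!left, ]))
  }, numeric(1))
  list(split = cand[which.min(obj)], objective = min(obj))
}

sp <- best_split(y ~ x, d, "z")
sp

## refit the model in both children (compare step 4)
fit_left <- lm(y ~ x, data = d[d$z <= sp$split, ])
fit_right <- lm(y ~ x, data = d[d$z > sp$split, ])
rbind(left = coef(fit_left), right = coef(fit_right))
```

In mob itself the children are refitted through the reweight method rather than by subsetting the data, and the objective function is chosen via mob_control.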
For the fitted MOB tree, several standard methods are inherited if they are
available for fitted models, such as print, predict, residuals, logLik,
deviance, weights, coef and summary. By default, the latter four return the
result (deviance, weights, coefficients, summary) for all terminal nodes, but
take a node argument that can be set to any node ID. The sctest method
extracts the results of the parameter stability tests (aka structural change
tests) for any given node, by default for all nodes. Some examples are given below.
Value
An object of class mob inheriting from BinaryTree-class.
Every node of the tree is additionally associated with a fitted model.
References
Achim Zeileis, Torsten Hothorn, and Kurt Hornik (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514.
See Also
plot.mob, mob_control
Examples
set.seed(290875)
if(require("mlbench")) {
## recursive partitioning of a linear regression model
## load data
data("BostonHousing", package = "mlbench")
## and transform variables appropriately (for a linear regression)
BostonHousing$lstat <- log(BostonHousing$lstat)
BostonHousing$rm <- BostonHousing$rm^2
## as well as partitioning variables (for fluctuation testing)
BostonHousing$chas <- factor(BostonHousing$chas, levels = 0:1,
  labels = c("no", "yes"))
BostonHousing$rad <- factor(BostonHousing$rad, ordered = TRUE)
## partition the linear regression model medv ~ lstat + rm
## with respect to all remaining variables:
fmBH <- mob(medv ~ lstat + rm | zn + indus + chas + nox + age +
  dis + rad + tax + crim + b + ptratio,
  control = mob_control(minsplit = 40), data = BostonHousing,
  model = linearModel)
## print the resulting tree
fmBH
## or better visualize it
plot(fmBH)
## extract coefficients in all terminal nodes
coef(fmBH)
## look at full summary, e.g., for node 7
summary(fmBH, node = 7)
## results of parameter stability tests for that node
sctest(fmBH, node = 7)
## -> no further significant instabilities (at 5% level)
## compute mean squared error (on training data)
mean((BostonHousing$medv - fitted(fmBH))^2)
mean(residuals(fmBH)^2)
deviance(fmBH)/sum(weights(fmBH))
## evaluate logLik and AIC
logLik(fmBH)
AIC(fmBH)
## (Note that this penalizes estimation of error variances, which
## were treated as nuisance parameters in the fitting process.)
## recursive partitioning of a logistic regression model
## load data
data("PimaIndiansDiabetes", package = "mlbench")
## partition logistic regression diabetes ~ glucose
## with respect to all remaining variables
fmPID <- mob(diabetes ~ glucose | pregnant + pressure + triceps +
  insulin + mass + pedigree + age,
  data = PimaIndiansDiabetes, model = glinearModel,
  family = binomial())
## fitted model
coef(fmPID)
plot(fmPID)
plot(fmPID, tp_args = list(cdplot = TRUE))
}
