Modelbased Recursive Partitioning
Description
MOB is an algorithm for modelbased recursive partitioning yielding a tree with fitted models associated with each terminal node.
Usage
1 2 
Arguments
formula 
symbolic description of the model (of type

data, subset, na.action 
arguments controlling formula processing
via 
weights 
optional numeric vector of weights. By default these are
treated as case weights but the default can be changed in

offset 
optional numeric vector with an a priori known component to be
included in the model 
cluster 
optional vector (typically numeric or factor) with a
cluster ID to be passed on to the 
fit 
function. A function for fitting the model within each node. For details see below. 
control 
A list with control parameters as returned by

... 
Additional arguments passed to the 
Details
Modelbased partitioning fits a model tree using two groups of variables:
(1) The model variables which can be just a (set of) response(s) y
or
additionally include regressors x1
, ..., xk
. These are used
for estimating the model parameters.
(2) Partitioning variables z1
, ..., zl
, which are used for
recursively partitioning the data. The two groups of variables are either specified
as y ~ z1 + ... + zl
(when there are no regressors) or
y ~ x1 + ... + xk  z1 + ... + zl
(when the model part contains regressors).
Both sets of variables may in principle be overlapping.
To fit a tree model the following algorithm is used.

fit
a model to they
ory
andx
variables using the observations in the current node Assess the stability of the model parameters with respect to each of the partitioning variables
z1
, ...,zl
. If there is some overall instability, choose the variablez
associated with the smallest p value for partitioning, otherwise stop.Search for the locally optimal split in
z
by minimizing the objective function of the model. Typically, this will be something likedeviance
or the negativelogLik
.Refit the
model
in both kid subsamples and repeat from step 2.
More details on the conceptual design of the algorithm can be found in
Zeileis, Hothorn, Hornik (2008) and some illustrations are provided in
vignette("MOB")
.
For specifying the fit
function two approaches are possible:
(1) It can be a function fit(y, x = NULL, start = NULL, weights = NULL,
offset = NULL, ...)
. The arguments y
, x
, weights
, offset
will be set to the corresponding elements in the current node of the tree.
Additionally, starting values will sometimes be supplied via start
.
Of course, the fit
function can choose to ignore any arguments that are
not applicable, e.g., if the are no regressors x
in the model or if
starting values or not supported. The returned object needs to have a class
that has associated coef
, logLik
, and
estfun
methods for extracting the estimated parameters,
the maximized loglikelihood, and the empirical estimating function (i.e.,
score or gradient contributions), respectively.
(2) It can be a function fit(y, x = NULL, start = NULL, weights = NULL,
offset = NULL, ..., estfun = FALSE, object = FALSE)
. The arguments have the
same meaning as above but the returned object needs to have a different structure.
It needs to be a list with elements coefficients
(containing the estimated
parameters), objfun
(containing the minimized objective function),
estfun
(the empirical estimating functions), and object
(the
fitted model object). The elements estfun
, or object
should be
NULL
if the corresponding argument is set to FALSE
.
Internally, a function of type (2) is set up by mob()
in case a function
of type (1) is supplied. However, to save computation time, a function of type
(2) may also be specified directly.
For the fitted MOB tree, several standard methods are provided such as
print
, predict
, residuals
, logLik
, deviance
,
weights
, coef
and summary
. Some of these rely on reusing the
corresponding methods for the individual model objects in the terminal nodes.
Functions such as coef
, print
, summary
also take a
node
argument that can specify the node IDs to be queried.
Some examples are given below.
More details can be found in vignette("mob", package = "partykit")
.
An overview of the connections to other functions in the package is provided
by Hothorn and Zeileis (2015).
Value
An object of class modelparty
inheriting from party
.
The info
element of the overall party
and the individual
node
s contain various informations about the models.
References
Hothorn T, Zeileis A (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16, 3905–3909.
Zeileis A, Hothorn T, Hornik K (2008). ModelBased Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514.
See Also
mob_control
, lmtree
, glmtree
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43  if(require("mlbench")) {
## Pima Indians diabetes data
data("PimaIndiansDiabetes", package = "mlbench")
## a simple basic fitting function (of type 1) for a logistic regression
logit < function(y, x, start = NULL, weights = NULL, offset = NULL, ...) {
glm(y ~ 0 + x, family = binomial, start = start, ...)
}
## set up a logistic regression tree
pid_tree < mob(diabetes ~ glucose  pregnant + pressure + triceps + insulin +
mass + pedigree + age, data = PimaIndiansDiabetes, fit = logit)
## see lmtree() and glmtree() for interfaces with more efficient fitting functions
## print tree
print(pid_tree)
## print information about (some) nodes
print(pid_tree, node = 3:4)
## visualization
plot(pid_tree)
## coefficients and summary
coef(pid_tree)
coef(pid_tree, node = 1)
summary(pid_tree, node = 1)
## average deviance computed in different ways
mean(residuals(pid_tree)^2)
deviance(pid_tree)/sum(weights(pid_tree))
deviance(pid_tree)/nobs(pid_tree)
## loglikelihood and information criteria
logLik(pid_tree)
AIC(pid_tree)
BIC(pid_tree)
## predicted nodes
predict(pid_tree, newdata = head(PimaIndiansDiabetes, 6), type = "node")
## other types of predictions are possible using lmtree()/glmtree()
}
