Pruning, cross-validation to find the optimal pruning parameter, and computation
of validation-set errors for tvcm
objects.
## S3 method for class 'tvcm'
prune(tree, cp = NULL, alpha = NULL, maxstep = NULL,
terminal = NULL, original = FALSE, ...)
## S3 method for class 'tvcm'
prunepath(tree, steps = 1L, ...)
## S3 method for class 'tvcm'
cvloss(object, folds = folds_control(), ...)
folds_control(type = c("kfold", "subsampling", "bootstrap"),
K = ifelse(type == "kfold", 5, 100),
prob = 0.5, weights = c("case", "freq"),
seed = NULL)
## S3 method for class 'cvloss.tvcm'
plot(x, legend = TRUE, details = TRUE, ...)
## S3 method for class 'tvcm'
oobloss(object, newdata = NULL, weights = NULL,
fun = NULL, ...)

object, tree 
an object of class tvcm. 
cp 
numeric scalar. The complexity parameter to be cross-validated resp. the penalty with which the model should be pruned. 
alpha 
numeric significance level. Represents the stopping
parameter for tvcm objects grown with coefficient constancy tests; see tvcm_control. 
maxstep 
integer. The maximum number of steps of the algorithm. 
terminal 
a list of integer vectors with the ids of the inner nodes to be set to terminal nodes. The length of the list must equal the number of partitions. 
original 
logical scalar. Whether pruning should be based on the trees from partitioning rather than on the current trees. 
steps 
integer vector. The iteration steps from which information should be extracted. 
folds 
a list with control arguments as produced by
folds_control. 
type 
character string. The type of sampling scheme to be used to divide the data of the input model into a learning and a validation set. 
K 
integer scalar. The number of folds. 
weights 
for folds_control, a character string that defines whether the weights of object are case weights or frequencies of cases; for oobloss, a numeric vector of weights corresponding to the rows of newdata. 
prob 
numeric between 0 and 1. The probability for the
type = "subsampling" cross-validation scheme. 
seed 
a numeric scalar that defines the seed. 
x 
an object of class cvloss.tvcm. 
legend 
logical scalar. Whether a legend should be added. 
details 
logical scalar. Whether the fold-wise validation errors should be shown. 
newdata 
a data.frame of out-of-bag data (including the response
variable). See also predict.tvcm. 
fun 
the loss function for the validation sets. By default, the
(possibly weighted) mean of the deviance residuals as defined by the
family of the fitted object is applied. 
... 
other arguments to be passed. 
tvcglm
and tvcm
process the
tree-size selection automatically by default. The functions described here could be interesting for
advanced users.
The prune
function is used to collapse inner nodes of
the tree structures by the tuning parameter cp
. The aim of
pruning by cp
is to collapse inner nodes to minimize the
cost-complexity criterion
error(cp) = error(tree) + cp * complexity(tree)
where the training error error(tree) is defined by
lossfun
and complexity(tree) is defined as the total
number of coefficients times dfpar
plus the total number of
splits times dfsplit
. The function lossfun
and the
parameters dfpar
and dfsplit
are defined by the
control
argument of tvcm
, see also
tvcm_control
. By default, error(tree) is minus
two times the total likelihood of the model and complexity(tree)
the number of splits. The minimization of error(cp) is
implemented by the following iterative backward-stepwise algorithm:
1. Fit all subtree
models that collapse one inner node of the
current tree
model.
2. Compute the per-complexity increase in the training error
dev = (error(subtree) - error(tree)) / (complexity(tree) - complexity(subtree))
for all fitted subtree
models.
3. If any dev
< cp
, set as the tree
model
the subtree
that minimizes dev
and repeat steps 1 to 3;
otherwise stop.
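The loop above can be sketched in a few lines. The following Python snippet is a hypothetical illustration, not the vcrpart implementation: it assumes that each inner node's error increase and complexity reduction upon collapsing are known and, for simplicity, remain fixed across iterations.

```python
def prune_tree(inner_nodes, cp):
    """Greedy backward-stepwise cost-complexity pruning sketch.

    `inner_nodes` maps a node id to a pair
    (increase in training error, decrease in complexity)
    observed when that node is collapsed.  Treating these pairs as
    fixed across iterations is an illustrative simplification.
    """
    nodes = dict(inner_nodes)
    collapsed = []
    while nodes:
        # per-complexity increase in the training error for each candidate
        dev = {i: derr / dcomp for i, (derr, dcomp) in nodes.items()}
        best = min(dev, key=dev.get)
        if dev[best] >= cp:      # no collapse decreases error(cp): stop
            break
        collapsed.append(best)   # collapse the node that minimizes dev
        del nodes[best]
    return collapsed
```

For example, with cp = 1 a node whose collapse raises the error by 0.2 while removing 2 units of complexity (dev = 0.1) is collapsed before a node with dev = 0.5, and a node with dev = 2 is never collapsed.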
The penalty cp
is generally unknown and is estimated adaptively
from the data. The cvloss
function implements the
cross-validation method to do this. cvloss
repeats
for each fold the following steps:
1. Fit a new model with tvcm
based on the training
data of the fold.
2. Prune the new model for increasing cp
. Compute for each
cp
the average validation error.
Doing so yields for each fold a sequence of values for cp
and
a sequence of average validation errors. These sequences are then
combined on a finer grid and the average validation error is averaged
correspondingly. From these two sequences we choose the cp
value that minimizes the validation error. Notice that the average
validation error is computed as the total prediction error of the
validation set divided by the sum of validation set weights. See also
the argument ooblossfun
in tvcm_control
and
the function oobloss
.
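The grid-combination step described above can be sketched as follows. This Python snippet is an illustrative reconstruction, not the vcrpart code; it assumes each fold's validation error is a right-continuous step function of cp.

```python
def combine_folds(fold_results):
    """Combine per-fold (cp, error) sequences on a common grid.

    Each fold is a list of (cp, validation error) pairs sorted by cp.
    Returns the common grid, the averaged errors, and the minimizing cp.
    """
    # union of all cp values observed across folds
    grid = sorted({cp for fold in fold_results for cp, _ in fold})

    def step_error(fold, cp):
        # validation error of the pruned model that applies at penalty cp
        err = fold[0][1]
        for c, e in fold:
            if c <= cp:
                err = e
        return err

    avg = [sum(step_error(f, cp) for f in fold_results) / len(fold_results)
           for cp in grid]
    cp_hat = grid[min(range(len(grid)), key=avg.__getitem__)]
    return grid, avg, cp_hat
```

Note that the sketch averages the per-fold errors directly, whereas cvloss divides the total prediction error of each validation set by the sum of its weights.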
The prunepath
function can be used to backtrack the
pruning algorithm. By default, it shows the results from collapsing
inner nodes in the first iteration. The interesting iteration(s) can
be selected by the steps
argument. The output shows several
information on the performances when collapsing inner nodes. The node
labels shown in the output refer to the initial tree.
The function folds_control
is used to specify the
cross-validation scheme, where a random 5-fold cross-validation scheme
is used by default. Alternatives are type = "subsampling"
(random draws without replacement) and type = "bootstrap"
(random
draws with replacement). For 2stage models (with randomeffects)
fitted by olmm
, the subsets are based on subjectwise
i.e. first stage sampling. For models where weights represent frequencies
of observation units (e.g., data from contingency tables), the option
weights = "freq"
should be considered. cvloss
returns an object for which print
and plot
generics are
provided.
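The three sampling schemes can be sketched as follows. This Python function is a simplified stand-in for folds_control (it ignores subject-wise sampling and frequency weights); the argument names mirror the R interface, but the function itself is hypothetical.

```python
import random

def make_folds(n, type="kfold", K=5, prob=0.5, seed=None):
    """Sketch of fold construction for the three sampling schemes.

    Returns, per fold, a list of per-observation counts; a count of 0
    marks a validation observation.
    """
    rng = random.Random(seed)
    folds = []
    if type == "kfold":
        perm = list(range(n))
        rng.shuffle(perm)
        for k in range(K):
            counts = [0] * n
            for j, i in enumerate(perm):
                if j % K != k:       # all but the k-th block: learning set
                    counts[i] = 1
            folds.append(counts)
    elif type == "subsampling":
        for _ in range(K):           # random draws without replacement
            folds.append([1 if rng.random() < prob else 0
                          for _ in range(n)])
    elif type == "bootstrap":
        for _ in range(K):           # random draws with replacement
            counts = [0] * n
            for _ in range(n):
                counts[rng.randrange(n)] += 1
            folds.append(counts)
    return folds
```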
oobloss
can be used to estimate the total prediction
error for validation data (the newdata
argument). By default,
the loss is defined as the sum of deviance residuals, see the return
value dev.resids
of family
resp. family.olmm
. Otherwise, the loss function can
be defined manually by the argument fun
, see the examples
below. In general the sum of deviance residuals equals the sum of the
-2 log-likelihood errors. A special case is the Gaussian family, where
the deviance residuals are computed as ∑_{i=1}^N w_i (y_i - μ_i)^2,
that is, the deviance residuals ignore the term log 2πσ^2.
Therefore, the sum of deviance residuals for the Gaussian model (and
possibly others) is not exactly the sum of -2 log-likelihood prediction
errors (but shifted by a constant). Another special case is that of models with
random effects. For models based on olmm
, the deviance
residuals are retrieved from marginal predictions (where random effects are
integrated out).
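The constant shift for the Gaussian family can be checked numerically. The following Python snippet illustrates the identity stated above (unit variance assumed for simplicity); it is not code from the package.

```python
import math

def gaussian_deviance(y, mu, w):
    # sum of weighted squared residuals: sum_i w_i (y_i - mu_i)^2
    return sum(wi * (yi - mi) ** 2 for yi, mi, wi in zip(y, mu, w))

def minus_two_loglik(y, mu, w, sigma2=1.0):
    # -2 * Gaussian log-likelihood, including the log(2*pi*sigma^2) term
    # that the deviance residuals omit
    return sum(wi * ((yi - mi) ** 2 / sigma2 + math.log(2 * math.pi * sigma2))
               for yi, mi, wi in zip(y, mu, w))
```

For any predictions mu, the difference between the two quantities is the same constant, sum(w) * log(2π) when σ² = 1, so comparing models by either loss gives the same ranking.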
prune
returns a tvcm
object,
folds_control
returns a list of parameters for building a
cross-validation scheme. cvloss
returns a cvloss.tvcm
object with at least the following components:
grid 
a list with the evaluated values for cp. 
oobloss 
a matrix recording the validated loss for each value in
grid. 
cp.hat 
numeric scalar. The tuning parameter which minimizes the cross-validated error. 
folds 
the folds used to extract the learning and the validation sets. 
oobloss
returns a scalar representing the total prediction
error for newdata
.
Reto Buergin
Breiman, L., J. H. Friedman, R. A. Olshen and C. J. Stone (1984). Classification and Regression Trees. New York, USA: Wadsworth.
Hastie, T., R. Tibshirani and J. Friedman (2001). The Elements of Statistical Learning (2 ed.). New York, USA: Springer-Verlag.
Buergin, R. and G. Ritschard (2017), Coefficient-Wise Tree-Based Varying Coefficient Regression with vcrpart. Journal of Statistical Software, 80(6), 1–33.
## ------------------------------------------------------------------- #
## Dummy Example:
##
## Model selection for the 'vcrpart_2' data. The example is
## merely a syntax template.
## ------------------------------------------------------------------- #
## load the data
data(vcrpart_2)
## fit the model
control <- tvcm_control(maxstep = 2L, minsize = 5L, cv = FALSE)
model <- tvcglm(y ~ vc(z1, z2, by = x1) + vc(z1, by = x2),
data = vcrpart_2, family = gaussian(),
control = control, subset = 1:75)
## cross-validate 'dfsplit'
cv < cvloss(model, folds = folds_control(type = "kfold", K = 2, seed = 1))
cv
plot(cv)
## prune model with estimated 'cp'
model.p <- prune(model, cp = cv$cp.hat)
## backtrack pruning
prunepath(model.p, steps = 1:3)
## out-of-bag error
oobloss(model, newdata = vcrpart_2[76:100,])
## use an alternative loss function
rfun <- function(y, mu, wt) sum(abs(y - mu))
oobloss(model, newdata = vcrpart_2[76:100,], fun = rfun)
