prune: Prune a tree using cross-validation


View source: R/treethresh.R

Description

Extracts an optimal subtree from a tree object of class treethresh or wtthresh. In contrast to subtree, the value of the complexity parameter C does not need to be given; it is determined by cross-validation.

Usage

## S3 method for class 'treethresh'
prune(object, v=5, sd.mult=0.5, plot=TRUE)
## S3 method for class 'wtthresh'
prune(object, v=5, sd.mult=0.5, plot=TRUE)

Arguments

object

An object of the class treethresh or wtthresh according to which thresholding is to be carried out.

v

The number of folds in the cross-validation used to determine the optimal subtree in the pruning step (see below for details).

sd.mult

The smallest subtree whose loglikelihood is no more than sd.mult standard errors worse than the best loglikelihood is chosen as the optimal tree in the pruning step (see below for details).

plot

If plot=TRUE, a plot of the relative predicted loglikelihood estimated in the cross-validation against the complexity parameter C is produced.

...

additional arguments (see above for supported arguments).

Details

The tree grown by treethresh or wtthresh often yields too many partitions, leading to overfitting. The resulting tree therefore has to be 'pruned', i.e. the branches corresponding to the least important regions have to be 'snipped off'.

As the TreeThresh model is a special case of a classification and regression tree, there exists a sequence of nested subtrees (i.e. a sequence of nested partitions) that maximises the regularised loglikelihood

l - alpha * #partitions.

The parameter alpha controls the complexity of the resulting partition. For alpha=0 no pruning is carried out. If a large enough alpha is chosen, only the root node of the tree is retained, i.e. no partitioning is done. Denote this value of alpha by alpha_0. The complexity parameter can thus be rescaled to

C = alpha / alpha_0

yielding a complexity parameter ranging from 0 (no pruning) to 1 (only retain the root node).
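
As a rough illustration of the rescaling, the base R sketch below uses made-up loglikelihoods for a hypothetical sequence of four nested subtrees. None of the numbers or names (n.partitions, loglik, best.subtree, alpha0) come from the treethresh package. It assumes the penalty is subtracted, so that a larger alpha favours fewer partitions, and it takes alpha_0 to be the smallest alpha at which the root-only tree is at least as good as every larger subtree.

```r
# Hypothetical nested subtrees: number of partitions and loglikelihood of each.
n.partitions <- c(1, 2, 4, 7)
loglik <- c(-120, -100, -90, -87)

# Index of the subtree with the highest penalised loglikelihood for a given
# alpha (assumption: the penalty alpha * #partitions is subtracted).
best.subtree <- function(alpha) which.max(loglik - alpha * n.partitions)

# alpha_0: the largest loglikelihood gain per additional partition relative to
# the root node. From this alpha onwards no larger subtree beats the root.
alpha0 <- max((loglik[-1] - loglik[1]) / (n.partitions[-1] - 1))

# Rescale a few values of alpha to C = alpha / alpha_0 and look up the winner.
alpha <- c(0, 2, 8, alpha0)
res <- data.frame(alpha = alpha,
                  C = alpha / alpha0,
                  partitions = n.partitions[sapply(alpha, best.subtree)])
print(res)
```

In this toy sequence C = 0 keeps the full tree and C = 1 keeps only the root node, with intermediate values of C selecting the subtrees in between.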

The optimal value of the complexity parameter C (or, equivalently, alpha) depends on the problem at hand and thus has to be chosen carefully. prune estimates the optimal complexity parameter C by v-fold cross-validation. If sd.mult=0, the value of C that yields the highest predictive loglikelihood in the cross-validation is used to prune the tree object. If sd.mult is not 0, the largest C whose predictive loglikelihood is no more than sd.mult standard errors worse than that of the best C is used, which favours a smaller tree.
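
The selection rule can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the cross-validation summary (C.grid, cv.mean, cv.se) is invented, choose.C is a hypothetical helper, and the standard error of the best C is assumed to be the one that sets the tolerance.

```r
# Hypothetical cross-validation summary: candidate values of C, the mean
# predictive loglikelihood over the folds, and its standard error.
C.grid <- c(0, 0.05, 0.1, 0.2, 0.4, 1)
cv.mean <- c(-10.0, -9.2, -9.0, -9.3, -9.6, -12.0)
cv.se <- c(0.9, 0.8, 0.8, 0.8, 0.9, 1.0)

# Hypothetical helper: largest C whose loglikelihood is no more than
# sd.mult standard errors below the best one (assumption: the standard
# error at the best C is used).
choose.C <- function(C, cv.mean, cv.se, sd.mult = 0.5) {
  best <- which.max(cv.mean)
  ok <- cv.mean >= cv.mean[best] - sd.mult * cv.se[best]
  max(C[ok])
}

C.best <- choose.C(C.grid, cv.mean, cv.se, sd.mult = 0)    # the best C itself
C.half <- choose.C(C.grid, cv.mean, cv.se, sd.mult = 0.5)  # a larger C, i.e. a smaller tree
print(c(C.best, C.half))
```

A larger sd.mult widens the tolerance band and therefore never selects a smaller C, trading a little predictive loglikelihood for a simpler partition.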

Value

prune returns an object of the class treethresh or wtthresh that contains the tree pruned at the value of C chosen by cross-validation (see Details for the pruning process).

Note

For an example of the use of prune, see coefficients.

See Also

treethresh, wtthresh, get.t, prune


treethresh documentation built on May 1, 2019, 11:16 p.m.