maxdepth_sampler: Sampling function generator for specifying varying maximum...

View source: R/pre.R

maxdepth_sampler    R Documentation

Sampling function generator for specifying varying maximum tree depth in a prediction rule ensemble (pre)

Description

maxdepth_sampler generates a random sampling function, governed by a pre-specified average tree depth.

Usage

maxdepth_sampler(av.no.term.nodes = 4L, av.tree.depth = NULL)

Arguments

av.no.term.nodes

integer of length one. Specifies the average number of terminal nodes in trees used for rule induction.

av.tree.depth

integer of length one. Specifies the average maximum tree depth in trees used for rule induction.

Details

The original RuleFit implementation varied tree sizes for rule induction. Furthermore, it defined tree size in terms of the number of terminal nodes. In contrast, function pre defines the maximum tree size in terms of a (constant) tree depth. Function maxdepth_sampler allows for mimicking the behavior of the original RuleFit implementation. In effect, the maximum tree depth is sampled from an exponential distribution with rate parameter 1/(\bar{L}-2), where \bar{L} \ge 2 represents the average number of terminal nodes for trees in the ensemble. See Friedman & Popescu (2008, section 3.3).
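
The sampling scheme described above can be sketched in a few lines of base R. This is an illustrative re-implementation, not the package source. The name sketch_maxdepth_sampler is hypothetical, and two steps are assumptions: that the number of terminal nodes is 2 plus the integer part of the exponential draw (as in Friedman & Popescu, 2008, section 3.3), and that a terminal-node count is converted to a depth by taking the smallest depth whose full binary tree can hold that many terminal nodes.

```r
# Illustrative sketch only; sketch_maxdepth_sampler is a hypothetical name
# and this is NOT the implementation shipped with package pre.
sketch_maxdepth_sampler <- function(av.no.term.nodes = 4L) {
  stopifnot(length(av.no.term.nodes) == 1L, av.no.term.nodes >= 2)
  function(ntrees) {
    # Exponential draws with mean (av.no.term.nodes - 2), i.e. rate
    # 1 / (av.no.term.nodes - 2). When the mean is 0, use zeros directly.
    extra <- if (av.no.term.nodes > 2) {
      rexp(ntrees, rate = 1 / (av.no.term.nodes - 2))
    } else {
      rep(0, ntrees)
    }
    # Assumption: number of terminal nodes = 2 + floor(exponential draw).
    no.term.nodes <- 2 + floor(extra)
    # Assumption: depth = smallest d such that 2^d >= number of terminal nodes.
    as.integer(ceiling(log2(no.term.nodes)))
  }
}

set.seed(42)
sketch_func <- sketch_maxdepth_sampler(4L)
depths <- sketch_func(1000)
table(depths)   # distribution of sampled maximum depths
```

With av.no.term.nodes = 2L the exponential part vanishes, so this sketch always returns a depth of 1, which agrees with the corresponding example in the Examples section.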

Value

Returns a random sampling function with single argument ntrees, which can be supplied to the maxdepth argument of function pre to specify varying tree depths.
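
To make the interface of the returned function concrete, the following sketch defines a hand-written sampler with the same single argument ntrees. The function uniform_depth_sampler is hypothetical and only illustrates the expected shape of the input and output: it takes the number of trees and returns one positive integer depth per tree.

```r
# Hypothetical custom sampler with the same interface as the function
# returned by maxdepth_sampler(): one argument, ntrees, and an integer
# vector of length ntrees as the result.
uniform_depth_sampler <- function(ntrees) {
  # Draw each tree's maximum depth uniformly from 1, 2 or 3.
  sample(1:3, size = ntrees, replace = TRUE)
}

set.seed(42)
custom_depths <- uniform_depth_sampler(5)
custom_depths
```

Assuming pre accepts any function of this form for its maxdepth argument, such a sampler could be supplied in the same way as func1 in the Examples section.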

References

Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.

See Also

pre

Examples

## RuleFit default is max. 4 terminal nodes, on average:
func1 <- maxdepth_sampler()
set.seed(42)
func1(10)
mean(func1(1000))

## Max. 16 terminal nodes, on average (equals average maxdepth of 4):
func2 <- maxdepth_sampler(av.no.term.nodes = 16L)
set.seed(42)
func2(10)
mean(func2(1000))

## Max. tree depth of 3, on average:
func3 <- maxdepth_sampler(av.tree.depth = 3)
set.seed(42)
func3(10)
mean(func3(1000))

## Max. 2 terminal nodes, on average (always yields maxdepth of 1):
func4 <- maxdepth_sampler(av.no.term.nodes = 2L)
set.seed(42)
func4(10)
mean(func4(1000))

## Create rule ensemble with varying maxdepth:
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality),],
                maxdepth = func1)
airq.ens

pre documentation built on May 29, 2024, 5:10 a.m.