PPforest: Projection Pursuit Random Forest
PPforest

R Documentation

Projection Pursuit Random Forest

Description

`PPforest` implements a random forest built from projection pursuit trees (based on the PPtreeViz package).

Usage

PPforest(data, class, std = TRUE, size.tr, m, PPmethod, size.p,
 lambda = .1, parallel = FALSE, cores = 2, rule = 1)

Arguments

`data`	Data frame with the complete data set.
`class`	A character string with the name of the class variable.
`std`	If TRUE, standardize the data set; this is needed to compute the global importance measure.
`size.tr`	Proportion of the observations used for training when the data are split into training and test sets.
`m`	Number of bootstrap replicates, which corresponds to the number of trees to grow. To ensure that each observation is predicted a few times, this number should not be too small. The default is `m = 500`.
`PPmethod`	Projection pursuit index to optimize in each classification tree. The options are `LDA` (linear discriminant) and `PDA` (penalized linear discriminant). The default is `LDA`.
`size.p`	Proportion of variables randomly sampled at each split.
`lambda`	Penalty parameter in the PDA index, between 0 and 1. If `lambda = 0`, no penalty is applied and the PDA index is the same as the LDA index. If `lambda = 1`, all variables are treated as uncorrelated. The default value is `lambda = 0.1`.
`parallel`	Logical; if TRUE, the function runs in parallel.
`cores`	Number of cores used when running in parallel.
`rule`	Split rule:
	1: mean of two group means
	2: weighted mean of two group means, weighted by group size
	3: weighted mean of two group means, weighted by group sd
	4: weighted mean of two group means, weighted by group se
	5: mean of two group medians
	6: weighted mean of two group medians, weighted by group size
	7: weighted mean of two group medians, weighted by group IQR
	8: weighted mean of two group medians, weighted by group IQR and size
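
To make the `rule` options concrete, the following base R sketch computes the cutoff for rules 1 and 5 on a one-dimensional projection. It does not call `PPforest`; the projected values, the group labels, and the variable names are hypothetical, and the weighted rules (2 to 4 and 6 to 8) are left out because their exact weighting is not spelled out here.

```r
# Hypothetical 1-D projected values for two groups (illustrative only).
proj  <- c(1, 2, 3, 10, 6, 7, 8)
group <- factor(c("g1", "g1", "g1", "g1", "g2", "g2", "g2"))

# Per-group location summaries on the projection.
group_means   <- tapply(proj, group, mean)
group_medians <- tapply(proj, group, median)

# Rule 1: mean of the two group means.
cut_rule1 <- mean(group_means)

# Rule 5: mean of the two group medians.
cut_rule5 <- mean(group_medians)

cut_rule1
cut_rule5
```

Because group `g1` contains one large value, the mean-based cutoff sits closer to group `g2` than the median-based cutoff does, which is the practical difference between the two families of rules.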

Value

An object of class `PPforest` with the following components.

`prediction.training`	predicted values for training data set.
`training.error`	error of the training data set.
`prediction.test`	predicted values for the test data set, available when a test set is held out (`size.tr < 1`).
`error.test`	error of the test data set, available when a test set is held out (`size.tr < 1`).
`oob.error.forest`	out of bag error in the forest.
`oob.error.tree`	out of bag error for each tree in the forest.
`boot.samp`	information about the bootstrap samples.
`output.trees`	output from `trees_pp` for each bootstrap sample.
`proximity`	proximity matrix. If two cases are classified in the same terminal node, their proximity is increased by one. In `PPforest` there is one terminal node per class.
`votes`	a matrix with one row for each input data point and one column for each class, giving the fraction of (OOB) votes from the `PPforest`.
`n.tree`	number of trees grown in `PPforest`.
`n.var`	number of predictor variables selected for splitting at each node.
`type`	classification.
`confusion`	confusion matrix of the prediction (based on OOB data).
`call`	the original call to `PPforest`.
`train`	the training data, a `size.tr` proportion of the sample.
`test`	the test data, the remaining `1 - size.tr` proportion of the sample.
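
The counting rule described for `proximity` can be sketched in base R as follows. This is only an illustration of the idea, not the package's internal code: the matrix `node_labels` (one row per observation, one column per tree, holding the terminal node each observation lands in) is hypothetical, and whether `PPforest` rescales the counts or restricts them to particular cases is not stated here.

```r
# Hypothetical terminal-node labels: rows are observations, columns are trees.
node_labels <- cbind(
  tree1 = c("A", "A", "B", "B"),
  tree2 = c("A", "B", "B", "B"),
  tree3 = c("A", "A", "A", "B")
)

n <- nrow(node_labels)
proximity <- matrix(0, nrow = n, ncol = n)

# For each tree, add one to every pair of observations sharing a terminal node.
for (t in seq_len(ncol(node_labels))) {
  same_node <- outer(node_labels[, t], node_labels[, t], "==")
  proximity <- proximity + same_node
}

proximity
```

The result is a symmetric matrix whose diagonal equals the number of trees; larger off-diagonal entries mark pairs of observations that the trees tend to place in the same class node.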

References

Natalia da Silva, Dianne Cook & Eun-Kyung Lee (2021) A Projection Pursuit Forest Algorithm for Supervised Classification, Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2020.1870480

Examples

# Crab example with all the observations used as training data
library(PPforest)  # provides PPforest() and the crab data set

pprf.crab <- PPforest(data = crab, class = 'Type',
 std = FALSE, size.tr = 1, m = 200, size.p = .5,
 PPmethod = 'LDA', parallel = TRUE, cores = 2, rule = 1)
pprf.crab
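
A possible variation, sketched under the assumption that the signature shown in Usage applies to the installed version: hold out part of the crab data as a test set and inspect some of the components listed under Value. The object name `pprf.split`, the seed, and the values `size.tr = 2/3` and `m = 100` are arbitrary choices for illustration.

```r
library(PPforest)

set.seed(123)  # arbitrary seed so the train/test split is reproducible

# Train on two thirds of the observations; the rest form the test set.
pprf.split <- PPforest(data = crab, class = "Type",
                       std = TRUE, size.tr = 2/3, m = 100, size.p = 0.5,
                       PPmethod = "LDA", parallel = FALSE, rule = 1)

pprf.split$oob.error.forest  # out-of-bag error of the forest
pprf.split$error.test        # error on the held-out test set
pprf.split$confusion         # confusion matrix based on OOB data
```

Comparing `oob.error.forest` with `error.test` gives a quick check of whether the out-of-bag estimate is in line with the error on data the forest never saw.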

PPforest documentation built on Sept. 10, 2022, 1:05 a.m.