tree_var: Recursive Partitioning and Regression Trees

View source: R/trees.R

tree_var R Documentation

Recursive Partitioning and Regression Trees

Description

Fit and plot an rpart model for exploratory purposes, using the rpart and rpart.plot packages.

Usage

tree_var(
  df,
  y,
  type = 2,
  max = 3,
  min = 20,
  cp = 0,
  ohse = TRUE,
  plot = TRUE,
  explain = TRUE,
  title = NA,
  subtitle = NULL,
  ...
)

Arguments

df

Data frame

y

Variable or Character. Name of the dependent variable or response.

type

Type of plot. Possible values:

0 Draw a split label at each split and a node label at each leaf.

1 Label all nodes, not just leaves. Similar to text.rpart's all=TRUE.

2 Default. Like 1 but draw the split labels below the node labels. Similar to the plots in the CART book.

3 Draw separate split labels for the left and right directions.

4 Like 3 but label all nodes, not just leaves. Similar to text.rpart's fancy=TRUE. See also clip.right.labs.

5 Show the split variable name in the interior nodes.

max

Integer. Maximal depth of the tree.

min

Integer. The minimum number of observations that must exist in a node in order for a split to be attempted.

cp

Complexity parameter. Any split that does not decrease the overall lack of fit by a factor of cp is not attempted. For instance, with anova splitting, this means that the overall R-squared must increase by cp at each step. The main role of this parameter is to save computing time by pruning off splits that are obviously not worthwhile. Essentially, the user informs the program that any split which does not improve the fit by cp will likely be pruned off by cross-validation, and that hence the program need not pursue it.

ohse

Boolean. Auto generate One Hot Smart Encoding?

plot

Boolean. Return a plot? If FALSE, the rpart object is returned instead.

explain

Boolean. Include a brief explanation on the bottom part of the plot.

title, subtitle

Character. Title and subtitle to include in plot. Set to NULL to ignore.

...

Additional parameters passed to rpart.plot().
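The tree-growing arguments above (max, min, cp) read like the maxdepth, minsplit, and cp settings of rpart.control(). That mapping is an assumption drawn from the argument descriptions, not something this page confirms. Under that assumption, a minimal sketch using rpart directly on the built-in mtcars data shows how a larger cp stops splits that a cp of 0 would allow:

```r
library(rpart)

# Assumed mapping (not confirmed by this page):
#   max -> maxdepth, min -> minsplit, cp -> cp
loose  <- rpart.control(maxdepth = 3, minsplit = 20, cp = 0)
strict <- rpart.control(maxdepth = 3, minsplit = 20, cp = 0.1)

# Regression (anova) trees on a built-in dataset
fit_loose  <- rpart(mpg ~ ., data = mtcars, method = "anova", control = loose)
fit_strict <- rpart(mpg ~ ., data = mtcars, method = "anova", control = strict)

# Depth of each node, derived from rpart's node numbering:
# the root is node 1 and the children of node n are 2n and 2n + 1
node_depth <- function(fit) floor(log2(as.numeric(rownames(fit$frame))))

max(node_depth(fit_loose))  # deepest level reached, bounded by maxdepth
nrow(fit_loose$frame)       # number of nodes with cp = 0
nrow(fit_strict$frame)      # number of nodes with cp = 0.1
```

Comparing the two node counts gives a quick feel for how aggressively a given cp limits tree growth before any cross-validated pruning.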

Details

This differs from the tree function in S mainly in its handling of surrogate variables. In most details it follows Breiman et al. (1984) quite closely. The R package tree provides a re-implementation of tree.

Value

(Invisible) list of type 'tree_var' with the plot (as a function), the model, predictions, performance metrics, and auxiliary interpretation text.

Author(s)

Stephen Milborrow, borrowing heavily from the rpart package by Terry M. Therneau and Beth Atkinson, and the R port of that package by Brian Ripley.

References

Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth.

See Also

Other Exploratory: corr_cross(), corr_var(), crosstab(), df_str(), distr(), freqs_df(), freqs_list(), freqs_plot(), freqs(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums()

Other Visualization: distr(), freqs_df(), freqs_list(), freqs_plot(), freqs(), noPlot(), plot_chord(), plot_survey(), plot_timeline()

Examples

data(dft)
# Regression Tree
tree <- tree_var(dft, Fare, subtitle = "Titanic dataset")
tree$plot() # tree plot
tree$model # rpart model object
tree$performance # metrics
# Binary Tree
tree_var(dft, Survived_TRUE, explain = FALSE, cex = 0.8)$plot()
# Multiclass tree
tree_var(dft[, c("Pclass", "Fare", "Age")], Pclass, ohse = FALSE)$plot()

lares documentation built on Nov. 5, 2023, 1:09 a.m.