Description

A simple function to create Decision Trees.

Usage

DTModel(Data, classCol, selectedCols, tree, cvType, nTrainFolds,
        ntrainTestFolds, modelTrainFolds, foldSep, cvFraction,
        extendedResults, SetSeed, silent, NewData, ...)

Arguments
Data
(dataframe) a data frame with regressors and the response

classCol
(numeric or string) which column should be used as the response (class) column

selectedCols
(optional) (numeric or string) which columns should be treated as data (features + response); defaults to all columns

tree
which decision tree model to implement; the cross-validated variants 'CARTCV' (keeping missing values) and 'CARTNACV' (removing missing values) are the ones used in the Examples below

cvType
(optional) (string) which type of cross-validation scheme to follow, only in the case of CARTCV or CARTNACV; the Examples below use "holdout" and "folds" (k-fold), and leave-one-subject-out cross-validation is available via foldSep

nTrainFolds
(optional) (parameter for k-fold cross-validation only) number of folds in which to further divide the training dataset

ntrainTestFolds
(optional) (parameter for k-fold cross-validation only) number of folds for the training and testing datasets

modelTrainFolds
(optional) (parameter for k-fold cross-validation only) specific folds from the first train/test split (ntrainTestFolds) to use for training

foldSep
(numeric) (parameter for leave-one-subject-out cross-validation only) mandatory column number separating subjects for leave-one-subject-out cross-validation

cvFraction
(optional) (numeric) fraction of the data to keep for training

extendedResults
(optional) (logical) whether to return extended results with the model and other metrics

SetSeed
(optional) (logical) whether to seed the random number generator to obtain consistent results

silent
(optional) (logical) whether to print messages

NewData
(optional) (dataframe) a new data frame of features for which class membership is requested

...
(optional) additional arguments for the function
Details

The function implements Decision Tree models (DT models). DT models fall under the general "tree-based methods", involving the generation of a recursive binary tree (Hastie et al., 2009). In terms of input, DT models can handle both continuous and categorical variables as well as missing data. From the input data, DT models build a set of logical "if ... then" rules that permit accurate prediction of the input cases.

The function "rpart" handles missing data by creating surrogate variables instead of removing the affected cases entirely (Therneau & Atkinson, 1997). This can be useful when the data contain multiple missing values.

Unlike regression methods such as GLMs, Decision Trees are more flexible and can model nonlinear interactions.
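The surrogate-split behaviour described above can be seen with the rpart package directly; the snippet below is a minimal sketch (the dataset and the introduced missing values are illustrative, not part of this function's documentation):

```r
library(rpart)

# Illustrative only: introduce missing values into a standard dataset
df <- iris
df$Sepal.Width[c(3, 30, 60)] <- NA

# rpart builds the tree and uses surrogate splits for rows whose
# primary split variable is missing, instead of dropping those rows
fit <- rpart(Species ~ ., data = df, method = "class")

# predictions are still produced for the rows containing NAs
head(predict(fit, df, type = "class"))
```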
Value

Depending on the tree chosen, the model result for the input tree (Results) or the test accuracy (accTest). If extendedResults = TRUE, the output contains the test accuracy of discrimination (accTest), the confusion matrices (ConfMatrix), the model (fit) and the overall cross-validated confusion matrix results (ConfusionMatrixResults).
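As a hedged sketch of how these elements might be accessed (assuming the PredPsych package, where DTModel is distributed, and its KinData example dataset; the element names are taken from the Value description above, and list access via $ is an assumption):

```r
library(PredPsych)  # assumption: DTModel and KinData are provided by PredPsych

res <- DTModel(Data = KinData, classCol = 1,
               selectedCols = c(1, 2, 12, 22, 32, 42, 52),
               tree = 'CARTCV', cvType = "holdout",
               extendedResults = TRUE)

# element names as listed in the Value section
res$accTest                 # test accuracy of discrimination
res$ConfMatrix              # confusion matrices
res$fit                     # the fitted model
res$ConfusionMatrixResults  # overall cross-validated confusion matrix results
```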
Author(s)

Atesh Koul, C'MON unit, Istituto Italiano di Tecnologia
References

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer Series in Statistics (2nd ed., Vol. 1). New York, NY: Springer New York.
Terry Therneau, Beth Atkinson and Brian Ripley (2015). rpart: Recursive Partitioning and Regression Trees. R package version 4.1-10. https://CRAN.R-project.org/package=rpart
Therneau, T. M., & Atkinson, E. J. (1997). An introduction to recursive partitioning using the RPART routines (Vol. 61, p. 452). Mayo Foundation: Technical report.
Examples

# generate a cart model for 10% of the data with cross-validation
model <- DTModel(Data = KinData, classCol = 1,
                 selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                 tree = 'CARTCV', cvType = "holdout")
# Output:
# Performing Decision Tree Analysis
#
# [1] "Generating cross-validated Tree With Missing Values"
#
# Performing holdout Cross-validation
#
# cvFraction was not specified,
# Using default value of 0.8 (cvFraction = 0.8)
# Proportion of Test/Train Data was : 0.2470588
#
# [1] "Test holdout Accuracy is 0.62"
# holdout CART Analysis:
# cvFraction : 0.8
# Test Accuracy 0.62
# *Legend:
# cvFraction = Fraction of data to keep for training data
# Test Accuracy = Accuracy from the Testing dataset
# CART Model
# Alternate uses:
# k-fold cross-validation with removing missing values
model <- DTModel(Data = KinData, classCol = 1,
                 selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                 tree = 'CARTNACV', cvType = "folds")
# holdout cross-validation without removing missing values
model <- DTModel(Data = KinData, classCol = 1,
                 selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                 tree = 'CARTCV', cvType = "holdout")
# k-fold cross-validation without removing missing values
model <- DTModel(Data = KinData, classCol = 1,
                 selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                 tree = 'CARTCV', cvType = "folds")
