TreeModelsAllSteps: Data Partition and Tree-based Model Training

View source: R/TreeModelsAllSteps.R

Description

Partitions the data into training and testing datasets and trains the selected tree-based models on the training dataset.

Usage

TreeModelsAllSteps(
  data = NULL,
  proportion = 0.7,
  seed = 2022,
  methodlist = c("dt", "rf", "gbm"),
  iternumber = 10,
  dt.gridsearch = NULL,
  rf.gridsearch = NULL,
  gbm.gridsearch = NULL,
  checkprogress = FALSE
)

Arguments

data

A data.frame that contains the study’s features and the outcome variable. The outcome variable must be named "perf".

proportion

A numeric value for the proportion of the data used for model training. The default is 0.7.

seed

A numeric value passed to set.seed. The default is 2022.

methodlist

A character vector of the tree-based methods to fit: "dt" (decision tree), "rf" (random forest), and "gbm" (gradient boosting). The default is methodlist = c("dt", "rf", "gbm").

iternumber

A numeric value for the number of resampling iterations (folds) in the cross-validation scheme. The default is 10.

dt.gridsearch

A data.frame of the tuning grid, which specifies the parameter values to try for the decision tree model. The default is NULL.

rf.gridsearch

A data.frame of the tuning grid, which specifies the parameter values to try for the random forest model. The default is NULL.

gbm.gridsearch

A data.frame of the tuning grid, which specifies the parameter values to try for the gradient boosting model. The default is NULL.

checkprogress

Logical. If TRUE, the modeling progress is printed. The default is FALSE.
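
The *.gridsearch arguments take a data.frame with one column per tuning parameter and one row per candidate combination. As a minimal sketch, a grid with several candidates can be built with expand.grid from base R. The column names below are taken from the rf.gridsearch call in the Examples section; the candidate values and the name rf_grid are illustrative assumptions, not recommendations from the package.

```r
# Candidate values for the random forest tuning grid.
# Column names follow the rf.gridsearch example on this page;
# the values themselves are illustrative assumptions.
rf_grid <- expand.grid(
  mtry          = c(2, 4, 6),
  splitrule     = "gini",
  min.node.size = c(1, 5),
  stringsAsFactors = FALSE
)

# One row per combination: 3 mtry values x 1 splitrule x 2 node sizes
nrow(rf_grid)  # 6

# The grid would then be passed as, for example:
# TreeModelsAllSteps(data = mydata, methodlist = "rf", rf.gridsearch = rf_grid)
# where mydata is a hypothetical data.frame with an outcome column named "perf".
```

Each additional candidate value multiplies the number of models fitted in every cross-validation fold, so small grids are a sensible starting point.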

Details

This function performs all the steps of a predictive analysis. First, the data are partitioned into training and testing datasets using stratified sampling on the outcome variable, as performed by the createDataPartition function from the caret package. Then, the selected classifiers are fitted to the training dataset under a cross-validation scheme. Users choose which models to compare through the methodlist argument. The caretEnsemble package is used in the modeling process to ensure that all models follow the same resampling procedure. For each tree-based method, the optimal model is the one with the largest ROC value. Finally, a summary report is displayed.
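
The stratified partition in the first step can be illustrated with a short base R sketch. This is not the package's implementation, which relies on createDataPartition from caret; it only shows the idea of sampling the training proportion within each level of the outcome. The toy data, its column values, and the helper names (toy, idx_by_level, train_idx) are assumptions made for illustration.

```r
# Toy data with a two-level outcome named "perf" (hypothetical values)
set.seed(2022)
toy <- data.frame(
  x1   = rnorm(100),
  x2   = runif(100),
  perf = factor(rep(c("correct", "incorrect"), times = c(60, 40)))
)

# Sample the training proportion of row indices within each outcome level
proportion <- 0.7
idx_by_level <- split(seq_len(nrow(toy)), toy$perf)
train_idx <- unlist(
  lapply(idx_by_level, function(idx) {
    idx[sample.int(length(idx), size = round(proportion * length(idx)))]
  }),
  use.names = FALSE
)

# Rows not drawn for training form the testing dataset
cv_train <- toy[train_idx, ]
cv_test  <- toy[-train_idx, ]

# The class proportions are the same in both parts
prop.table(table(cv_train$perf))
prop.table(table(cv_test$perf))
```

Sampling within each level keeps the outcome distribution of the training and testing datasets close to that of the full data, which matters when one class is much rarer than the other.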

Value

This function returns three lists:

DataPartition The partitioned datasets: training (cv_train) and testing (cv_test).

ModelObject An object with the results from the selected models.

SummaryReport A data.frame with the summary of model parameters. The summary report is shown automatically in the output.

Examples


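# Load the package that provides TreeModelsAllSteps
library(LOGANTree)

# Drop column 14, then rename the new column 14 to "perf" (the outcome)
cp025q01.wgt <- cp025q01.wgt[, -14]
colnames(cp025q01.wgt)[14] <- "perf"

# Fit all three default methods and print the modeling progress
ensemblist <- TreeModelsAllSteps(data = cp025q01.wgt,
                                 checkprogress = TRUE)

# Fit only the decision tree and gradient boosting models
ensemblist <- TreeModelsAllSteps(data = cp025q01.wgt,
                                 methodlist = c("dt", "gbm"),
                                 checkprogress = TRUE)

# Fit only the random forest model with a user-specified tuning grid
ensemblist <- TreeModelsAllSteps(data = cp025q01.wgt,
                                 methodlist = c("rf"),
                                 rf.gridsearch = data.frame(mtry = 2, splitrule = "gini", min.node.size = 1),
                                 checkprogress = TRUE)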


LOGANTree documentation built on June 23, 2022, 1:06 a.m.