Analysis: Perform cross-validation for a given model and given dataset


View source: R/PerformanceSearch.R

Description

This function performs k-fold cross-validation for a given model and a given dataset, and returns the AUC (choosing another metric is not supported yet) and the ROC curve for the best model found, evaluated on a held-out test set.

Usage

Analysis(
  model,
  data,
  outcome = "CVD_status",
  kfolds = 5,
  train.proportion = 0.8
)

Arguments

model

Choice of model to be trained. Currently supported options are: "xgboost", "svm" and "glm".

data

Data to be used for model training. Must be passed whole (not pre-split into training/testing sets), as the split happens internally.

outcome

Response variable of choice. Must be one of the columns in data.

kfolds

Number of folds for k-fold cross-validation (default = 5).

train.proportion

Proportion of data to be kept for training (default = 0.8).

Details

A seed is set internally for reproducibility across runs. The data is split into training and testing sets, and the training set is then split into folds. The same folds are created at each run, ensuring that model performance is comparable across models and not affected by random chance.
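The function's internals are not shown here, but the splitting scheme described above can be sketched in base R. The seed value, sample size, and variable names below are illustrative assumptions, not the actual implementation in R/PerformanceSearch.R:

```r
# Hypothetical sketch of a reproducible train/test split plus fold assignment,
# using the defaults train.proportion = 0.8 and kfolds = 5.
set.seed(1)                                 # fixed seed: identical splits every run
n <- 100                                    # stands in for nrow(data)
train.idx <- sample(n, size = floor(0.8 * n))
test.idx  <- setdiff(seq_len(n), train.idx)

# Assign each training row to one of 5 folds of (near-)equal size
folds <- sample(rep(seq_len(5), length.out = length(train.idx)))

# With the seed fixed, the folds are the same across runs, so performance
# estimates are comparable between models.
table(folds)
```

Because `set.seed()` is called before the random draws, repeating this code reproduces the same `train.idx` and `folds` each time.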

Value

A plot of the ROC curve on the test data and the corresponding AUC value.

Examples

data(iris)
library(magrittr)  # provides the %>% pipe used below
# Keep two species and recode the outcome as binary (0/1)
iris = iris %>% dplyr::filter(Species != "virginica") %>%
  dplyr::mutate(Species = as.numeric(Species) - 1)
Analysis("glm", iris, outcome = "Species")

lc5415/HDATDS documentation built on April 27, 2020, 6:04 a.m.