Analysis: Perform cross-validation for a given model and given dataset


View source: R/PerformanceSearch.R

Description

This function performs k-fold cross-validation for a given model and a given dataset, and returns the AUC (choosing another metric is not supported yet) and the ROC curve for the best model found, evaluated on a held-out test set.

Usage

Analysis(
  model,
  data,
  outcome = "CVD_status",
  kfolds = 5,
  train.proportion = 0.8
)

Arguments

model

Choice of model to be trained. Currently supported options are: "xgboost", "svm" and "glm".

data

Data to be used for model training. Must be passed whole (not pre-split into training/testing sets), as the split happens internally.

outcome

Response variable of choice. Must be one of the columns in data.

kfolds

Number of folds for k-fold cross-validation (default = 5).

train.proportion

Proportion of data to be kept for training (default = 0.8).

Details

A seed is set internally for reproducibility across runs. The data is split into training and testing sets, and the training set is then split into folds. The same folds are created at each run, ensuring that model performance is comparable across models and not affected by random chance.
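The function's internals are not shown here, but the splitting scheme described above can be sketched in base R. The seed value, sample size, and variable names below are illustrative assumptions, not the actual implementation in R/PerformanceSearch.R:

```r
# Hypothetical sketch of a reproducible train/test split plus fold assignment,
# using the defaults train.proportion = 0.8 and kfolds = 5.
set.seed(1)                                 # fixed seed: identical splits every run
n <- 100                                    # stands in for nrow(data)
train.idx <- sample(n, size = floor(0.8 * n))
test.idx  <- setdiff(seq_len(n), train.idx)

# Assign each training row to one of 5 folds of (near-)equal size
folds <- sample(rep(seq_len(5), length.out = length(train.idx)))

# With the seed fixed, the folds are the same across runs, so performance
# estimates are comparable between models.
table(folds)
```

Because `set.seed()` is called before the random draws, repeating this code reproduces the same `train.idx` and `folds` each time.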

Value

A plot of the ROC curve on the test data and the corresponding AUC value.

Examples

data(iris)
library(magrittr)  # provides the %>% pipe used below
# Keep two species and recode the outcome as binary (0/1)
iris = iris %>% dplyr::filter(Species != "virginica") %>%
  dplyr::mutate(Species = as.numeric(Species) - 1)
Analysis("glm", iris, outcome = "Species")

lc5415/HDATDS documentation built on April 27, 2020, 6:04 a.m.