ml_cv_filter: A function to filter models based on performance metrics from...


Description

Given a list of caret models, this function will select models based on the metrics from the cross-validation.

Usage

ml_cv_filter(models, metric = "ROC", mini = NULL, max = NULL,
  FUN = median)

Arguments

models

A list of caret models, either generated by the ml_list function or built by combining caret models into a list manually.

metric

A character string naming the metric used to filter models. The default is "ROC".

mini

A numeric value, the minimum allowed value of the summarized metric. The default is NULL.

max

A numeric value, the maximum allowed value of the summarized metric. The default is NULL.

FUN

A function that summarizes the performance metric across cross-validation resamples, such as median, sd or mean. The default is median.

Details

The function extracts the cross-validation results (the example notes give the equivalent as model_list%>%resamples%>%.$values), then filters the models using the chosen summary function and metric. For now it handles the metrics "ROC", "Accuracy", "Kappa", "Spec" and "Sens", which the caret package supports natively. More metrics are expected to be added in the future.
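The steps above can be sketched in base R. This is a minimal illustration, not the package's implementation: `cv_filter_sketch` is a hypothetical helper, the numbers are made up, and the mock `values` data frame only imitates the layout of `resamples(models)$values` in caret (a `Resample` column followed by one `model~metric` column per model and metric). Treating a NULL bound as "no bound" is also an assumption.

```r
# Mock of resamples(models)$values: one row per CV fold,
# one "model~metric" column per model/metric pair (made-up numbers).
values <- data.frame(
  Resample       = c("Fold1", "Fold2", "Fold3"),
  `rf~ROC`       = c(0.86, 0.85, 0.87),
  `glm~ROC`      = c(0.80, 0.83, 0.78),
  `rf~Accuracy`  = c(0.78, 0.77, 0.79),
  `glm~Accuracy` = c(0.72, 0.74, 0.73),
  check.names = FALSE
)

# Hypothetical helper: return the names of the models whose summarized
# metric lies within [mini, max]. A NULL bound is skipped (assumption).
cv_filter_sketch <- function(values, metric = "ROC", mini = NULL, max = NULL,
                             FUN = median) {
  suffix <- paste0("~", metric)
  # Pick the columns that hold the requested metric.
  cols <- names(values)[endsWith(names(values), suffix)]
  # Summarize each model's per-fold values with FUN.
  summaries <- vapply(values[cols], FUN, numeric(1))
  keep <- rep(TRUE, length(summaries))
  if (!is.null(mini)) keep <- keep & summaries >= mini
  if (!is.null(max))  keep <- keep & summaries <= max
  # Strip the "~metric" suffix to recover the model names.
  sub(suffix, "", cols[keep], fixed = TRUE)
}

# Keep models whose median ROC across folds is at least 0.84.
kept <- cv_filter_sketch(values, metric = "ROC", mini = 0.84, FUN = median)
# kept is "rf": its median ROC is 0.86, while glm's is 0.80
```

The sketch returns model names rather than the models themselves; with a real list of caret models, the same names could be used to subset that list.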

Value

A list of the caret models whose cross-validation performance metrics satisfy the conditions.

Examples

## Not run: 
 library(magrittr)  # provides the %>% pipe
 # testmodels_churn is assumed to be a list of caret models, e.g. from ml_list.

 # Select models whose median CV accuracy is at least 0.75.
 testmodels_metric_filtered <- testmodels_churn %>% ml_cv_filter(metric = "Accuracy", mini = 0.75, FUN = median)

 # Select models whose CV accuracy has a standard deviation of at most 0.01.
 testmodels_metric_filtered <- testmodels_churn %>% ml_cv_filter(metric = "Accuracy", max = 0.01, FUN = sd)

 # Select models whose median CV ROC is at least 0.84 and whose ROC standard deviation is at most 0.01.
 testmodels_metric_filtered <- testmodels_churn %>%
   ml_cv_filter(metric = "ROC", mini = 0.84, FUN = median) %>%
   ml_cv_filter(metric = "ROC", max = 0.01, FUN = sd)

 # Select models whose median CV ROC is between 0.84 and 0.84275.
 testmodels_metric_filtered <- testmodels_churn %>% ml_cv_filter(metric = "ROC", mini = 0.84, max = 0.84275, FUN = median)

 # You can use a custom function to calculate a statistic of a k-fold performance metric.
 # Define it in the global environment and pass the function name, without quotes, as the FUN argument.
 # ml_cv_filter works on the performance metrics obtained by feeding the models into the resamples function of the caret package.
 # You can get the same data frame with model_list%>%resamples%>%.$values.

## End(Not run)
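As an illustration of the custom-function note in the example comments, here is a minimal sketch of a user-defined summary statistic. The name `cv_range` and the fold scores are made up for illustration. The final call is left commented out because it assumes a `testmodels_churn` model list exists.

```r
# Hypothetical custom statistic: spread between the best and worst fold.
cv_range <- function(x) max(x) - min(x)

# Made-up ROC values from a 5-fold cross-validation of a single model.
fold_roc <- c(0.84, 0.86, 0.85, 0.83, 0.87)
spread <- cv_range(fold_roc)  # difference between 0.87 and 0.83

# Pass the function by name, without quotes, as FUN:
# testmodels_churn %>% ml_cv_filter(metric = "ROC", max = 0.05, FUN = cv_range)
```

Any function that takes a numeric vector of per-fold values and returns a single number should fit the same pattern.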

edwardcooper/automl documentation built on June 3, 2019, 1:05 a.m.