show_best: Investigate best tuning parameters

show_best R Documentation

Description

show_best() displays the top sub-models and their performance estimates.

select_best() finds the tuning parameter combination with the best performance values.

select_by_one_std_err() uses the "one-standard error rule" (Breiman et al., 1984), which selects the simplest model that is within one standard error of the numerically optimal results.

select_by_pct_loss() selects the simplest model whose loss of performance is within some acceptable limit.

Usage

show_best(x, ...)

## Default S3 method:
show_best(x, ...)

## S3 method for class 'tune_results'
show_best(
  x,
  ...,
  metric = NULL,
  eval_time = NULL,
  n = 5,
  call = rlang::current_env()
)

select_best(x, ...)

## Default S3 method:
select_best(x, ...)

## S3 method for class 'tune_results'
select_best(x, ..., metric = NULL, eval_time = NULL)

select_by_pct_loss(x, ...)

## Default S3 method:
select_by_pct_loss(x, ...)

## S3 method for class 'tune_results'
select_by_pct_loss(x, ..., metric = NULL, eval_time = NULL, limit = 2)

select_by_one_std_err(x, ...)

## Default S3 method:
select_by_one_std_err(x, ...)

## S3 method for class 'tune_results'
select_by_one_std_err(x, ..., metric = NULL, eval_time = NULL)

Arguments

x

The results of tune_grid() or tune_bayes().

...

For select_by_one_std_err() and select_by_pct_loss(), this argument is passed directly to dplyr::arrange() so that the user can sort the models from most simple to most complex. That is, for a parameter p, pass the unquoted expression p if smaller values of p indicate a simpler model, or desc(p) if larger values indicate a simpler model. At least one term is required for these two functions. See the examples below.

metric

A character value for the metric that will be used to sort the models. (See https://yardstick.tidymodels.org/articles/metric-types.html for more details.) Not required if a single metric exists in x. If there are multiple metrics and none is given, the first in the metric set is used (and a warning is issued).

eval_time

A single numeric time point where dynamic event time metrics should be chosen (e.g., the time-dependent ROC curve). The values should be consistent with the values used to create x. The NULL default will automatically use the first evaluation time used by x.

n

An integer for the number of top results/rows to return.

call

The call to be shown in errors and warnings.

limit

The limit of loss of performance that is acceptable (in percent units). See details below.

Details

For percent loss, suppose the best model has an RMSE of 0.75 and a simpler model has an RMSE of 1. The percent loss would be (1.00 - 0.75)/1.00 * 100, or 25 percent. Note that loss will always be non-negative.
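The arithmetic in that example can be reproduced in a few lines of base R. This is only a sketch of the calculation as it is stated above, not the package's internal code; the variable names are hypothetical, and the strict comparison against limit is an assumption.

```r
# Values taken from the example in the text above.
best_rmse    <- 0.75  # RMSE of the numerically best model
simpler_rmse <- 1.00  # RMSE of a simpler candidate model

# Percent loss, using the formula as written in the text.
pct_loss <- (simpler_rmse - best_rmse) / simpler_rmse * 100

# Compare against the default `limit = 2` (percent units).
# Assumption: a candidate is acceptable when its loss is below the limit.
limit <- 2
within_limit <- pct_loss < limit

print(pct_loss)
print(within_limit)
```

With these numbers the simpler model loses far more than 2 percent, so it would not be an acceptable substitute at the default limit.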

Value

A tibble with columns for the parameters. show_best() also includes columns for performance metrics.

References

Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and Regression Trees. Monterey, CA: Wadsworth.

Examples


data("example_ames_knn")

show_best(ames_iter_search, metric = "rmse")

select_best(ames_iter_search, metric = "rsq")

# To find the least complex model within one std error of the numerically
# optimal model, the candidate values of K are sorted from the largest
# number of neighbors (the least complex model) to the smallest
# (the most complex model).

select_by_one_std_err(ames_grid_search, metric = "rmse", desc(K))

# Now find the least complex model that has no more than a 5% loss of RMSE:
select_by_pct_loss(
  ames_grid_search,
  metric = "rmse",
  limit = 5,
  desc(K)
)
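To make the one-standard-error rule concrete, the following base R sketch applies it by hand to a small, made-up summary table. The data frame, its values, and its column names are hypothetical and chosen for illustration only; this is a reading of the rule as described above for a metric that is minimized (such as RMSE), not the package's implementation.

```r
# Hypothetical resampling summary: one row per candidate value of K.
res <- data.frame(
  K       = c(1, 5, 10, 20, 40),
  mean    = c(0.95, 0.80, 0.75, 0.78, 0.90),  # mean RMSE (lower is better)
  std_err = c(0.05, 0.04, 0.04, 0.04, 0.05)   # standard error of the mean
)

# Locate the numerically best row and compute the upper bound:
# best mean plus one standard error of that best result.
best_row <- res[which.min(res$mean), ]
bound <- best_row$mean + best_row$std_err

# Keep only candidates whose mean performance is within the bound.
candidates <- res[res$mean <= bound, ]

# Sort from simplest to most complex. For nearest neighbors, larger K is
# treated as simpler, which mirrors passing desc(K) in the examples.
candidates <- candidates[order(-candidates$K), ]

# The first remaining row is the simplest acceptable model.
chosen <- candidates[1, ]
print(chosen)
```

Here the numerically best result is at K = 10, but K = 20 also falls within one standard error of it and is simpler, so it is the row that is kept.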


tune documentation built on May 29, 2024, 7:32 a.m.