#' Perform batch random search using a paramset.
#' @description If your parameter search is likely to take a long time, this function
#' allows you to run it in batches, saving the search results to disk
#' after each batch. This incurs a penalty in running time, because the assessment splits
#' are recomputed (or `baked`, in `recipes` terminology) at the beginning of each batch.
#' The smaller the batch size, the bigger the penalty.
#' @param resamples A data.frame with columns `splits` and `id`, created using the `rsample` package.
#' @param recipe The recipe to use. See package `recipes`.
#' @param param_set Param set created by calling `ParamHelpers::makeParamSet`.
#' @param n Number of parameter combinations to generate.
#' @param scoring_func Your custom train/predict/score function.
#' Must take as parameters:
#' \itemize{
#' \item a training dataframe
#' \item the name of the target variable in the training dataframe
#' \item a list of parameters (these are the hyperparameters we are tuning)
#' \item an evaluation dataframe
#' \item dots. These are additional non-tunable parameters that could be passed to the function.
#' }
#' @param ... Optional params passed to `scoring_func`.
#' @param batch_size Size of the batches.
#' @param out_folder Where to save the intermediate batch results. Folder will be created if not found.
#' @param file_prefix Used to name the results files.
#' @param overwrite Overwrite existing results files or create new ones.
#' @details `scoring_func` can return a single score as a numeric vector,
#' or multiple scores in a data.frame.
#' The output folder will be scanned for files matching the pattern `<file_prefix>_n.RDS`.
#' If `overwrite` is `FALSE`, the outputs of the current run will be written to files starting at n + 1.
#' Otherwise, numbering starts at 1 (i.e. `<file_prefix>_1.RDS`).
#' With `verbosity` enabled, the batch number is printed at the beginning of each batch.
#' @param verbosity Integer: level of verbosity, or TRUE/FALSE for max/min verbosity.
#' @return A tidy data.frame with the aggregated results. This is the same result as a search run without batches.
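#' @examples
#' # A minimal sketch, not taken from the package's own documentation.
#' # `rmse_scoring_func` and its argument names are hypothetical; only the
#' # argument order follows the `scoring_func` contract described above.
#' # It is assumed that the tuned parameters arrive as a named list.
#' rmse_scoring_func <- function(train_df, target_name, params, eval_df, ...) {
#'   # Fit a polynomial regression of the target on `hp`;
#'   # the tuned hyperparameter is the polynomial degree.
#'   fit <- stats::lm(
#'     stats::as.formula(paste0(target_name, " ~ poly(hp, ", params$degree, ")")),
#'     data = train_df
#'   )
#'   preds <- stats::predict(fit, newdata = eval_df)
#'   # Return a single numeric score (RMSE on the evaluation data).
#'   sqrt(mean((eval_df[[target_name]] - preds)^2))
#' }
#'
#' \dontrun{
#' # Assumes the rsample, recipes, ParamHelpers and lhs packages are installed.
#' # Whether the recipe must be prepped beforehand is not stated here;
#' # an unprepped recipe is assumed.
#' resamples <- rsample::vfold_cv(mtcars, v = 5)
#' rec <- recipes::recipe(mpg ~ hp, data = mtcars)
#' param_set <- ParamHelpers::makeParamSet(
#'   ParamHelpers::makeIntegerParam("degree", lower = 1, upper = 3)
#' )
#' # Six random parameter combinations, written to disk two at a time.
#' results <- batch_random_search(
#'   resamples = resamples,
#'   recipe = rec,
#'   param_set = param_set,
#'   n = 6,
#'   scoring_func = rmse_scoring_func,
#'   batch_size = 2,
#'   out_folder = tempdir(),
#'   file_prefix = "batch_"
#' )
#' }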
#' @export
batch_random_search <-
  function(resamples,
           recipe,
           param_set,
           n,
           scoring_func,
           ...,
           batch_size,
           out_folder = '.',
           file_prefix = 'batch_',
           overwrite = FALSE,
           verbosity = TRUE) {
    # Draw n random parameter combinations from the param set,
    # applying any parameter transformations (trafo = TRUE).
    param_grid_df <- ParamHelpers::generateRandomDesign(n, param_set, trafo = TRUE)
    # Delegate to the batched grid search, using the random design as the grid.
    batch_grid_search(
      resamples = resamples,
      rec = recipe,
      param_grid = param_grid_df,
      scoring_func = scoring_func,
      ...,
      batch_size = batch_size,
      out_folder = out_folder,
      file_prefix = file_prefix,
      overwrite = overwrite,
      verbosity = verbosity
    )
  }