R/batch_random_search.R

#' Perform a batched random search over a parameter set.
#' @description If your parameter search is likely to take a long time, this function
#' lets you run it in batches, saving the search results to disk after each batch.
#' This incurs a running-time penalty, because the assessment splits are recomputed
#' (or "baked", in `recipes` terminology) at the beginning of each batch.
#' The smaller the batch size, the bigger the penalty.
#' @param resamples A data.frame with columns `splits` and `id`, created using the `rsample` package.
#' @param recipe The recipe to use. See package `recipes`.
#' @param param_set Parameter set created by calling `ParamHelpers::makeParamSet`.
#' @param n Number of parameter combinations to generate.
#' @param scoring_func Your custom train/predict/score function.
#' It must accept the following arguments:
#' \itemize{
#'     \item a training dataframe
#'     \item the name of the target variable in the training dataframe
#'     \item a list of parameters (these are the hyperparameters we are tuning)
#'     \item an evaluation dataframe
#'     \item `...`: additional non-tunable parameters passed through to the function.
#' }
#' @param ... Optional parameters passed to `scoring_func`.
#' @param batch_size Size of the batches.
#' @param out_folder Where to save the intermediate batch results. Folder will be created if not found.
#' @param file_prefix Prefix used to name the results files.
#' @param overwrite If `TRUE`, overwrite existing results files; if `FALSE`, write new
#' files numbered after the existing ones.
#' @details `scoring_func` can return either a single score as a numeric vector
#' or multiple scores in a data.frame.
#' The output folder is scanned for files matching the pattern `<file_prefix>_n.RDS`.
#' If `overwrite` is `FALSE`, the outputs of the current run are written to files
#' numbered from n + 1, where n is the highest existing index. Otherwise numbering
#' starts at 1 (i.e. `<file_prefix>_1.RDS`).
#' @param verbosity Integer: level of verbosity, or `TRUE`/`FALSE` for maximum/minimum
#' verbosity. When verbose, the batch number is printed at the beginning of each batch.
#' @return A tidy data.frame of the aggregated results, identical to what a single
#' non-batched search over the same parameter combinations would return.
#' @export
batch_random_search <- 
  function(resamples, 
           recipe, 
           param_set, 
           n,
           scoring_func, 
           ...,
           batch_size,
           out_folder = '.',
           file_prefix = 'batch_',
           overwrite = FALSE,
           verbosity = TRUE){
    
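    # Draw n random parameter combinations; trafo = TRUE applies any
    # transformations defined in the param set (e.g. log scales).
    param_grid_df <- ParamHelpers::generateRandomDesign(n, param_set, trafo = TRUE)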
    
    batch_grid_search(
      resamples = resamples, 
      rec = recipe, 
      param_grid = param_grid_df, 
      scoring_func = scoring_func, 
      ...,
      batch_size = batch_size,
      out_folder = out_folder,
      file_prefix = file_prefix,
      overwrite = overwrite,
      verbosity = verbosity
    )
  }
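
# ---------------------------------------------------------------------------
# Illustrative sketch (hypothetical helpers, not part of tidygrid's API).
# Under the assumptions stated below, it shows how the batching and the
# results-file numbering described in the documentation above could work,
# plus a scoring function that follows the documented `scoring_func`
# argument list. The names `sketch_split_into_batches`,
# `sketch_next_batch_index` and `sketch_rmse_poly_score` are made up for this
# example; the actual logic lives in `batch_grid_search()` and may differ.
# ---------------------------------------------------------------------------

# Assign consecutive rows of the parameter grid to batches of `batch_size`
# rows (the last batch may be smaller) and return a list of data.frames.
sketch_split_into_batches <- function(param_grid, batch_size) {
  batch_id <- ceiling(seq_len(nrow(param_grid)) / batch_size)
  split(param_grid, batch_id)
}

# Return the index of the first results file to write. Assumes files are
# named "<file_prefix>_<n>.RDS" and that `file_prefix` contains no regex
# metacharacters. With overwrite = FALSE, numbering continues after the
# highest existing index; with overwrite = TRUE it restarts at 1.
sketch_next_batch_index <- function(out_folder, file_prefix, overwrite) {
  if (overwrite) return(1L)
  pattern <- paste0("^", file_prefix, "_([0-9]+)\\.RDS$")
  existing <- list.files(out_folder, pattern = pattern)
  if (length(existing) == 0) return(1L)
  max(as.integer(sub(pattern, "\\1", existing))) + 1L
}

# A scoring function matching the documented argument list: training data,
# target name, list of tuned hyperparameters, evaluation data, then `...`.
# Assumption: the first non-target column is the single predictor and
# `params$degree` is a (hypothetical) tuned polynomial degree.
sketch_rmse_poly_score <- function(train_df, target, params, eval_df, ...) {
  predictor <- setdiff(names(train_df), target)[1]
  fml <- stats::as.formula(sprintf("%s ~ poly(%s, %d, raw = TRUE)",
                                   target, predictor, as.integer(params$degree)))
  fit <- stats::lm(fml, data = train_df)
  pred <- stats::predict(fit, newdata = eval_df)
  sqrt(mean((eval_df[[target]] - pred)^2))  # single numeric score (RMSE)
}

# Usage (not run):
#   batches <- sketch_split_into_batches(data.frame(degree = 1:5), batch_size = 2)
#   length(batches)  # 3 batches: rows 1-2, 3-4 and 5
#   sketch_rmse_poly_score(mtcars[1:20, c("mpg", "wt")], "mpg",
#                          list(degree = 2), mtcars[21:32, c("mpg", "wt")])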
artichaud1/tidygrid documentation built on May 10, 2019, 9:28 a.m.