surrogate_search: Use a surrogate model to find potentially good parameter...

Description Usage Arguments Details Value Note


The surrogate model used is a random forest regressor (see 'ranger' package). We generate n_candidates random parameter combinations, and ask the surrogate model to rank them according to their predicted performance. We then take the top_n combinations and pass them through to the actual underlying model.
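The candidate-ranking step can be sketched in base R as follows. This is only an illustration, not the package's code: the surrogate here is a quadratic linear model standing in for the 'ranger' random forest, the parameter names `alpha` and `depth` and the score history are made up, and higher scores are assumed to be better.

```r
set.seed(1)

# Hypothetical history of evaluated parameter sets (higher score = better).
history <- data.frame(alpha = runif(30),
                      depth = sample(1:10, 30, replace = TRUE))
history$score <- -(history$alpha - 0.3)^2 - 0.01 * (history$depth - 6)^2

# Stand-in surrogate: a quadratic linear model, NOT a ranger random forest.
surrogate <- lm(score ~ poly(alpha, 2) + poly(depth, 2), data = history)

n_candidates <- 1000
top_n <- 10

# Generate random candidate parameter combinations.
candidates <- data.frame(alpha = runif(n_candidates),
                         depth = sample(1:10, n_candidates, replace = TRUE))

# Ask the surrogate to predict the performance of every candidate.
candidates$predicted <- predict(surrogate, newdata = candidates)

# Keep only the top_n candidates; these would go to the real model.
ranked <- candidates[order(candidates$predicted, decreasing = TRUE), ]
selected <- head(ranked, top_n)
```

Only the rows in `selected` would then be evaluated with the expensive underlying model.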


surrogate_search(resamples, recipe, param_set, n, scoring_func, ..., input,
  surrogate_target, n_candidates = 1000, top_n = 10, verbosity = TRUE)



resamples: A data.frame with columns 'splits' and 'id', created using the 'rsample' package.
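As a rough illustration of what such an object looks like, the sketch below builds 5-fold cross-validation resamples with 'rsample'. The choice of `mtcars` and of `vfold_cv` is an assumption made for the example; any 'rsample' function producing 'splits' and 'id' columns should fit the description above.

```r
library(rsample)

set.seed(123)
# 5-fold cross-validation on an example dataset.
resamples <- vfold_cv(mtcars, v = 5)

# Each split holds a training (analysis) and an evaluation (assessment) part.
first_split <- resamples$splits[[1]]
train_df <- analysis(first_split)
eval_df  <- assessment(first_split)
```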


recipe: The recipe to use. See package 'recipes'.


param_set: Parameter set created by calling ParamHelpers::makeParamSet.
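A parameter set might be defined along these lines. The parameter names `eta` and `max_depth` and their bounds are hypothetical, and the use of `generateRandomDesign` to draw random combinations is shown only for illustration; this page does not say how the package samples its candidates.

```r
library(ParamHelpers)

# Hypothetical tunable hyperparameters with their search ranges.
param_set <- makeParamSet(
  makeNumericParam("eta", lower = 0.01, upper = 0.3),
  makeIntegerParam("max_depth", lower = 2L, upper = 10L)
)

# Draw a few random parameter combinations from the set.
set.seed(1)
design <- generateRandomDesign(n = 5, par.set = param_set)
```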


n: Number of runs of the surrogate model. Can be a vector for iterative surrogate search.


scoring_func: Your custom train/predict/score function. Must take as parameters:

  • a training dataframe

  • the name of the target variable in the training dataframe

  • a list of parameters (these are the hyperparameters we are tuning)

  • an evaluation dataframe

  • dots. These are additional non-tunable parameters that could be passed to the function.
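A function with this shape could look like the sketch below. The argument names, the positional order (taken from the list above), the `degree` hyperparameter and the use of RMSE are all assumptions made for the example; the model is a simple polynomial regression on the first non-target column.

```r
# Hypothetical scoring function: trains, predicts, and returns one score.
rmse_scoring_func <- function(train_df, target, params, eval_df, ...) {
  # Use the first non-target column as the single predictor.
  predictor <- setdiff(names(train_df), target)[1]
  fml <- as.formula(sprintf("%s ~ poly(%s, %d)",
                            target, predictor, params$degree))

  fit <- lm(fml, data = train_df)            # train
  preds <- predict(fit, newdata = eval_df)   # predict

  # Score: root mean squared error on the evaluation data.
  sqrt(mean((eval_df[[target]] - preds)^2))
}

# Example call on a manual train/evaluation split of mtcars.
cars <- mtcars[, c("mpg", "wt")]
score <- rmse_scoring_func(cars[1:20, ], "mpg", list(degree = 2), cars[21:32, ])
```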


...: Optional parameters passed to scoring_func.


input: Input to the surrogate model. This should be the output of a previous parameter search.


n_candidates: How many candidate parameter combinations should be evaluated in each surrogate model run. If n_candidates == 0, fall back to regular random search. If 'n' is a vector, 'n_candidates' must be a vector of length 1 or the same length as 'n'.


top_n: Out of the n_candidates, we keep the top_n combinations (as predicted by the surrogate) to test with the actual underlying model. If 'n' is a vector, 'top_n' must be a vector of length 1 or the same length as 'n'.
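The length rule shared by 'n_candidates' and 'top_n' can be expressed as a small check. The helper `recycle_to_n` below is hypothetical and not part of the package; it only illustrates the "length 1 or same length as 'n'" constraint.

```r
# Hypothetical helper: expand x to one value per element of n, or fail.
recycle_to_n <- function(x, n, name) {
  if (length(x) == 1L) {
    return(rep(x, length(n)))
  }
  if (length(x) == length(n)) {
    return(x)
  }
  stop(sprintf("'%s' must have length 1 or the same length as 'n'", name))
}

n <- c(5, 5, 5)
n_candidates <- recycle_to_n(1000, n, "n_candidates")
top_n <- recycle_to_n(c(20, 10, 5), n, "top_n")
```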


verbosity: Level of verbosity, given as an integer or as TRUE/FALSE (TRUE is maximum verbosity, FALSE is not verbose).


'scoring_func' can return a single score as a numeric vector, or multiple scores in a data.frame.
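For the multi-score case, a scoring function might return a one-row data.frame, as in this sketch. The function name, the argument names, the `degree` hyperparameter and the metric columns `rmse` and `mae` are illustrative assumptions.

```r
# Hypothetical scoring function returning several scores at once.
multi_scoring_func <- function(train_df, target, params, eval_df, ...) {
  predictor <- setdiff(names(train_df), target)[1]
  fml <- as.formula(sprintf("%s ~ poly(%s, %d)",
                            target, predictor, params$degree))
  fit <- lm(fml, data = train_df)
  errors <- eval_df[[target]] - predict(fit, newdata = eval_df)

  # One column per performance score.
  data.frame(rmse = sqrt(mean(errors^2)),
             mae  = mean(abs(errors)))
}

cars <- mtcars[, c("mpg", "wt")]
scores <- multi_scoring_func(cars[1:20, ], "mpg", list(degree = 1), cars[21:32, ])
```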


A tidy data.frame with one column per parameter, columns identifying the paramset and the fold, a column giving the row indices of the evaluation dataset, and columns for the performance scores. The score columns are taken from the scoring function if it returned a data.frame; otherwise there is a single 'score' column.


The results of the surrogate search (i.e. the performance of parameters selected by the surrogate model) are appended to the input after each run, so that the surrogate model can also learn from its own suggestions.
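The feedback loop described in this note can be sketched in base R. Everything here is a simplified assumption: `true_score` stands in for the expensive underlying model, a quadratic linear model stands in for the random forest surrogate, and higher scores are treated as better.

```r
set.seed(42)

# Hypothetical objective standing in for the expensive underlying model.
true_score <- function(x) -(x - 0.7)^2

# Results of an earlier random search, playing the role of `input`.
input <- data.frame(x = runif(10))
input$score <- true_score(input$x)

for (run in 1:3) {
  # Refit the surrogate on everything evaluated so far.
  surrogate <- lm(score ~ poly(x, 2), data = input)

  # Rank fresh random candidates by predicted score.
  candidates <- data.frame(x = runif(200))
  candidates$predicted <- predict(surrogate, newdata = candidates)
  ranked <- candidates[order(candidates$predicted, decreasing = TRUE), ]

  # Evaluate the top 5 for real and append them to the input.
  best <- data.frame(x = head(ranked$x, 5))
  best$score <- true_score(best$x)
  input <- rbind(input, best)
}
```

After the loop, `input` holds the original 10 rows plus 5 new evaluations per run, so each refit sees the outcomes of the previous suggestions.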

artichaud1/cook documentation built on May 21, 2019, 9:23 a.m.