find_model_through_bayes: Hyperparameter search using Bayesian optimisation

View source: R/bayesian_optimisation.R

Description

Hyperparameter search using Bayesian optimisation

Usage

find_model_through_bayes(train, test, response,
  preprocess_pipes = list(function(train, test) return(list(train =
  train, test = train, .predict = function(data) return(data)))), models,
  metrics, target_metric, higher_is_better, N_init = 20,
  N_experiment = 40, sigma_noise = 1e-08, prepend_data_checker = T,
  on_missing_column = c("error", "add")[1],
  on_extra_column = c("remove", "error")[1],
  on_type_error = c("ignore", "error")[1], seed = 1, verbose = T,
  save_model = F)

Arguments

train

The training dataset

test

The testing dataset

response

The response column as a string

preprocess_pipes

List of preprocessing pipelines generated using pipeline.
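
The default value shown under Usage returns a list with train, test and .predict elements. The sketch below hand-writes a function of that shape using base R only. The name center_wt_pipe and the wt column of mtcars are illustrative. Whether hand-written functions are accepted alongside those generated by pipeline is an assumption based on that default.

```r
# Hypothetical preprocessing step: centre the `wt` column using the
# training mean, and return the same structure as the default pipe.
center_wt_pipe <- function(train, test) {
  mu <- mean(train$wt)  # statistic learned on the training data only

  # Function that applies the learned transformation to any dataset
  apply_centering <- function(data) {
    data$wt <- data$wt - mu
    data
  }

  list(
    train = apply_centering(train),
    test = apply_centering(test),
    .predict = apply_centering
  )
}

# Try it on a simple train / test split of a built-in dataset
res <- center_wt_pipe(mtcars[1:20, ], mtcars[21:32, ])
preprocess_pipes <- list(center_wt_pipe)
```

Learning the mean on train only and reusing it in .predict keeps the test set and new data on the same scale as the training data.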

models

A list of models. Each model should be a list, containing at least a training function .train and a .predict function, plus named vectors of parameters to explore.

The .train function has to take a data argument that stores the training data and a ... argument for the parameters. The .predict function needs to take two arguments, where the first is the model and the second the new dataset.

If every option for a parameter is a single value, store the options in a vector. Otherwise, for example when each option is itself a vector, store them in a list.

You can use model_trainer as a wrapper for this list. It will also test your inputs.
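
The structure described above might look like the following minimal sketch, written with base R only. The names lm_model, intercept and predictors are illustrative, the response mpg is hard-coded for brevity, and it is an assumption that one option per parameter is passed to .train through the ... argument.

```r
# Hypothetical model entry: a linear model with two parameters to explore.
lm_model <- list(
  # .train takes `data` plus the chosen parameter values via `...`
  .train = function(data, ...) {
    params <- list(...)
    form <- reformulate(params$predictors, response = "mpg",
                        intercept = params$intercept)
    lm(form, data = data)
  },
  # .predict takes the fitted model first and the new dataset second
  .predict = function(model, data) {
    as.numeric(predict(model, newdata = data))
  },
  # Each option is a single value, so a vector is enough
  intercept = c(TRUE, FALSE),
  # Each option is itself a character vector, so a list is used
  predictors = list("wt", c("wt", "hp"))
)
models <- list(lm = lm_model)

# Calling the two functions by hand with one parameter combination
fit <- models$lm$.train(data = mtcars, intercept = TRUE,
                        predictors = c("wt", "hp"))
preds <- models$lm$.predict(fit, mtcars)
```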

metrics

A list of metrics (functions) that need to be calculated on the train and test response and predictions. Must be named.
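
A named list of metric functions could be sketched as below. The argument order (observed response first, predictions second) is an assumption, since the text above does not state it, and the names rmse and mae are illustrative.

```r
# Hypothetical metrics list: every entry is named and is a function of
# the observed response and the predictions (argument order assumed).
metrics <- list(
  rmse = function(labels, predictions) {
    sqrt(mean((labels - predictions)^2))
  },
  mae = function(labels, predictions) {
    mean(abs(labels - predictions))
  }
)

# Quick check on toy values
observed <- c(1, 2, 3)
predicted <- c(1, 2, 5)
rmse_value <- metrics$rmse(observed, predicted)  # sqrt(4 / 3)
mae_value <- metrics$mae(observed, predicted)    # 2 / 3
```

With metrics like these, target_metric would be one of the list names (for example "rmse") and higher_is_better would be FALSE, because lower error is better.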

target_metric

The name of the metric to optimise. Optimisation will be done on the test set performance of this metric.

higher_is_better

A flag indicating if a high value of target_metric indicates a good result.

N_init

Number of iterations to initialise the Bayesian optimisation with.

N_experiment

Number of experiments run by the Bayesian optimisation.

sigma_noise

An estimate of the inherent noise in the sampled metric values. If this is set below 1e-8, previously tried configurations will not be reconsidered.

prepend_data_checker

Flag indicating if pipe_check should be prepended before all pipelines.

on_missing_column

See pipe_check for details.

on_extra_column

See pipe_check for details.

on_type_error

See pipe_check for details.

seed

Random seed, set each time before a model is trained. Set this to 0 to skip setting seeds.

verbose

Flag indicating if intermediate updates should be printed.

save_model

Flag indicating if the generated models should be saved. Defaults to FALSE.

Details

This implementation is still in an early phase. Bugs may exist, but results so far have been promising (Dec 2018).

Value

A dataframe containing the training function, a list of parameters used to train the function, and one column for each metric / dataset combination.


jeroenvdhoven/datapiper documentation built on July 14, 2019, 9:34 p.m.