select_model: Variable selection and hyperparameter tuning combined.

Description

Pick the best-performing hyperparameters and variables for a given dataset. Given k hyperparameter combinations (every combination of the values in the grids) and p variables in the data, this function will run k * p * 2 models, which can take a very long time. To cut down on this time, first run it with a highly reduced hyperparameter grid, i.e., a very small k, and record the selected variables; then run the 'hyperparameter_tuning' function on these selected variables with a much more detailed grid. All parameters up to 'optimizer_parameters' are exactly the same as for any LSTM() model (here with a '_grid' suffix); for each, provide a list of the values to check.
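As a rough illustration of the k * p * 2 formula, the base-R sketch below counts k for the default grids shown in Usage (the Cartesian product of the grid values) and then the model count for a hypothetical dataset with p = 10 variables. The helper n_models_run is made up for this example and is not part of nowcastLSTM.

```r
# Default grids copied from the Usage section. optimizer_parameters_grid holds a
# single parameter set (lr = 0.01), so it multiplies k by 1 and is left out here.
default_grids <- list(
  n_timesteps = c(6, 12),
  fill_na_func = c("mean"),
  fill_ragged_edges_func = c("mean"),
  train_episodes = c(50, 100, 200),
  batch_size = c(30, 100, 200),
  decay = c(0.98),
  n_hidden = c(10, 20, 40),
  n_layers = c(1, 2, 4),
  dropout = c(0),
  criterion = c("''"),
  optimizer = c("''")
)

# k = number of hyperparameter combinations (Cartesian product of the grids)
k <- nrow(expand.grid(default_grids, stringsAsFactors = FALSE))

# Hypothetical helper applying the k * p * 2 formula from the Description
n_models_run <- function(k, p) k * p * 2

k                      # 162
n_models_run(k, 10)    # 3240
n_models_run(1, 10)    # 20, with a single value in every grid
```

Cutting every grid down to one value, as suggested above, brings k to 1 and the count to 2 * p.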

Usage

select_model(
  data,
  target_variable,
  n_models = 1,
  n_timesteps_grid = c(6, 12),
  fill_na_func_grid = c("mean"),
  fill_ragged_edges_func_grid = c("mean"),
  train_episodes_grid = c(50, 100, 200),
  batch_size_grid = c(30, 100, 200),
  decay_grid = c(0.98),
  n_hidden_grid = c(10, 20, 40),
  n_layers_grid = c(1, 2, 4),
  dropout_grid = c(0),
  criterion_grid = c("''"),
  optimizer_grid = c("''"),
  optimizer_parameters_grid = c(list(lr = 0.01)),
  n_folds = 1,
  init_test_size = 0.2,
  pub_lags = c(),
  lags = c(),
  performance_metric = "RMSE",
  alpha = 0,
  initial_ordering = "feature_contribution",
  quiet = FALSE
)

Arguments

n_folds

Number of folds to use for rolling-fold validation.

init_test_size

∈ [0, 1]. Proportion of the data to use for testing in the first fold.
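This page does not spell out how folds after the first are laid out, so the sketch below is only an assumption: a common expanding-window scheme in which the first fold tests on the last init_test_size share of rows and each later fold moves the train/test boundary forward by an equal step. rolling_folds is a hypothetical helper, not a nowcastLSTM function.

```r
# Hypothetical expanding-window fold layout (an assumption, not nowcastLSTM's code).
# Returns the row indices used for training and testing in each fold.
rolling_folds <- function(n_rows, n_folds = 1, init_test_size = 0.2) {
  n_test_init <- round(n_rows * init_test_size)
  first_train_end <- n_rows - n_test_init
  stopifnot(n_folds >= 1, n_test_init >= n_folds, first_train_end >= 1)
  # Equal step that moves the train/test boundary towards the end of the data
  step <- floor(n_test_init / n_folds)
  lapply(seq_len(n_folds), function(i) {
    train_end <- first_train_end + (i - 1) * step
    list(train = seq_len(train_end), test = (train_end + 1):n_rows)
  })
}

folds <- rolling_folds(n_rows = 100, n_folds = 3, init_test_size = 0.2)
sapply(folds, function(f) length(f$test))   # 20 14 8
```

With n_folds = 1 this reduces to a single split that tests on the last 20% of rows, matching the default init_test_size = 0.2.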

pub_lags

List giving the number of periods back each input variable is set to missing, i.e., the publication lag of each variable. Leave empty to select variables on complete information only, with no synthetic vintages.

lags

Simulated periods relative to the target period to test when selecting variables. E.g., -2 simulates the data as it would have been 2 months before the target period, 1 as it would have been 1 month after, etc. So c(-2, 0, 2) will account for those three vintages in model selection. Leave empty to select variables on complete information only, with no synthetic vintages.
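A minimal sketch of how one synthetic vintage can be built from pub_lags and a single lags value. It assumes the last row is the target period, pub_lags is aligned with the predictor columns, and a variable is unavailable for its trailing pub_lag - lag periods. simulate_vintage is a hypothetical helper written for illustration; nowcastLSTM's internal implementation may differ.

```r
# Hypothetical helper: blank out observations that would not yet have been
# published when simulating a vintage `lag` periods relative to the last row.
simulate_vintage <- function(df, pub_lags, lag) {
  n <- nrow(df)
  for (j in seq_along(pub_lags)) {
    # Number of trailing periods not yet released for this variable
    n_missing <- max(0, pub_lags[j] - lag)
    if (n_missing > 0) {
      rows <- max(1, n - n_missing + 1):n
      df[rows, j] <- NA
    }
  }
  df
}

df <- data.frame(x1 = 1:6, x2 = 1:6)
# 2 months before the target period: x1 (publication lag 1) loses 3 rows,
# x2 (publication lag 0) loses 2 rows
colSums(is.na(simulate_vintage(df, pub_lags = c(1, 0), lag = -2)))
```

Under this assumption, a positive lag at least as large as a variable's publication lag leaves that variable complete.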

performance_metric

Performance metric to use for variable selection. Pass "RMSE" for root mean square error, "MAE" for mean absolute error, or "AICc" for corrected Akaike Information Criterion. Alternatively, pass a function that takes a pandas Series of predictions and a pandas Series of actuals and returns a scalar, e.g., custom_function(preds, actuals).
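For reference, the sketch below writes RMSE and MAE in base R, along with a hypothetical custom metric (median absolute error) with the custom_function(preds, actuals) shape described above. It assumes preds and actuals arrive as equal-length numeric vectors and that lower values are better; how the R interface passes the pandas Series through is not covered here.

```r
# Root mean square error
rmse <- function(preds, actuals) sqrt(mean((preds - actuals)^2))

# Mean absolute error
mae <- function(preds, actuals) mean(abs(preds - actuals))

# Hypothetical custom metric: median absolute error, less sensitive to a few
# large misses. Assumes numeric vectors and that lower is better.
custom_function <- function(preds, actuals) median(abs(preds - actuals))

preds   <- c(1.0, 2.0, 3.0, 4.0)
actuals <- c(1.5, 2.0, 2.0, 8.0)
rmse(preds, actuals)             # sqrt(17.25 / 4), about 2.08
mae(preds, actuals)              # 1.375
custom_function(preds, actuals)  # 0.75
```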

alpha

∈ [0, 1]. 0 applies no penalty for additional regressors; 1 applies the most severe penalty. Not used with the "AICc" performance metric.

initial_ordering

∈ ["feature_contribution", "univariate"]. How to obtain the initial order in which features are checked additively. "feature_contribution" uses the feature contributions of a single model; "univariate" fits a univariate model for each feature and orders the features by the performance metric. "feature_contribution" is about twice as fast.
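To make the additive check concrete, here is a toy sketch of a "univariate"-style ordering followed by a forward pass that keeps a variable only if it lowers the held-out error. It is not nowcastLSTM's implementation: an ordinary linear model (lm) scored by test-set RMSE stands in for the LSTM, the keep-if-better rule is an assumption, and score_vars, univariate_order and additive_selection are hypothetical helpers.

```r
# Hypothetical scorer: fit lm() on the first 80% of rows and return the RMSE on
# the remaining rows (lower is better). A stand-in for training an LSTM.
score_vars <- function(data, target, vars, test_share = 0.2) {
  n_train <- nrow(data) - round(nrow(data) * test_share)
  train <- data[seq_len(n_train), , drop = FALSE]
  test  <- data[-seq_len(n_train), , drop = FALSE]
  fit <- lm(reformulate(vars, response = target), data = train)
  sqrt(mean((predict(fit, newdata = test) - test[[target]])^2))
}

# "univariate"-style ordering: score each candidate on its own, best first
univariate_order <- function(data, target, candidates) {
  scores <- vapply(candidates, function(v) score_vars(data, target, v), numeric(1))
  candidates[order(scores)]
}

# Additive pass: walk the ordering, keeping a variable only if it improves the score
additive_selection <- function(data, target, ordered_vars) {
  selected <- ordered_vars[1]
  best <- score_vars(data, target, selected)
  for (v in ordered_vars[-1]) {
    trial <- score_vars(data, target, c(selected, v))
    if (trial < best) {
      selected <- c(selected, v)
      best <- trial
    }
  }
  list(variables = selected, performance = best)
}

# Simulated data: y depends on x1 and x2; noise is irrelevant
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
sim$y <- 2 * sim$x1 + sim$x2 + rnorm(n, sd = 0.1)

ordering <- univariate_order(sim, "y", c("noise", "x2", "x1"))
ordering                                   # "x1" comes first
result <- additive_selection(sim, "y", ordering)
result$variables
```

Because each candidate is checked only once, in order, the pass needs one model per variable on top of the ordering step, rather than a search over all subsets.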

Value

A data frame containing the following columns:

variables

List of selected variables.

hyper_params

List of hyperparameters; access them via df$hyper_params[[1]], etc.

performance

Performance metric value for these hyperparameters.
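The returned object holds list columns, which is why hyper_params is read with [[ ]]. The sketch below builds a one-row mock with the documented column names to show the access pattern. Every value in it, including the hyperparameter names inside hyper_params, is made up for illustration and is not real select_model output.

```r
# Mock of the documented return shape: list columns for variables and
# hyper_params. All values here are invented.
df <- data.frame(performance = 0.42)
df$variables <- list(c("x1", "x2"))
df$hyper_params <- list(list(n_timesteps = 12, n_hidden = 20))

df$variables[[1]]                  # "x1" "x2"
df$hyper_params[[1]]$n_timesteps   # 12
df$performance[1]                  # 0.42
```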


dhopp1/nowcastLSTM documentation built on May 7, 2024, 9:30 a.m.