variable_selection: Variable selection


Variable selection

Description

Pick the best-performing variables for a given set of hyperparameters. All parameters before 'n_folds' are identical to those of a base LSTM model.

Usage

variable_selection(
  data,
  target_variable,
  n_timesteps,
  fill_na_func = "mean",
  fill_ragged_edges_func = "mean",
  n_models = 1,
  train_episodes = 200,
  batch_size = 30,
  decay = 0.98,
  n_hidden = 20,
  n_layers = 2,
  dropout = 0,
  criterion = "''",
  optimizer = "''",
  optimizer_parameters = list(lr = 0.01),
  n_folds = 1,
  init_test_size = 0.2,
  pub_lags = c(),
  lags = c(),
  performance_metric = "RMSE",
  alpha = 0,
  initial_ordering = "feature_contribution",
  quiet = FALSE
)
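
A minimal call might look like the following sketch. The data frame my_data, the target column "gdp", and the hyperparameter values shown are illustrative assumptions, not values shipped with the package; my_data is assumed to hold a date column, monthly indicator columns, and the target series.

library(nowcastLSTM)

# illustrative data frame with a date column, monthly indicators, and a "gdp" target
selection <- variable_selection(
  data = my_data,
  target_variable = "gdp",
  n_timesteps = 12,
  n_folds = 3,
  init_test_size = 0.2,
  performance_metric = "RMSE",
  quiet = TRUE
)

selection$col_names    # best-performing column names
selection$performance  # performance metric value for that set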

Arguments

n_folds

how many folds to use for rolling fold validation

init_test_size

∈ [0,1]. What proportion of the data to use for testing at the first fold.

pub_lags

list of periods back each input variable is set to missing, i.e. the publication lag of each variable. Leave empty to select variables on complete information only, with no synthetic vintages.

lags

simulated periods back to test when selecting variables, e.g. -2 = simulating data as it would have been 2 months before the target period, 1 = 1 month after, etc. Passing c(-2, 0, 2) will account for those vintages in model selection (see the sketch after this list). Leave empty to select variables on complete information only, with no synthetic vintages.

performance_metric

performance metric to use for variable selection. Pass "RMSE" for root mean square error, "MAE" for mean absolute error, or "AICc" for the corrected Akaike Information Criterion. Alternatively, pass a function that takes a pandas Series of predictions and one of actuals and returns a scalar, e.g. custom_function(preds, actuals); see the sketch after this list.

alpha

∈ [0,1]. 0 implies no penalization for additional regressors, 1 implies the most severe penalty for additional regressors. Not used for the "AICc" performance metric.

initial_ordering

∈ ["feature_contribution", "univariate"]. How to obtain the initial ordering of features to check additively. "feature_contribution" uses the feature contributions of a single model; "univariate" estimates univariate models for every feature and orders them by the performance metric. Feature contribution is about twice as fast.

quiet

whether or not to print progress
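
Tying the points above together, the sketch below passes a custom performance metric along with publication lags and simulated vintages. The metric function, the lag values, and my_data are illustrative assumptions; whether the predictions and actuals arrive as plain R vectors or as converted pandas Series depends on the reticulate setup, so treat the function body as a sketch rather than the package's exact interface.

# illustrative custom metric: median absolute error, returning a single scalar
med_abs_error <- function(preds, actuals) {
  median(abs(as.numeric(preds) - as.numeric(actuals)), na.rm = TRUE)
}

selection <- variable_selection(
  data = my_data,                      # illustrative data frame
  target_variable = "gdp",
  n_timesteps = 12,
  n_folds = 3,
  pub_lags = c(1, 1, 2),               # one publication lag per input variable (illustrative values)
  lags = c(-2, 0, 2),                  # simulate vintages 2 months before, at, and 2 months after the target period
  performance_metric = med_abs_error,  # custom function instead of "RMSE"/"MAE"/"AICc"
  quiet = TRUE
)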

Value

A list containing the following elements:

col_names

list of best-performing column names

performance

value of the performance metric achieved by these variables (i.e. by the best-performing set)
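
The returned list can be used to restrict the original data to the selected indicators, as in the minimal sketch below, which continues the illustrative my_data and "gdp" objects used above; the "date" column name is likewise an assumption.

selection <- variable_selection(my_data, "gdp", n_timesteps = 12, n_folds = 3)

# keep the date column, the selected indicators, and the target (column names are illustrative)
selected_data <- my_data[, c("date", unlist(selection$col_names), "gdp")]
selection$performance  # performance metric value achieved by the selected set

The reduced data frame could then be passed, with the same hyperparameters, to the package's base LSTM model.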

