variable_selection: Variable selection


Variable selection

Description

Pick the best-performing variables for a given set of hyperparameters. All parameters before 'n_folds' are identical to those of a base LSTM model.

Usage

variable_selection(
  data,
  target_variable,
  n_timesteps,
  fill_na_func = "mean",
  fill_ragged_edges_func = "mean",
  n_models = 1,
  train_episodes = 200,
  batch_size = 30,
  decay = 0.98,
  n_hidden = 20,
  n_layers = 2,
  dropout = 0,
  criterion = "''",
  optimizer = "''",
  optimizer_parameters = list(lr = 0.01),
  n_folds = 1,
  init_test_size = 0.2,
  pub_lags = c(),
  lags = c(),
  performance_metric = "RMSE",
  alpha = 0,
  initial_ordering = "feature_contribution",
  quiet = FALSE
)
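
A minimal call might look like the following sketch. The data frame my_data, the target column "gdp", and the hyperparameter values shown are illustrative assumptions, not values shipped with the package; my_data is assumed to hold a date column, monthly indicator columns, and the target series.

library(nowcastLSTM)

# illustrative data frame with a date column, monthly indicators, and a "gdp" target
selection <- variable_selection(
  data = my_data,
  target_variable = "gdp",
  n_timesteps = 12,
  n_folds = 3,
  init_test_size = 0.2,
  performance_metric = "RMSE",
  quiet = TRUE
)

selection$col_names    # best-performing column names
selection$performance  # performance metric value for that set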

Arguments

n_folds

how many folds to use for rolling fold validation

init_test_size

∈ [0,1]. What proportion of the data to use for testing at the first fold.

pub_lags

list of periods back each input variable is set to missing, i.e. the publication lag of each variable. Leave empty to select variables on complete information only, with no synthetic vintages.

lags

simulated periods back to test when selecting variables, e.g. -2 = simulating data as it would have been 2 months before the target period, 1 = 1 month after, etc. Passing c(-2, 0, 2) will account for those vintages in model selection (see the sketch after this list). Leave empty to select variables on complete information only, with no synthetic vintages.

performance_metric

performance metric to use for variable selection. Pass "RMSE" for root mean square error, "MAE" for mean absolute error, or "AICc" for the corrected Akaike Information Criterion. Alternatively, pass a function that takes a pandas Series of predictions and one of actuals and returns a scalar, e.g. custom_function(preds, actuals); see the sketch after this list.

alpha

∈ [0,1]. 0 implies no penalization for additional regressors, 1 implies the most severe penalty for additional regressors. Not used for the "AICc" performance metric.

initial_ordering

∈ ["feature_contribution", "univariate"]. How to obtain the initial ordering of features to check additively. "feature_contribution" uses the feature contributions of a single model; "univariate" estimates univariate models for every feature and orders them by the performance metric. Feature contribution is about twice as fast.

quiet

whether or not to print progress
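
Tying the points above together, the sketch below passes a custom performance metric along with publication lags and simulated vintages. The metric function, the lag values, and my_data are illustrative assumptions; whether the predictions and actuals arrive as plain R vectors or as converted pandas Series depends on the reticulate setup, so treat the function body as a sketch rather than the package's exact interface.

# illustrative custom metric: median absolute error, returning a single scalar
med_abs_error <- function(preds, actuals) {
  median(abs(as.numeric(preds) - as.numeric(actuals)), na.rm = TRUE)
}

selection <- variable_selection(
  data = my_data,                      # illustrative data frame
  target_variable = "gdp",
  n_timesteps = 12,
  n_folds = 3,
  pub_lags = c(1, 1, 2),               # one publication lag per input variable (illustrative values)
  lags = c(-2, 0, 2),                  # simulate vintages 2 months before, at, and 2 months after the target period
  performance_metric = med_abs_error,  # custom function instead of "RMSE"/"MAE"/"AICc"
  quiet = TRUE
)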

Value

A list containing the following elements:

col_names

list of best-performing column names

performance

value of the performance metric achieved by these variables (i.e. by the best-performing set)
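
The returned list can be used to restrict the original data to the selected indicators, as in the minimal sketch below, which continues the illustrative my_data and "gdp" objects used above; the "date" column name is likewise an assumption.

selection <- variable_selection(my_data, "gdp", n_timesteps = 12, n_folds = 3)

# keep the date column, the selected indicators, and the target (column names are illustrative)
selected_data <- my_data[, c("date", unlist(selection$col_names), "gdp")]
selection$performance  # performance metric value achieved by the selected set

The reduced data frame could then be passed, with the same hyperparameters, to the package's base LSTM model.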

