variable_selection | R Documentation |
Picks the best-performing variables for a given set of hyperparameters. All parameters before 'n_folds' are identical to those of a base LSTM model.
variable_selection(
data,
target_variable,
n_timesteps,
fill_na_func = "mean",
fill_ragged_edges_func = "mean",
n_models = 1,
train_episodes = 200,
batch_size = 30,
decay = 0.98,
n_hidden = 20,
n_layers = 2,
dropout = 0,
criterion = "''",
optimizer = "''",
optimizer_parameters = list(lr = 0.01),
n_folds = 1,
init_test_size = 0.2,
pub_lags = c(),
lags = c(),
performance_metric = "RMSE",
alpha = 0,
initial_ordering = "feature_contribution",
quiet = FALSE
)
n_folds |
number of folds to use for rolling-fold validation |
init_test_size |
∈ [0,1]. Proportion of the data to use for testing at the first fold |
pub_lags |
list of the number of periods back for which each input variable is set to missing, i.e. the publication lag of each variable. Leave empty to pick variables only on complete information, with no synthetic vintages. |
lags |
simulated periods back to test when selecting variables. E.g. -2 = simulating data as it would have been 2 months before the target period, 1 = 1 month after, etc. So c(-2, 0, 2) will account for those vintages in model selection. Leave empty to pick variables only on complete information, with no synthetic vintages. |
performance_metric |
performance metric to use for variable selection. Pass "RMSE" for root mean square error, "MAE" for mean absolute error, or "AICc" for corrected Akaike Information Criterion. Alternatively, pass a function that takes a pandas Series of predictions and a pandas Series of actuals as arguments and returns a scalar, e.g. custom_function(preds, actuals). |
alpha |
∈ [0,1]. 0 implies no penalization for additional regressors, 1 implies most severe penalty for additional regressors. Not used for "AICc" performance metric. |
initial_ordering |
∈ ["feature_contribution", "univariate"]. How to obtain the initial order in which features are checked additively. "feature_contribution" uses the feature contribution of one model, "univariate" fits a univariate model for each feature and orders the features by performance metric. "feature_contribution" is about twice as fast. |
quiet |
whether or not to print progress |
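As a minimal sketch of the custom-function option for performance_metric, the function below computes a mean absolute percentage error from predictions and actuals and returns a single number. The name mape_metric and the choice to skip missing or zero actuals are illustrative assumptions, not part of the package. How the package hands predictions and actuals to an R function is not described here, so the sketch simply coerces both inputs to numeric vectors.

```r
# Hypothetical custom metric: mean absolute percentage error (MAPE).
# Takes predictions and actuals, returns one scalar; lower is better.
mape_metric <- function(preds, actuals) {
  preds <- as.numeric(preds)
  actuals <- as.numeric(actuals)
  stopifnot(length(preds) == length(actuals))
  # Assumption: skip pairs with a missing value or a zero actual,
  # because the percentage error is undefined there.
  keep <- !is.na(preds) & !is.na(actuals) & actuals != 0
  mean(abs((preds[keep] - actuals[keep]) / actuals[keep])) * 100
}

# Quick check on toy vectors (the third pair is skipped: actual is NA)
mape_metric(c(110, 90, 50), c(100, 100, NA))
```

A function of this shape would then be supplied as performance_metric = mape_metric in place of "RMSE", "MAE", or "AICc".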
A list containing the following elements:
col_names |
list of best-performing column names |
performance |
performance metric of these variables (i.e. best performing) |
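The additive check described under initial_ordering, and the shape of the returned list, can be illustrated in base R. This is a sketch under stated assumptions, not the package's implementation: lm() stands in for the LSTM, the ordering follows the "univariate" option, a single hold-out split replaces rolling folds, no alpha penalty is applied, and the helper names (rmse, holdout_rmse, forward_select) are hypothetical.

```r
# Root mean square error between predictions and actuals.
rmse <- function(preds, actuals) sqrt(mean((preds - actuals)^2))

# Fit a linear model (stand-in for the LSTM) on the first rows and
# score it on the held-out final rows.
holdout_rmse <- function(data, target, cols, test_size = 0.2) {
  n_train <- floor(nrow(data) * (1 - test_size))
  train <- data[seq_len(n_train), , drop = FALSE]
  test <- data[(n_train + 1):nrow(data), , drop = FALSE]
  fit <- lm(reformulate(cols, response = target), data = train)
  rmse(predict(fit, newdata = test), test[[target]])
}

forward_select <- function(data, target, test_size = 0.2) {
  candidates <- setdiff(names(data), target)
  # "univariate" ordering: score each feature alone, best first.
  uni <- vapply(candidates,
                function(v) holdout_rmse(data, target, v, test_size),
                numeric(1))
  ordered <- names(sort(uni))
  # Additive pass: keep a feature only if it improves the metric.
  best_cols <- ordered[1]
  best_perf <- uni[[ordered[1]]]
  for (v in ordered[-1]) {
    perf <- holdout_rmse(data, target, c(best_cols, v), test_size)
    if (perf < best_perf) {
      best_cols <- c(best_cols, v)
      best_perf <- perf
    }
  }
  list(col_names = best_cols, performance = best_perf)
}

# Toy data: y depends on x1 and x2; x3 is pure noise.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 - toy$x2 + rnorm(n, sd = 0.1)

result <- forward_select(toy, "y")
result$col_names    # selected columns
result$performance  # hold-out RMSE of the selected set
```

The list returned by the sketch uses the same element names as the documented value, col_names and performance, so downstream code such as toy[, result$col_names] reads the same way against either.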