select_model    R Documentation
Description

Pick the best-performing hyperparameters and variables for a given dataset. Given all k permutations of hyperparameters and p variables in the data, this function will run k * p * 2 models, which can take a very long time. To cut down on this time, run it with a highly reduced hyperparameter grid (i.e., a very small k), record the selected variables, then run the 'hyperparameter_tuning' function on those selected variables with a much more detailed grid; a sketch of this two-step workflow follows the usage below. All parameters up to 'optimizer_parameters' are exactly the same as for any LSTM() model; for each, provide a list of the values to check.

Usage
select_model(
data,
target_variable,
n_models = 1,
n_timesteps_grid = c(6, 12),
fill_na_func_grid = c("mean"),
fill_ragged_edges_func_grid = c("mean"),
train_episodes_grid = c(50, 100, 200),
batch_size_grid = c(30, 100, 200),
decay_grid = c(0.98),
n_hidden_grid = c(10, 20, 40),
n_layers_grid = c(1, 2, 4),
dropout_grid = c(0),
criterion_grid = c("''"),
optimizer_grid = c("''"),
optimizer_parameters_grid = c(list(lr = 0.01)),
n_folds = 1,
init_test_size = 0.2,
pub_lags = c(),
lags = c(),
performance_metric = "RMSE",
alpha = 0,
initial_ordering = "feature_contribution",
quiet = FALSE
)
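
A minimal sketch of the two-step workflow described above. It assumes the package is loaded and that 'my_data' is a data frame with a date column, predictor columns, and a target column "gdp" (all hypothetical names); the exact arguments of 'hyperparameter_tuning' should be checked against its own help page.

# step 1: variable selection on a deliberately tiny grid (small k) to keep runtime down
selection <- select_model(
  data = my_data,
  target_variable = "gdp",
  n_timesteps_grid = c(12),
  train_episodes_grid = c(100),
  batch_size_grid = c(30),
  n_hidden_grid = c(20),
  n_layers_grid = c(2),
  n_folds = 2
)
selected_vars <- selection$variables[[1]]  # variables of the best model

# step 2: detailed hyperparameter tuning on only the selected variables
tuned <- hyperparameter_tuning(
  data = my_data[, c("date", selected_vars, "gdp")],  # hypothetical column layout
  target_variable = "gdp",
  n_timesteps_grid = c(6, 12, 24),
  train_episodes_grid = c(50, 100, 200, 400),
  n_hidden_grid = c(10, 20, 40),
  n_layers_grid = c(1, 2, 4)
)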
Arguments

n_folds
  how many folds of rolling-fold validation to perform

init_test_size
  ∈ [0, 1]. The proportion of the data to use for testing at the first fold.

pub_lags
  list of periods back each input variable is set to missing, i.e. the publication lag of each variable. Leave empty to pick variables on complete information only, with no synthetic vintages.

lags
  simulated periods back to test when selecting variables, e.g. -2 = simulating the data as it would have been 2 months before the target period, 1 = 1 month after, etc. So c(-2, 0, 2) will account for those vintages in model selection. Leave empty to pick variables on complete information only, with no synthetic vintages. See the vintage sketch after this argument list.

performance_metric
  performance metric to use for variable selection. Pass "RMSE" for root mean square error, "MAE" for mean absolute error, or "AICc" for the corrected Akaike Information Criterion. Alternatively, pass a function that takes a pandas Series of predictions and one of actuals and returns a scalar, e.g. custom_function(preds, actuals); see the sketch after this argument list.

alpha
  ∈ [0, 1]. 0 implies no penalization for additional regressors, 1 implies the most severe penalty for additional regressors. Not used for the "AICc" performance metric.

initial_ordering
  ∈ {"feature_contribution", "univariate"}. How to get the initial order of features to check additively. "feature_contribution" uses the feature contribution of one model; "univariate" estimates univariate models of all features and orders them by the performance metric. Feature contribution is about twice as fast.
Value

A dataframe containing the following elements:
variables
  list of variables

hyper_params
  list of hyperparameters; access via df$hyper_params[[1]], etc. (see the short access example below)

performance
  performance metric of these hyperparameters
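
A short illustration of reading these columns from a hypothetical result object:

results <- select_model(data = my_data, target_variable = "gdp")  # hypothetical call
results$variables[[1]]     # variables chosen for the best model
results$hyper_params[[1]]  # its hyperparameters, accessed as a named list
results$performance[1]     # its performance metric value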