initialize_params_random: Initialize parameters to optimize over randomly
In timradtke/heuristika: Robust Probabilistic Forecasts to Tinker With

initialize_params_random

R Documentation

Initialize parameters to optimize over randomly

Description

Use this function to generate a randomly sampled parameter grid (matrix) that can be provided to the param_grid argument of tulip(). The model estimation then reduces to evaluating all provided combinations and choosing the best one. See Details for more. Note that there is nothing special about the matrix generated by this function—you can define a set of possible parameters in any way that suits you.

Usage

initialize_params_random(
  n_damped = 1000,
  n = n_damped/2,
  n_no_trend = ceiling(n_damped^(2/3)),
  n_no_season = ceiling(n_damped^(2/3)),
  n_no_trend_no_season = ceiling(n_damped^(1/3)),
  alpha_lower = 0,
  alpha_upper = 1,
  beta_lower = 0,
  beta_upper = 1,
  gamma_lower = 0,
  gamma_upper = 1,
  beta_smaller_than_alpha = TRUE,
  gamma_smaller_than_one_minus_alpha = TRUE,
  oversample_lower = 0.05,
  oversample_upper = 0.05,
  seed = NULL
)

Arguments

`n_damped`	Number of parameter combinations to sample that include all three states (level, trend, seasonality) and dampening of the trend. Note: By default, `n`, `n_no_trend`, `n_no_season`, `n_no_trend_no_season` are functions of the value chosen for `n_damped`.
`n`	Number of parameter combinations to sample that include all three states (level, trend, seasonality), but no dampening.
`n_no_trend`	Number of parameter combinations to sample that set the trend parameters to 0, implying models that don't have a trend component; since only two dimensions are sampled (level and seasonality), it usually makes sense to use a smaller `n_no_trend` than `n` (unless you are not interested in models with trend).
`n_no_season`	Number of parameter combinations to sample that set the seasonality parameters to 0, implying models that don't have a seasonality component; since only two dimensions are sampled (level and trend), it usually makes sense to use a smaller `n_no_season` than `n` (unless you are not interested in models with seasonality).
`n_no_trend_no_season`	Number of parameter combinations to sample that set the trend and seasonality parameters to 0, implying models that have neither a trend nor a seasonality component; since only one dimension is sampled (level), it usually makes sense to use a smaller `n_no_trend_no_season` than `n` (unless you are not interested in models with trend and seasonality).
`alpha_lower`	A scalar value defining the lowest possible value for the `alpha` parameter. Can't be less than 0.
`alpha_upper`	A scalar value defining the largest possible value for the `alpha` parameter. The default is 1, but values larger than 1 are possible. Can't be less than `alpha_lower`. If `alpha_lower` is equal to `alpha_upper`, all samples for `alpha` will be equal to `alpha_lower` and `alpha_upper` exactly.
`beta_lower`	A scalar value defining the lowest possible value for the `beta` parameter. Can't be less than 0.
`beta_upper`	A scalar value defining the largest possible value for the `beta` parameter. The default is 1, but values larger than 1 are possible. Can't be less than `beta_lower`. If `beta_lower` is equal to `beta_upper`, all samples for `beta` will be equal to `beta_lower` and `beta_upper` exactly.
`gamma_lower`	A scalar value defining the lowest possible value for the `gamma` parameter. Can't be less than 0.
`gamma_upper`	A scalar value defining the largest possible value for the `gamma` parameter. The default is 1, but values larger than 1 are possible. Can't be less than `gamma_lower`. If `gamma_lower` is equal to `gamma_upper`, all samples for `gamma` will be equal to `gamma_lower` and `gamma_upper` exactly.
`beta_smaller_than_alpha`	If `TRUE` (default), sampling of `beta` is conditional on the value of the sampled `alpha`, using `pmin(alpha, beta_upper)` as the upper limit for `beta`.
`gamma_smaller_than_one_minus_alpha`	If `TRUE` (default), sampling of `gamma` is conditional on the value of the sampled `alpha`, using `pmin(1 - alpha, gamma_upper)` as the upper limit for `gamma`.
`oversample_lower`	Can be used to increase the chances that the lowest allowed value is sampled for the parameters `alpha`, `beta`, and `gamma`. This can be useful to find parameter combinations in which a component is not updated at all and thus stays at a constant average across all observations. For example, an ETS model with only a `level` component and `alpha = 0` would be equivalent to the mean forecast.
`oversample_upper`	Can be used to increase the chances that the largest allowed value is sampled for the parameters `alpha` and `gamma`. This can be useful to find parameter combinations in which a component adjusts strongly to the latest observation. The level component then behaves similarly to a random walk forecast (if `alpha_upper = 1`), and the seasonal component similarly to a seasonal random walk forecast (if `gamma_upper = 1`).
`seed`	Since the parameter grid is sampled randomly, you can set a seed (local to the function) for reproducibility.
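
The interplay of the bounds, the two conditional-sampling flags, and the oversampling shares can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper `sample_sketch()` is hypothetical, the bounds are fixed at their defaults of 0 and 1, and treating `oversample_lower` and `oversample_upper` as the share of `alpha` draws replaced by the bound itself is an assumption.

```r
# Hypothetical helper, for illustration only; not part of heuristika.
sample_sketch <- function(n, oversample_lower = 0.05, oversample_upper = 0.05) {
  # alpha is drawn uniformly between alpha_lower = 0 and alpha_upper = 1
  alpha <- runif(n, min = 0, max = 1)

  # Assumption: a share of the draws is replaced by the bounds themselves
  u <- runif(n)
  alpha[u < oversample_lower] <- 0
  alpha[u > 1 - oversample_upper] <- 1

  # beta_smaller_than_alpha = TRUE: upper limit is pmin(alpha, beta_upper)
  beta <- runif(n, min = 0, max = pmin(alpha, 1))

  # gamma_smaller_than_one_minus_alpha = TRUE:
  # upper limit is pmin(1 - alpha, gamma_upper)
  gamma <- runif(n, min = 0, max = pmin(1 - alpha, 1))

  cbind(alpha = alpha, beta = beta, gamma = gamma)
}

set.seed(388)
draws <- sample_sketch(200)

# Both constraints hold for every sampled row
all(draws[, "beta"] <= draws[, "alpha"])
all(draws[, "gamma"] <= 1 - draws[, "alpha"])
```

Because `runif()` returns the bound itself when `min` equals `max`, a row with `alpha = 0` gets `beta = 0`, and a row with `alpha = 1` gets `gamma = 0` in this sketch.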

Details

The optimization procedure in tulip() evaluates each combination of parameters provided via param_grid. While this is computationally costly, it is also computationally stable. By consciously choosing parameters that are trialled, unstable parameter combinations can be avoided. The prior probability for many parameter combinations can be set to zero this way. If the set of parameters can be restricted very far (for example, because one updates from a previous fit or based on a related time series), it also makes the optimization computationally cheap.
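
The idea of evaluating every candidate and keeping the best one can be sketched with a level-only model. This is an illustration only: the model and loss used inside tulip() are not described on this page, so simple exponential smoothing with a squared one-step-ahead error stands in for them, and `ses_sse()` is a hypothetical helper.

```r
# Sum of squared one-step-ahead errors of simple exponential smoothing
# for a given alpha (illustrative stand-in for the actual model and loss)
ses_sse <- function(y, alpha) {
  level <- y[1]
  sse <- 0
  for (t in 2:length(y)) {
    sse <- sse + (y[t] - level)^2
    level <- alpha * y[t] + (1 - alpha) * level
  }
  sse
}

y <- c(10, 12, 11, 13, 12, 14, 13, 15)

# A small candidate set; every value is evaluated, no gradient steps involved
alphas <- c(0, 0.1, 0.3, 0.5, 0.9, 1)
losses <- vapply(alphas, function(a) ses_sse(y, a), numeric(1))

# The "fit" is simply the candidate with the lowest loss
best_alpha <- alphas[which.min(losses)]
best_alpha
```

Since only the listed candidates are ever evaluated, a value that is known to be unstable is excluded simply by leaving it out of `alphas`.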

In contrast to initialize_params_grid(), this function draws random combinations of alpha, beta, and gamma from an allowed space of values. This can allow for better overall optimization of the model, as the overall space of possible parameters is covered better. See also Bergstra and Bengio (2012) referenced below for a comparison of grid search and random search.
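
The coverage argument can be made concrete with a small base-R comparison. This sketch does not call initialize_params_grid() or this function; it only contrasts a regular grid with uniform random draws for two parameters and the same budget of 100 combinations.

```r
set.seed(388)

# 100 combinations on a regular grid: each parameter takes only 10 values
grid <- expand.grid(
  alpha = seq(0, 1, length.out = 10),
  gamma = seq(0, 1, length.out = 10)
)

# 100 random combinations: each draw is a new value per parameter
random <- data.frame(alpha = runif(100), gamma = runif(100))

nrow(grid)
nrow(random)

# Number of distinct alpha values tried by each approach
length(unique(grid$alpha))
length(unique(random$alpha))
```

If the forecast quality depends mostly on one of the parameters, the random draws try far more distinct values of it for the same number of evaluations, which is the core argument of Bergstra and Bengio (2012).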

Depending on the chosen function arguments, the function is likely to generate some duplicate parameter combinations (for example when oversample_upper or oversample_lower are non-zero). These are dropped before the final matrix is returned. This means, however, that the function does not guarantee to return n + n_damped + n_no_trend + n_no_season + n_no_trend_no_season parameter combinations; it might return fewer.
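
As a worked example, the upper bound on the number of rows for `n_damped = 46` follows directly from the defaults in Usage. The second part shows how duplicate rows disappear; using `unique()` here is an assumption for illustration, not necessarily how the function drops them.

```r
n_damped <- 46

# Defaults as listed under Usage
n <- n_damped / 2                                # 23
n_no_trend <- ceiling(n_damped^(2/3))            # 13
n_no_season <- ceiling(n_damped^(2/3))           # 13
n_no_trend_no_season <- ceiling(n_damped^(1/3))  # 4

# Largest possible number of rows before duplicates are dropped
max_rows <- n_damped + n + n_no_trend + n_no_season + n_no_trend_no_season
max_rows  # 99

# Oversampled bounds easily produce identical rows
m <- rbind(
  c(alpha = 0, beta = 0, gamma = 0),
  c(alpha = 0, beta = 0, gamma = 0),
  c(alpha = 1, beta = 0, gamma = 0)
)

# unique() on a matrix removes duplicated rows
nrow(unique(m))  # 2
```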

One can also combine a fixed set of parameters and randomly drawn parameters, for example to always evaluate parameter combinations known to provide good results for other time series, or to also evaluate parameters that were found at a previous training on the same time series, or to include a set of benchmark models via initialize_params_naive(), for example. See also the examples below.

Value

A numeric matrix with six named columns: 'alpha', 'one_minus_alpha', 'beta', 'one_minus_beta', 'gamma', 'one_minus_gamma'. The alpha parameters belong to the model's level component, the beta parameters to the model's trend component, and the gamma parameters to the model's seasonality component. Each pair usually adds up to 1; dampening, however, effectively reduces the sum of beta and one_minus_beta to less than 1. As per the assertions on tulip()'s param_grid, each row must sum to a value between 0 and 3, the columns must be named and in order, and each individual value must be between 0 and 1.
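
Since any matrix of this shape can be passed as param_grid, it can help to check a hand-written grid against the requirements listed above. The helper `check_param_grid()` below is hypothetical and only mirrors the conditions stated in this section; the actual assertions in tulip() may differ in detail. The values in the second row are illustrative of a damped combination.

```r
expected_cols <- c("alpha", "one_minus_alpha", "beta", "one_minus_beta",
                   "gamma", "one_minus_gamma")

# Hypothetical helper, for illustration only; not part of heuristika.
check_param_grid <- function(param_grid) {
  stopifnot(
    is.matrix(param_grid),
    is.numeric(param_grid),
    # columns must be named and in order
    identical(colnames(param_grid), expected_cols),
    # each individual value must be between 0 and 1
    all(param_grid >= 0 & param_grid <= 1),
    # each row must sum to a value between 0 and 3
    all(rowSums(param_grid) >= 0 & rowSums(param_grid) <= 3)
  )
  invisible(TRUE)
}

param_grid_manual <- matrix(
  c(0.3, 0.7, 0.1, 0.90, 0.2, 0.8,   # each pair adds up to 1
    0.3, 0.7, 0.1, 0.85, 0.2, 0.8),  # damped: beta pair sums to less than 1
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, expected_cols)
)

check_param_grid(param_grid_manual)
```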

References

Rob J. Hyndman, Anne B. Koehler, Ralph D. Snyder, and Simone Grose (2002). A State Space Framework for Automatic Forecasting using Exponential Smoothing Methods. https://doi.org/10.1016/S0169-2070(01)00110-8
James Bergstra and Yoshua Bengio (2012). Random Search for Hyper-Parameter Optimization. https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf

Examples

library(ggplot2)

param_grid_small <- initialize_params_random(
  n_damped = 46,
  seed = 388
)

nrow(param_grid_small)

summary(param_grid_small[, "alpha"])
summary(param_grid_small[, "beta"])
summary(param_grid_small[, "one_minus_beta"])
summary(param_grid_small[, "gamma"])

ggplot(as.data.frame(param_grid_small),
       aes(x = alpha, y = gamma, fill = one_minus_beta)) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,1)) +
  geom_abline(intercept = 1, slope = -1, linetype = 3) +
  geom_point(pch = 21, color = "white")

# No one prevents you from combining a set of randomly drawn parameter
# combinations with a fixed set of parameters; for example, you can always
# evaluate parameters that correspond to the Random Walk, Seasonal Random
# Walk, or Mean model:

param_grid_w_naive <- rbind(
  initialize_params_naive(),
  param_grid_small
)

head(param_grid_w_naive)

# note the new dots in the corners at (0, 0) and (0, 1)
ggplot(as.data.frame(param_grid_w_naive),
       aes(x = alpha, y = gamma, fill = one_minus_beta)) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,1)) +
  geom_abline(intercept = 1, slope = -1, linetype = 3) +
  geom_point(pch = 21, color = "white")

# More samples cover the possible parameter space better
param_grid <- initialize_params_random(
  n_damped = 1000,
  seed = 388
)

nrow(param_grid)

summary(param_grid[, "alpha"])
summary(param_grid[, "beta"])
summary(param_grid[, "one_minus_beta"])
summary(param_grid[, "gamma"])

ggplot(as.data.frame(param_grid),
       aes(x = alpha, y = gamma, fill = one_minus_beta)) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,1)) +
  geom_abline(intercept = 1, slope = -1, linetype = 3) +
  geom_point(pch = 21, color = "white")

# by default, we oversample the borders; this can be turned off to not
# sample 0- and 1-valued parameters as often
param_grid_no_border_sampling <- initialize_params_random(
  n_damped = 1000,
  seed = 388,
  oversample_lower = 0,
  oversample_upper = 0
)

summary(param_grid_no_border_sampling[, "alpha"])
summary(param_grid_no_border_sampling[, "beta"])
summary(param_grid_no_border_sampling[, "one_minus_beta"])
summary(param_grid_no_border_sampling[, "gamma"])

ggplot(as.data.frame(param_grid_no_border_sampling),
       aes(x = alpha, y = gamma, fill = one_minus_beta)) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,1)) +
  geom_abline(intercept = 1, slope = -1, linetype = 3) +
  geom_point(pch = 21, color = "white")

# The parameter space can be limited, sampling remains uniform
param_grid_restricted <- initialize_params_random(
  n_damped = 1000,
  seed = 388,
  alpha_upper = 0.5,
  beta_upper = 0.05,
  gamma_upper = 0.5,
  oversample_lower = 0.05,
  oversample_upper = 0
)

nrow(param_grid_restricted)

summary(param_grid_restricted[, "alpha"])
summary(param_grid_restricted[, "beta"])
summary(param_grid_restricted[, "one_minus_beta"])
summary(param_grid_restricted[, "gamma"])

ggplot(as.data.frame(param_grid_restricted),
       aes(x = alpha, y = gamma, fill = one_minus_beta)) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,1)) +
  geom_abline(intercept = 1, slope = -1, linetype = 3) +
  geom_point(pch = 21, color = "white")

timradtke/heuristika documentation built on April 24, 2023, 1:55 a.m.