karma.ensemble: Ensemble learning for time-series: Train an ensemble of weak...

Description

Ensemble learning for time-series: Train an ensemble of weak learners (trained models) that produce a strong learner via the aggregation of multiple out-of-sample forecasts.
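
The aggregation idea can be illustrated with base R alone. The sketch below is not karma's search procedure: the candidate orders, the 8-quarter holdout, and the plain mean used for aggregation are all assumptions chosen for illustration.

```r
# Illustrative only: average out-of-sample forecasts from a few ARIMA fits.
# The orders below and the simple mean are assumptions, not karma's algorithm.
y <- JohnsonJohnson                   # quarterly series from the datasets package
h <- 8                                # hold out the last 8 quarters
n <- length(y)
train <- ts(y[1:(n - h)], start = start(y), frequency = frequency(y))
test  <- as.numeric(y[(n - h + 1):n])

orders <- list(c(1, 1, 0), c(0, 1, 1), c(2, 1, 0))   # hypothetical weak learners

# One column of h-step-ahead forecasts per model
forecasts <- sapply(orders, function(ord) {
  fit <- arima(train, order = ord, method = "ML")
  as.numeric(predict(fit, n.ahead = h)$pred)
})

ensemble <- rowMeans(forecasts)       # aggregate across models, per period

# Mean absolute percentage error, in percent
mape <- function(actual, predicted) mean(abs((actual - predicted) / actual)) * 100
mape(test, ensemble)                  # out-of-sample MAPE of the aggregate
```

Each column of `forecasts` is one model's prediction for the held-out quarters, so the row-wise mean is the ensemble forecast for each period.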

Usage

karma.ensemble(y, nsamples = 10, family = "sarima", method = "greedy",
  optimiser = "semi-stochastic", fixed = F, box_test = F, autolog = F,
  autodiffs = 1, autolags = F, r2_criterion = T, test_type = "auto",
  test_pct = "auto", metric = "MAPE", cv = "out", ac_criterion = F,
  mutations = F, xreg = NULL, N = 100, max_ar = 20, max_ma = 20,
  max_conv = 2, max_rep = 1, max_iter = 200, max_sdiff = T,
  std_smoothing = 1, ci_smoothing = F, plot = T, stdout = T)

Arguments

y

A univariate time-series vector; type <numeric> or <ts>.

nsamples

Maximum number of models in the ensemble (the number of unique models will usually be smaller); type <numeric> integer.

family

Family of ARMA models to choose from; "boxjenkins": non-seasonal ARIMA models with or without fixed terms; "sarima": standard seasonal and non-seasonal ARIMA models.

method

Box-Jenkins model selection algorithm (applicable only when family="boxjenkins"); "greedy": a fully automated karma.boxjenkins in-sample search (default options make it similar to forward selection); "karma": a custom stochastic local search algorithm.

optimiser

Option for the "neighbourhood function" of the optimisation algorithm (applicable only when family="boxjenkins"); "semi-stochastic": once a neighbourhood region (of either AR or MA terms) has been selected randomly, the candidate solutions are chosen deterministically; "stochastic": once a neighbourhood region (of either AR or MA terms) has been selected randomly, the candidate neighbour solutions are chosen stochastically.

fixed

Fixed term flag. Indicates whether the fixed term option in Arima() should be switched on during model selection (applicable only when family="boxjenkins"); T, F; type <logical>.

box_test

T/F flag. Indicates whether or not a Box-Pierce test for autocorrelation should be performed at every algorithm iteration (applicable only when family="boxjenkins").

autolog

Logarithmic search flag. Indicates whether log-transformations of the input series will be part of the search (applicable only when family="boxjenkins").

autodiffs

Differencing search flag. Indicates whether differencing of the input series will be part of the search (applicable only when family="boxjenkins").

autolags

Flag T/F indicating whether or not to set lags automatically as a function of the length of the series (applicable only when family="boxjenkins").

r2_criterion

Flag T/F indicating whether or not to use adjusted R-squared as an ADF model selection criterion. When FALSE, the simplest possible stationarity transformation will be preferred (applicable only when family="boxjenkins").

test_type

Train-test split type: percentage or fixed window. "percentage": test_pct = 12 is read as 12 percent of the length of the series; "window": test_pct = 12 is read as the last 12 time points (e.g. months) of the series; "auto": will try to read the split from a karma.fit object or generate it. With "auto", if the input series is a ts() object, test_type is set to "window" and test_pct is set to twice the frequency of the series; if test_pct is given a negative value, the window size is set to the frequency of the series times the absolute value of that number.

test_pct

Size of the test set in the train-test split used for cross-validation (e.g. 30 for a 70-30 percentage split); a positive integer, interpreted according to test_type ("window" or "percentage"); "auto" to read from a karma.fit object or generate; a negative integer to set the window size to a multiple of the series' frequency.

metric

Choose a model validation metric that will be used as the main optimisation criterion during model selection.

cv

Choose cross-validation dataset to be used during model selection; "out": Performance of out-of-sample forecast (classic train/test split) will be used for model validation; "in": Performance of in-sample forecast (classic parametric regression type of validation) will be used for model validation.

ac_criterion

Autocorrelation / partial autocorrelation test flag, on/off (applicable only when family="boxjenkins"); an optional optimisation constraint which applies a portmanteau test to every candidate solution and rejects solutions that do not improve AC/PAC.

mutations

Optional neighbourhood operator (applicable only when family="boxjenkins"); mutations flag T, F: whether or not to apply random "mutations" (a term borrowed from evolutionary algorithms) to a candidate solution when the optimiser is about to converge (a way to escape local optima that works somewhat like an inverse simulated annealing).

xreg

Optional vector or matrix of exogenous regressors; see documentation for Arima(), package 'forecast'.

N

Maximum lag at which to calculate autocorrelation and partial autocorrelation functions (applicable only when family="boxjenkins"); see documentation for acf(), pacf().

max_ar

Maximum AR term (value of p).

max_ma

Maximum MA term (value of q).

max_conv

For karma.boxjenkins() (applicable only when family="boxjenkins"): Maximum number of iterations without improvement before the algorithm converges forcefully (stuck in a local optimum).

max_rep

For karma-search: Maximum number of iterations without improvement before the algorithm converges naturally (reached a global or local optimum).

max_iter

For karma.boxjenkins() (applicable only when family="boxjenkins"): Maximum number of iterations without improvement before the algorithm converges naturally (reached a global or local optimum).

std_smoothing

Option to filter out predicted values if they exceed a user-defined number of standard deviations away from the mean of the ensemble at any time period. If std_smoothing is equal to or less than 0, no std filter is applied. If std_smoothing is set to a positive integer, then the filter threshold will be that input integer times the standard deviation of the ensemble at that period (added to and subtracted from the mean predicted value of the ensemble at that period).

ci_smoothing

Additional (stricter) ensemble filter which leaves out all predicted values that fall outside the aggregated confidence interval of the predicted value of the ensemble at that time period; T/F; type <logical>.

plot

Option to display plots during local search; if TRUE (default), AC and PAC plots are active; type <logical>.

stdout

Option to output optimisation diagnostics during local search; <logical>
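
The std_smoothing filter described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's code: `std_filter_mean` is a hypothetical helper, forecasts are assumed to be stored with one row per period and one column per ensemble member, and averaging the surviving values is an assumption about how the aggregate is formed.

```r
# Hypothetical sketch of the std_smoothing filter.
# Rows are forecast periods, columns are ensemble members.
# Averaging the values that survive the filter is an assumption.
std_filter_mean <- function(forecasts, std_smoothing = 1) {
  if (std_smoothing <= 0) return(rowMeans(forecasts))   # no std filter applied
  centre <- rowMeans(forecasts)            # ensemble mean per period
  spread <- apply(forecasts, 1, sd)        # ensemble standard deviation per period
  # Keep values within mean +/- std_smoothing * sd for their period
  keep <- abs(forecasts - centre) <= std_smoothing * spread
  filtered <- forecasts
  filtered[!keep] <- NA
  rowMeans(filtered, na.rm = TRUE)
}

# Toy ensemble: 2 periods, 4 members; the 30 in period 1 is an outlier
toy <- cbind(c(10, 20), c(11, 21), c(12, 22), c(30, 19))
std_filter_mean(toy, std_smoothing = 1)
std_filter_mean(toy, std_smoothing = 0)
```

In the toy matrix, the value 30 lies more than one standard deviation from the period-1 mean and is dropped before averaging, while setting std_smoothing to 0 returns the unfiltered row means.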

Value

Object of class "karma.ensemble".

See Also

tseries, forecast

Examples

# Create ensemble of 10 Box-Jenkins models:
kensemble <- karma.ensemble(JohnsonJohnson, nsamples = 10, family = "boxjenkins", fixed = T)
# Apply cross-validation and calculate MAPE of the aggregated prediction on the out-of-sample data:
karma.cv(kensemble)
# Forecast and plot 12 periods into the future:
karma.forecast(kensemble, h = 12)
# All in one line:
karma.forecast( karma.ensemble(JohnsonJohnson) )

# Create ensemble of 10 SARIMA models:
karma.forecast( karma.ensemble(JohnsonJohnson, family = "sarima" ) )
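
The size of the out-of-sample window used for cross-validation follows from test_type and test_pct. The helper below is a hypothetical sketch of the rules stated under Arguments, not part of the package: `resolve_test_size` is an invented name, rounding in the percentage case is an assumption, and the "auto" behaviour for non-ts input (reading from a karma.fit object) is not covered.

```r
# Hypothetical helper: translate test_type / test_pct into the number of
# held-out points, following the rules stated under Arguments.
resolve_test_size <- function(y, test_type = "auto", test_pct = "auto") {
  if (is.numeric(test_pct) && test_pct < 0) {
    # Negative value: window = |test_pct| * frequency of the series
    return(abs(test_pct) * frequency(y))
  }
  if (identical(test_type, "auto") || identical(test_pct, "auto")) {
    # Only the ts() rule is sketched; other "auto" cases are left out
    if (!is.ts(y)) stop("the 'auto' rule for non-ts input is not sketched here")
    return(2 * frequency(y))
  }
  switch(test_type,
    window     = test_pct,                              # last test_pct points
    percentage = round(length(y) * test_pct / 100),     # rounding is an assumption
    stop("unknown test_type")
  )
}

resolve_test_size(JohnsonJohnson)                       # "auto" on a quarterly ts
resolve_test_size(JohnsonJohnson, test_pct = -3)        # 3 * frequency
resolve_test_size(JohnsonJohnson, "window", 12)         # last 12 points
resolve_test_size(JohnsonJohnson, "percentage", 12)     # 12 percent of 84 points
```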

snarf-snarf/karma documentation built on May 24, 2019, 7:19 a.m.