View source: R/parsnip-arima_boost.R
arima_boost() is a way to generate a specification of a time series model that uses boosting to improve modeling errors (residuals) on Exogenous Regressors. It works with both "automated" ARIMA (auto.arima) and standard ARIMA (arima).
The main algorithms are:
Auto ARIMA + XGBoost Errors (engine = auto_arima_xgboost, default)
ARIMA + XGBoost Errors (engine = arima_xgboost)
arima_boost(
  mode = "regression",
  seasonal_period = NULL,
  non_seasonal_ar = NULL,
  non_seasonal_differences = NULL,
  non_seasonal_ma = NULL,
  seasonal_ar = NULL,
  seasonal_differences = NULL,
  seasonal_ma = NULL,
  mtry = NULL,
  trees = NULL,
  min_n = NULL,
  tree_depth = NULL,
  learn_rate = NULL,
  loss_reduction = NULL,
  sample_size = NULL,
  stop_iter = NULL
)

mode 
A single character string for the type of model. The only possible value for this model is "regression". 
seasonal_period
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or a time-based phrase such as "2 weeks" can be used if a date or datetime variable is provided. See Fit Details below.
non_seasonal_ar
The order of the non-seasonal autoregressive (AR) terms. Often denoted "p" in pdq-notation.
non_seasonal_differences
The order of integration for non-seasonal differencing. Often denoted "d" in pdq-notation.
non_seasonal_ma
The order of the non-seasonal moving average (MA) terms. Often denoted "q" in pdq-notation.
seasonal_ar
The order of the seasonal autoregressive (SAR) terms. Often denoted "P" in PDQ-notation.
seasonal_differences
The order of integration for seasonal differencing. Often denoted "D" in PDQ-notation.
seasonal_ma
The order of the seasonal moving average (SMA) terms. Often denoted "Q" in PDQ-notation.
mtry
A number for the number (or proportion) of predictors that will be randomly sampled at each split when creating the tree models.
trees
An integer for the number of trees contained in the ensemble.
min_n
An integer for the minimum number of data points in a node that is required for the node to be split further.
tree_depth
An integer for the maximum depth of the tree (i.e. number of splits).
learn_rate
A number for the rate at which the boosting algorithm adapts from iteration-to-iteration.
loss_reduction
A number for the reduction in the loss function required to split further.
sample_size
A number for the number (or proportion) of data that is exposed to the fitting routine.
stop_iter
The number of iterations without improvement before stopping.
The data given to the function are not saved and are only used to determine the mode of the model. For arima_boost(), the mode will always be "regression".
The model can be created using the fit() function using the following engines:
"auto_arima_xgboost" (default): Connects to forecast::auto.arima() and xgboost::xgb.train
"arima_xgboost": Connects to forecast::Arima() and xgboost::xgb.train
Main Arguments
The main arguments (tuning parameters) for the ARIMA model are:
seasonal_period: The periodic nature of the seasonality. Uses "auto" by default.
non_seasonal_ar: The order of the non-seasonal autoregressive (AR) terms.
non_seasonal_differences: The order of integration for non-seasonal differencing.
non_seasonal_ma: The order of the non-seasonal moving average (MA) terms.
seasonal_ar: The order of the seasonal autoregressive (SAR) terms.
seasonal_differences: The order of integration for seasonal differencing.
seasonal_ma: The order of the seasonal moving average (SMA) terms.
The main arguments (tuning parameters) for the XGBoost model are:
mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.
trees: The number of trees contained in the ensemble.
min_n: The minimum number of data points in a node that are required for the node to be split further.
tree_depth: The maximum depth of the tree (i.e. number of splits).
learn_rate: The rate at which the boosting algorithm adapts from iteration-to-iteration.
loss_reduction: The reduction in the loss function required to split further.
sample_size: The amount of data exposed to the fitting routine.
stop_iter: The number of iterations without improvement before stopping.
These arguments are converted to their engine-specific names at the time that the model is fit.
Other options and arguments can be set using set_engine() (see Engine Details below).
If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
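The following is a minimal sketch of modifying a specification with update() rather than rebuilding it. The argument values are illustrative only and are not recommendations from the package.

```r
library(parsnip)
library(modeltime)

# Start from a specification with a few arguments set
spec <- arima_boost(
  seasonal_period = 12,
  tree_depth      = 6,
  learn_rate      = 0.1
)

# Change only the arguments that need to differ;
# arguments not mentioned here are kept as they were
spec_updated <- update(spec, tree_depth = 3, learn_rate = 0.05)

spec_updated
```

Printing spec_updated should show the new tree_depth and learn_rate alongside the unchanged seasonal_period.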
The standardized parameter names in modeltime can be mapped to their original names in each engine (default values in parentheses):
Model 1: ARIMA:
modeltime | forecast::auto.arima | forecast::Arima
seasonal_period | ts(frequency) | ts(frequency)
non_seasonal_ar, non_seasonal_differences, non_seasonal_ma | max.p(5), max.d(2), max.q(5) | order = c(p(0), d(0), q(0))
seasonal_ar, seasonal_differences, seasonal_ma | max.P(2), max.D(1), max.Q(2) | seasonal = c(P(0), D(0), Q(0))
Model 2: XGBoost:
modeltime | xgboost::xgb.train
tree_depth | max_depth (6)
trees | nrounds (15)
learn_rate | eta (0.3)
mtry | colsample_bynode (1)
min_n | min_child_weight (1)
loss_reduction | gamma (0)
sample_size | subsample (1)
stop_iter | early_stop
Other options can be set using set_engine().
auto_arima_xgboost (default engine)
Model 1: Auto ARIMA (forecast::auto.arima):
## function (y, d = NA, D = NA, max.p = 5, max.q = 5, max.P = 2, max.Q = 2,
##     max.order = 5, max.d = 2, max.D = 1, start.p = 2, start.q = 2, start.P = 1,
##     start.Q = 1, stationary = FALSE, seasonal = TRUE, ic = c("aicc", "aic",
##     "bic"), stepwise = TRUE, nmodels = 94, trace = FALSE, approximation = (length(x) >
##     150 | frequency(x) > 12), method = NULL, truncate = NULL, xreg = NULL,
##     test = c("kpss", "adf", "pp"), test.args = list(), seasonal.test = c("seas",
##     "ocsb", "hegy", "ch"), seasonal.test.args = list(), allowdrift = TRUE,
##     allowmean = TRUE, lambda = NULL, biasadj = FALSE, parallel = FALSE,
##     num.cores = 2, x = y, ...)

Parameter Notes:
All values of non-seasonal pdq and seasonal PDQ are maximums. auto.arima will select a value using these as an upper limit.
xreg: This should not be used, since XGBoost will be doing the regression.
Model 2: XGBoost (xgboost::xgb.train):
## function (params = list(), data, nrounds, watchlist = list(), obj = NULL,
##     feval = NULL, verbose = 1, print_every_n = 1L, early_stopping_rounds = NULL,
##     maximize = NULL, save_period = NULL, save_name = "xgboost.model", xgb_model = NULL,
##     callbacks = list(), ...)

Parameter Notes:
XGBoost collects its booster parameters in params = list().
Parsnip / Modeltime automatically sends any args provided as ... inside of set_engine() to the params = list(...).
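As a rough sketch of the pass-through described above, an extra XGBoost parameter can be supplied inside set_engine(). Here colsample_bytree and the value 0.8 are illustrative choices for this example, not defaults or recommendations from the package.

```r
library(parsnip)
library(modeltime)

# trees is a standardized modeltime argument;
# colsample_bytree is an engine-level argument supplied through `...`
spec <- arima_boost(trees = 50) %>%
  set_engine("auto_arima_xgboost", colsample_bytree = 0.8)

# Engine-level arguments are stored on the specification until fit time
names(spec$eng_args)
```

Engine arguments set this way are not renamed, so they need to use the names that XGBoost itself expects.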
Date and DateTime Variable
It's a requirement to have a date or datetime variable as a predictor. The fit() interface accepts date and datetime features and handles them internally.
fit(y ~ date)
Seasonal Period Specification
The period can be non-seasonal (seasonal_period = 1) or seasonal (e.g. seasonal_period = 12 or seasonal_period = "12 months").
There are 3 ways to specify:
seasonal_period = "auto": A period is selected based on the periodicity of the data (e.g. 12 if monthly).
seasonal_period = 12: A numeric frequency. For example, 12 is common for monthly data.
seasonal_period = "1 year": A time-based phrase. For example, "1 year" would convert to 12 for monthly data.
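The three forms listed above can be written as follows. This is only a sketch of the specifications; the period is resolved when the model is fit to data with a date or datetime column.

```r
library(parsnip)
library(modeltime)

# 1. Let the period be chosen from the periodicity of the data
spec_auto <- arima_boost(seasonal_period = "auto")

# 2. Numeric frequency (12 is common for monthly data)
spec_numeric <- arima_boost(seasonal_period = 12)

# 3. Time-based phrase, converted to a frequency at fit time
spec_phrase <- arima_boost(seasonal_period = "1 year")
```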
Univariate (No xregs, Exogenous Regressors):
For univariate analysis, you must include a date or datetime feature. Simply use:
Formula Interface (recommended): fit(y ~ date) will ignore xreg's.
XY Interface: fit_xy(x = data[,"date"], y = data$y) will ignore xreg's.
Multivariate (xregs, Exogenous Regressors)
The xreg parameter is populated using the fit() or fit_xy() function:
Only factor, ordered factor, and numeric data will be used as xregs.
Date and Datetime variables are not used as xregs.
character data should be converted to factor.
Xreg Example: Suppose you have 3 features:
y (target)
date (time stamp)
month.lbl (labeled month as an ordered factor)
The month.lbl is an exogenous regressor that can be passed to arima_boost() using fit():
fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.
fit_xy(data[,c("date", "month.lbl")], y = data$y) will pass x, where x is a data frame containing month.lbl and the date feature. Only month.lbl will be used as an exogenous regressor.
Note that date or datetime class values are excluded from xreg.
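A small sketch of preparing such data is shown below. The data frame is made up for illustration (24 monthly rows with a character month label); it shows the character-to-factor conversion that lets month.lbl be used as an xreg. The fit call is left commented out because it needs the forecast and xgboost packages.

```r
# Hypothetical monthly data: a target, a date, and a character label
data <- data.frame(
  date      = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  y         = as.numeric(1:24),
  month.lbl = month.name[rep(1:12, 2)],
  stringsAsFactors = FALSE
)

# character columns are not used as xregs, so convert to an ordered factor
data$month.lbl <- factor(data$month.lbl, levels = month.name, ordered = TRUE)

# Not run: month.lbl is passed on as an exogenous regressor, date is not
# arima_boost() %>%
#   set_engine("auto_arima_xgboost") %>%
#   fit(y ~ date + month.lbl, data = data)
```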
fit.model_spec(), set_engine()
library(tidyverse)
library(lubridate)
library(parsnip)
library(rsample)
library(timetk)
library(modeltime)

# Data
m750 <- m4_monthly %>% filter(id == "M750")

# Split Data 90/10
splits <- initial_time_split(m750, prop = 0.9)

# ---- MODEL SPEC ----

# Set engine and boosting parameters
model_spec <- arima_boost(

    # ARIMA args
    seasonal_period = 12,
    non_seasonal_ar = 0,
    non_seasonal_differences = 1,
    non_seasonal_ma = 1,
    seasonal_ar     = 0,
    seasonal_differences = 1,
    seasonal_ma     = 1,

    # XGBoost Args
    tree_depth = 6,
    learn_rate = 0.1
) %>%
    set_engine(engine = "arima_xgboost")

# ---- FIT ----

## Not run: 
# Boosting - Happens by adding numeric date and month features
model_fit_boosted <- model_spec %>%
    fit(value ~ date + as.numeric(date) + month(date, label = TRUE),
        data = training(splits))
model_fit_boosted

## End(Not run)
