boost_prophet: General Interface for Boosted PROPHET Time Series Models

View source: R/parsnip-prophet_boost.R

boost_prophet R Documentation

Description

boost_prophet() generates a specification of a Boosted PROPHET model before fitting and allows the model to be created using different packages. Currently the PROPHET component is always fit with prophet, and the boosting component with either catboost or lightgbm.

Usage

boost_prophet(
  mode = "regression",
  growth = NULL,
  changepoint_num = NULL,
  changepoint_range = NULL,
  seasonality_yearly = NULL,
  seasonality_weekly = NULL,
  seasonality_daily = NULL,
  season = NULL,
  prior_scale_changepoints = NULL,
  prior_scale_seasonality = NULL,
  prior_scale_holidays = NULL,
  logistic_cap = NULL,
  logistic_floor = NULL,
  tree_depth = NULL,
  learn_rate = NULL,
  mtry = NULL,
  trees = NULL,
  min_n = NULL,
  sample_size = NULL,
  loss_reduction = NULL
)

Arguments

mode

A single character string for the type of model. The only possible value for this model is "regression".

growth

String 'linear' or 'logistic' to specify a linear or logistic trend.

changepoint_num

Number of potential changepoints to include for modeling trend.

changepoint_range

Adjusts the flexibility of the trend component by limiting to a percentage of data before the end of the time series. 0.80 means that a changepoint cannot exist after the first 80% of the data.

seasonality_yearly

One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models year-over-year seasonality.

seasonality_weekly

One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models week-over-week seasonality.

seasonality_daily

One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models day-over-day seasonality.

season

'additive' (default) or 'multiplicative'.

prior_scale_changepoints

Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints.

prior_scale_seasonality

Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality.

prior_scale_holidays

Parameter modulating the strength of the holiday components model, unless overridden in the holidays input.

logistic_cap

When growth is logistic, the upper-bound for "saturation".

logistic_floor

When growth is logistic, the lower-bound for "saturation".

tree_depth

The maximum depth of the tree (i.e. number of splits).

learn_rate

The rate at which the boosting algorithm adapts from iteration-to-iteration.

mtry

The number of predictors that will be randomly sampled at each split when creating the tree models.

trees

The number of trees contained in the ensemble.

min_n

The minimum number of data points in a node that is required for the node to be split further.

sample_size

The amount of data exposed to the fitting routine.

loss_reduction

The reduction in the loss function required to split further.

Details

The data given to the function are not saved and are only used to determine the mode of the model. For boost_prophet(), the mode will always be "regression".

The model can be created using the fit() function using the following engines:

  • "prophet_catboost" (default) - Connects to prophet::prophet() and catboost::catboost.train()

  • "prophet_lightgbm" - Connects to prophet::prophet() and lightgbm::lgb.train()

Main Arguments

The main arguments (tuning parameters) for the PROPHET model are:

  • growth: String 'linear' or 'logistic' to specify a linear or logistic trend.

  • changepoint_num: Number of potential changepoints to include for modeling trend.

  • changepoint_range: Adjusts how close to the end of the series the last changepoint can be located.

  • season: 'additive' (default) or 'multiplicative'.

  • prior_scale_changepoints: Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints.

  • prior_scale_seasonality: Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality.

  • prior_scale_holidays: Parameter modulating the strength of the holiday components model, unless overridden in the holidays input.
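To make changepoint_num and changepoint_range concrete, here is a simplified base R sketch that spreads potential changepoints evenly over the eligible part of a series. It loosely mirrors the idea behind prophet's placement but is not prophet's code; n_obs and the values used are assumptions chosen for illustration.

```r
# Simplified illustration, not prophet's implementation.
# Assumed example values:
n_obs             <- 100   # length of the training series
changepoint_num   <- 5     # number of potential changepoints
changepoint_range <- 0.8   # share of the series where changepoints may occur

# Only the first 80% of the observations are eligible for a changepoint.
last_eligible <- floor(n_obs * changepoint_range)

# Spread the changepoints evenly over the eligible rows, skipping row 1.
cp_index <- round(seq(1, last_eligible, length.out = changepoint_num + 1))[-1]

cp_index
```

With these values no changepoint index exceeds row 80, so the last 20% of the series cannot alter the fitted trend.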

The main arguments (tuning parameters) for the Catboost/LightGBM model are:

  • tree_depth: The maximum depth of the tree (i.e. number of splits).

  • learn_rate: The rate at which the boosting algorithm adapts from iteration-to-iteration.

  • mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.

  • trees: The number of trees contained in the ensemble.

  • min_n: The minimum number of data points in a node that is required for the node to be split further.

  • sample_size: The amount of data exposed to the fitting routine.

  • loss_reduction: The reduction in the loss function required to split further.

These arguments are converted to their specific names at the time that the model is fit.

Other options and arguments can be set using set_engine() (see Engine Details below).

If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.

Engine Details

The standardized parameter names in boostime can be mapped to their original names in each engine:

Model 1: PROPHET:

boostime                  prophet
growth                    growth ('linear')
changepoint_num           n.changepoints (25)
changepoint_range         changepoint.range (0.8)
seasonality_yearly        yearly.seasonality ('auto')
seasonality_weekly        weekly.seasonality ('auto')
seasonality_daily         daily.seasonality ('auto')
season                    seasonality.mode ('additive')
prior_scale_changepoints  changepoint.prior.scale (0.05)
prior_scale_seasonality   seasonality.prior.scale (10)
prior_scale_holidays      holidays.prior.scale (10)
logistic_cap              df$cap (NULL)
logistic_floor            df$floor (NULL)

Model 2: Catboost / LightGBM:

boostime        catboost::catboost.train  lightgbm::lgb.train
tree_depth      depth                     max_depth
learn_rate      learning_rate             learning_rate
mtry            rsm                       feature_fraction
trees           iterations                num_iterations
min_n           min_data_in_leaf          min_data_in_leaf
loss_reduction  None                      min_gain_to_split
sample_size     subsample                 bagging_fraction

Other options can be set using set_engine().
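The name translation for the tree parameters can be pictured as a plain lookup. The sketch below uses base R only; param_map is copied from the mapping above, and translate_args() is a hypothetical helper written for this illustration, not a boostime function. It renames arguments only and does not rescale any values.

```r
# Lookup table copied from the tree-parameter mapping.
# NA marks a parameter with no listed equivalent in that engine.
param_map <- data.frame(
    boostime = c("tree_depth", "learn_rate", "mtry", "trees",
                 "min_n", "loss_reduction", "sample_size"),
    catboost = c("depth", "learning_rate", "rsm", "iterations",
                 "min_data_in_leaf", NA, "subsample"),
    lightgbm = c("max_depth", "learning_rate", "feature_fraction",
                 "num_iterations", "min_data_in_leaf", "min_gain_to_split",
                 "bagging_fraction"),
    stringsAsFactors = FALSE
)

# Hypothetical helper (not part of boostime): rename a list of
# boostime-style arguments to the names used by the chosen engine.
translate_args <- function(args, engine = c("catboost", "lightgbm")) {
    engine <- match.arg(engine)
    idx <- match(names(args), param_map$boostime)
    if (anyNA(idx)) stop("Unknown boostime argument(s).")
    new_names <- param_map[[engine]][idx]
    keep <- !is.na(new_names)          # drop arguments with no equivalent
    out <- args[keep]
    names(out) <- new_names[keep]
    out
}

args <- list(tree_depth = 6, learn_rate = 0.1, loss_reduction = 0)
cb_params  <- translate_args(args, "catboost")
lgb_params <- translate_args(args, "lightgbm")
```

Here loss_reduction is dropped for Catboost because the mapping lists no equivalent, while for LightGBM it becomes min_gain_to_split.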

prophet_catboost

Model 1: PROPHET (prophet::prophet):

## function (df = NULL, growth = "linear", changepoints = NULL, n.changepoints = 25, 
##     changepoint.range = 0.8, yearly.seasonality = "auto", weekly.seasonality = "auto", 
##     daily.seasonality = "auto", holidays = NULL, seasonality.mode = "additive", 
##     seasonality.prior.scale = 10, holidays.prior.scale = 10, changepoint.prior.scale = 0.05, 
##     mcmc.samples = 0, interval.width = 0.8, uncertainty.samples = 1000, 
##     fit = TRUE, ...)

Parameter Notes:

  • df: This is supplied via the parsnip / boostime fit() interface (so don't provide this manually). See Fit Details (below).

  • holidays: A data.frame of holidays can be supplied via set_engine()

  • uncertainty.samples: The default is set to 0 here, rather than prophet's own default of 1000 shown in the signature above, because the prophet uncertainty intervals are not used as part of the Modeltime Workflow. You can override this setting if you plan to use prophet's uncertainty tools.
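As a sketch of the holidays note above, the snippet below builds a small holidays table in the layout prophet expects (a holiday name column and a ds date column) and passes it through set_engine(). The holiday name and dates are made-up example values, and the spec is only built when boostime is installed.

```r
# Example holidays table: `holiday` (name) and `ds` (date) columns.
# The entries are illustrative values, not taken from any dataset.
holidays_tbl <- data.frame(
    holiday = "new_year",
    ds      = as.Date(c("2013-01-01", "2014-01-01", "2015-01-01")),
    stringsAsFactors = FALSE
)

# Guarded so the snippet also runs where boostime is not installed.
if (requireNamespace("boostime", quietly = TRUE)) {
    library(parsnip)
    library(boostime)

    # The holidays table travels to prophet as an engine argument.
    holiday_spec <- set_engine(
        boost_prophet(learn_rate = 0.1),
        "prophet_catboost",
        holidays = holidays_tbl
    )
}
```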

Logistic Growth and Saturation Levels:

  • For growth = "logistic", simply add numeric values for logistic_cap and / or logistic_floor. There is no need to add additional columns for "cap" and "floor" to your data frame.

Limitations:

  • prophet::add_seasonality() is not currently implemented. It's used to specify non-standard seasonalities using Fourier series. An alternative is to use step_fourier() and supply custom seasonalities as Extra Regressors.

Model 2: Catboost (catboost::catboost.train):

## function (learn_pool, test_pool = NULL, params = list())

Parameter Notes:

  • Catboost takes its settings through params = list(). Parsnip / boostime automatically sends any arguments provided as ... inside set_engine() to params = list(...).
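A minimal sketch of the note above: an extra Catboost setting is given as a named argument to set_engine(). l2_leaf_reg is a Catboost training parameter; the value 3 and the other settings are arbitrary example choices, and the spec is only built when boostime is installed.

```r
# Guarded so the snippet also runs where boostime is not installed.
if (requireNamespace("boostime", quietly = TRUE)) {
    library(parsnip)
    library(boostime)

    # Standardized arguments go in boost_prophet(); engine-specific
    # ones (here l2_leaf_reg, an example choice) go in set_engine().
    catboost_spec <- set_engine(
        boost_prophet(trees = 500, learn_rate = 0.05),
        "prophet_catboost",
        l2_leaf_reg = 3
    )
}
```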

prophet_lightgbm

Model 1: PROPHET (prophet::prophet):

## function (df = NULL, growth = "linear", changepoints = NULL, n.changepoints = 25, 
##     changepoint.range = 0.8, yearly.seasonality = "auto", weekly.seasonality = "auto", 
##     daily.seasonality = "auto", holidays = NULL, seasonality.mode = "additive", 
##     seasonality.prior.scale = 10, holidays.prior.scale = 10, changepoint.prior.scale = 0.05, 
##     mcmc.samples = 0, interval.width = 0.8, uncertainty.samples = 1000, 
##     fit = TRUE, ...)

Parameter Notes:

  • df: This is supplied via the parsnip / boostime fit() interface (so don't provide this manually). See Fit Details (below).

  • holidays: A data.frame of holidays can be supplied via set_engine()

  • uncertainty.samples: The default is set to 0 here, rather than prophet's own default of 1000 shown in the signature above, because the prophet uncertainty intervals are not used as part of the Modeltime Workflow. You can override this setting if you plan to use prophet's uncertainty tools.

Logistic Growth and Saturation Levels:

  • For growth = "logistic", simply add numeric values for logistic_cap and / or logistic_floor. There is no need to add additional columns for "cap" and "floor" to your data frame.
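A minimal sketch of a logistic-growth specification follows. The cap and floor values are hypothetical and should be on the scale of your outcome; the spec is only built when boostime is installed.

```r
# Assumed saturation levels, on the scale of the modeled outcome.
cap_value   <- 10
floor_value <- 0

# Guarded so the snippet also runs where boostime is not installed.
if (requireNamespace("boostime", quietly = TRUE)) {
    library(parsnip)
    library(boostime)

    # No "cap" or "floor" columns are added to the data;
    # the saturation levels are part of the specification.
    logistic_spec <- set_engine(
        boost_prophet(
            growth         = "logistic",
            logistic_cap   = cap_value,
            logistic_floor = floor_value
        ),
        "prophet_lightgbm"
    )
}
```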

Limitations:

  • prophet::add_seasonality() is not currently implemented. It's used to specify non-standard seasonalities using Fourier series. An alternative is to use step_fourier() and supply custom seasonalities as Extra Regressors.

Model 2: Lightgbm (lightgbm::lgb.train):

## function (params = list(), data, nrounds = 10L, valids = list(), obj = NULL, 
##     eval = NULL, verbose = 1L, record = TRUE, eval_freq = 1L, init_model = NULL, 
##     colnames = NULL, categorical_feature = NULL, early_stopping_rounds = NULL, 
##     callbacks = list(), reset_data = FALSE, ...)

Parameter Notes:

  • Lightgbm takes its settings through params = list(). Parsnip / boostime automatically sends any arguments provided as ... inside set_engine() to params = list(...).

Fit Details

Date and Date-Time Variable

It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.

  • fit(y ~ date)

Univariate (No Extra Regressors):

For univariate analysis, you must include a date or date-time feature. Simply use:

  • Formula Interface (recommended): fit(y ~ date) fits the model without any xregs.

Multivariate (Extra Regressors)

The extra regressors are supplied through the fit() or fit_xy() function:

  • Only factor, ordered factor, and numeric data will be used as xregs.

  • Date and date-time variables are not used as xregs.

  • Character data should be converted to factor.

Xreg Example: Suppose you have 3 features:

  1. y (target)

  2. date (time stamp),

  3. month.lbl (labeled month as an ordered factor).

The month.lbl is an exogenous regressor that can be passed to boost_prophet() using fit():

  • fit(y ~ date + month.lbl) will pass month.lbl on as an exogenous regressor.

Note that date or date-time class values are excluded from xreg.
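The class rules above can be sketched in base R. The data and the is_xreg_type() helper are hypothetical and only illustrate which columns would qualify as xregs; this is not boostime's internal code.

```r
# Toy data: the three features from the example plus a character column.
dat <- data.frame(
    date      = seq(as.Date("2020-01-01"), by = "month", length.out = 6),
    y         = c(10, 12, 11, 13, 15, 14),
    month.lbl = factor(month.abb[1:6], levels = month.abb, ordered = TRUE),
    region    = "north",
    stringsAsFactors = FALSE
)

# Hypothetical helper: factor (including ordered) and numeric columns qualify.
# Date and date-time columns return FALSE for both checks.
is_xreg_type <- function(x) is.factor(x) || is.numeric(x)

predictors <- dat[, setdiff(names(dat), "y"), drop = FALSE]
xreg_cols  <- names(predictors)[vapply(predictors, is_xreg_type, logical(1))]

# Converting the character column to factor makes it eligible as well.
predictors$region <- factor(predictors$region)
xreg_cols_after <- names(predictors)[vapply(predictors, is_xreg_type, logical(1))]
```

Before the conversion only month.lbl qualifies; afterwards region qualifies too, while date is excluded in both cases.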

See Also

fit.model_spec(), set_engine()

Examples

library(dplyr)
library(lubridate)
library(parsnip)
library(rsample)
library(timetk)
library(boostime)

# Data
m750 <- m4_monthly %>% filter(id == "M750")
m750

# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.8)

# ---- PROPHET ----

# Model Spec
model_spec <- boost_prophet(
    learn_rate = 0.1
) %>%
    set_engine("prophet_catboost")

# Fit Spec
model_fit <- model_spec %>%
    fit(log(value) ~ date + as.numeric(date) + month(date, label = TRUE),
        data = training(splits))
model_fit


AlbertoAlmuinha/boostime documentation built on Aug. 13, 2022, 1:46 p.m.