I have leveraged more features to the Amplify Package. Now we release the updated version 0.2.0. The latest one update functions modify_pred
, smooth_ts
. find_adj_date
, and select_model
. For the first three functions, I have added a mode called weekday.sep = FALSE
. If the weekday.sep = TRUE
, one of the seasonal pattern is weekly and time series is implemented respectively according to their weekdays. In select_model
, I have added cross validation option. See more details below.
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.width = 12, fig.height = 6, cache = TRUE)
library(tidyverse) library(forecast) library(lubridate) library(tictoc)
theme_set(theme_bw() + theme(axis.text.x = element_text(color = "grey20", size = 18, face = "plain"), axis.text.y = element_text(color = "grey20", size = 18, face = "plain"), axis.title.x = element_text(color = "grey20", size =20, face = "bold"), axis.title.y = element_text(color = "grey20", size = 20, face = "bold"), legend.text = element_text(size = 18), legend.title = element_text(size = 20, face = "bold"), strip.text.x = element_text(size=18, face="bold")))
Our package is for double seasonal daily time seriers. Data features include outliers, variation increasing/decreasing, and double seasonality. We build multiplicative modeling for time series.
In this document, we explain how to use our R package.
To showing the steps, we forecast the some airline supporting tickets for FY20 as an example.
Install the package from github
devtools::install_github("placeboo/amplify", force = TRUE)
We start from loading our package Amplify
.
library(amplify)
We have support ticket from 2014-07-01 to 2018-06-10. A glimpse of the data:
load("data/ticket.RData") tickets %>% filter(date < ymd(20140710))
In the package, we have a holiday data set. For forecasting, you can add your own holiday information. It should include the date and the name of the holidays. Have a look:
data("holiday") head(holiday)
tickets = tickets %>% filter(date <= ymd(20190630))
The whole data set is our training set. We aim to predict the daily tickets from 2019-07-01 to 2020-06-30. First we detect the outiers and impute them with reasonable value.
smooth.ls = smooth_ts(data = tickets, vars = c("date", "ticket"), par = list(lambda = 1.5, s = 7), weekday.sep = TRUE) train.dat = cbind(tickets, smooth.ls$ts.dat) head(train.dat)
We split the data set into subtraining set and validation set to find parameters. There are 5 parameters ${\alpha, \beta, \gamma, \omega, \phi }$ in the model. We use grid search to find a set of optimal parameters. The searching process has been optimized by parallel computing. We also implement cross validation.
train.start = 20140701 train.end = 20190630 subtrain.end = 20190430 valid.start = 20190501 valid.ls = train_test_split(train.dat, train.window = c(train.start, subtrain.end), test.window = c(valid.start, train.end)) subtrain.dat = valid.ls$train.dat valid.dat = valid.ls$test.dat
tic("grid search") select.ls = select_model(train.y = subtrain.dat$ts, valid.y = valid.dat$ts, grid.search = TRUE, search.length = c(-0.5, 0.5), length.out = 5, CV = TRUE, kfold = 4) toc()
The list select.ls
contains the model which has the smallest MAPE for validation set, and the validation results.
cv = select.ls$cv head(cv)
We pick the parameters which yield the lowest MAPE.
cv[which.min(cv$mape), ]
We first build a future data frame
pred.dat = build_date(20190701, 20200630) head(pred.dat)
Within the data frame, we fill the predicted tickets.
model = select.ls$model pred.ls = forecast(model, h = nrow(pred.dat)) pred = as.numeric(pred.ls$mean) pred.dat = pred.dat %>% mutate(pred = pred) head(pred.dat)
We define the distance between the actual and predictiona and then adjust the holidays.
tmp = cbind(train.dat, pred = as.numeric(fitted(model))) tmp = tmp %>% mutate(resid = (ticket - pred) / pred)
In order to capture floating holidays such as Easter, we add more information about Holidays.
modify.ls = modify_pred(tmp, vars = c("date", "resid"), time.window = c(20190701, 20200630), par = list(lambda = 1.5, n = 3), holiday = holiday) h.hat = modify.ls$h.mat$h.hat h.hat[h.hat > 0] = 0 pred.dat = pred.dat %>% mutate(h.hat = h.hat, pred_modify = pred * (1 + h.hat))
pred.dat %>% ggplot(aes(x = date, y = pred_modify)) + geom_point(size = 2, alpha = 0.6) + geom_line() + scale_x_date(name = "Date", date_breaks = "1 month") + ylab("Tickets") + theme(axis.text.x = element_text(angle = 30, hjust = 1))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.