Jiaqi (Jackie) Yin 09/15/2020
I have added more features to the amplify package, and we now release the updated version 0.2.0. The latest version updates the functions modify_pred, smooth_ts, find_adj_date, and select_model. The first three functions gain a new argument, weekday.sep (default FALSE). When weekday.sep = TRUE, one of the seasonal patterns is taken to be weekly and the time series is processed separately for each weekday. In select_model, I have added a cross-validation option. See more details below.
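As a rough sketch of what the new argument controls (this reuses the smooth_ts call shown later in this walkthrough; the data set and parameter values are placeholders, not package defaults):
# weekday.sep = FALSE: outliers are detected on the series as a whole
smooth_ts(data = tickets, vars = c("date", "ticket"),
          par = list(lambda = 1.5, s = 7), weekday.sep = FALSE)
# weekday.sep = TRUE: each weekday is handled separately,
# respecting the weekly seasonal pattern
smooth_ts(data = tickets, vars = c("date", "ticket"),
          par = list(lambda = 1.5, s = 7), weekday.sep = TRUE)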
Our package is for double seasonal daily time series. Typical data features include outliers, increasing/decreasing variation, and double seasonality. We build a multiplicative model for the time series.
In this document, we explain how to use our R package.
To show the steps, we forecast airline support tickets for FY20 as an example.
Install the package from GitHub:
devtools::install_github("placeboo/amplify", force = TRUE)
We start by loading our package amplify. The examples below also use dplyr, ggplot2, tictoc, and forecast.
library(amplify)
We have support tickets from 2014-07-01 to 2019-06-30. A glimpse of the data:
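A minimal peek at the data (assuming the training data lives in a data frame named tickets with columns date and ticket, as in the calls below):
head(tickets)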
The package ships with a holiday data set. For forecasting, you can supply your own holiday information; it should include the date and the name of each holiday. Have a look:
data("holiday")
head(holiday)
The whole data set is our training set. We aim to predict the daily tickets from 2019-07-01 to 2020-06-30. First we detect the outliers and impute them with reasonable values.
smooth.ls = smooth_ts(data = tickets,
                      vars = c("date", "ticket"),
                      par = list(lambda = 1.5, s = 7),
                      weekday.sep = TRUE)
train.dat = cbind(tickets, smooth.ls$ts.dat)
head(train.dat)
We split the data set into a sub-training set and a validation set to find the parameters. There are 5 parameters (α, β, γ, ω, φ) in the model. We use grid search to find a set of optimal parameters; the search is sped up with parallel computing. We also implement cross validation.
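For reference, one standard formulation with exactly this parameter set is Taylor's multiplicative double seasonal Holt-Winters; whether amplify uses this exact parameterisation is an assumption here, not something stated by the package. With level S_t, trend T_t, two seasonal components D_t (period s_1) and W_t (period s_2), and an AR(1) residual adjustment φ:
\begin{aligned}
S_t &= \alpha \frac{y_t}{D_{t-s_1} W_{t-s_2}} + (1-\alpha)(S_{t-1} + T_{t-1}) \\
T_t &= \beta (S_t - S_{t-1}) + (1-\beta) T_{t-1} \\
D_t &= \gamma \frac{y_t}{S_t W_{t-s_2}} + (1-\gamma) D_{t-s_1} \\
W_t &= \omega \frac{y_t}{S_t D_{t-s_1}} + (1-\omega) W_{t-s_2} \\
\hat{y}_{t}(k) &= (S_t + k T_t)\, D_{t-s_1+k} W_{t-s_2+k} + \phi^{k}\bigl(y_t - (S_{t-1}+T_{t-1}) D_{t-s_1} W_{t-s_2}\bigr)
\end{aligned}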
train.start = 20140701
train.end = 20190630
subtrain.end = 20190430
valid.start = 20190501
valid.ls = train_test_split(train.dat,
                            train.window = c(train.start, subtrain.end),
                            test.window = c(valid.start, train.end))
subtrain.dat = valid.ls$train.dat
valid.dat = valid.ls$test.dat
tic("grid search")
select.ls = select_model(train.y = subtrain.dat$ts,
                         valid.y = valid.dat$ts,
                         grid.search = TRUE,
                         search.length = c(-0.5, 0.5),
                         length.out = 5,
                         CV = TRUE,
                         kfold = 4)
toc()
## grid search: 206.84 sec elapsed
The list select.ls contains the model with the smallest MAPE on the validation set, along with the validation results.
cv = select.ls$cv
head(cv)
We pick the parameters which yield the lowest MAPE.
cv[which.min(cv$mape), ]
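For clarity, the selection criterion is the mean absolute percentage error; a minimal sketch of it (mape_of is a hypothetical helper, not a function exported by amplify):
# MAPE on a validation window: mean of |actual - predicted| / actual
mape_of = function(actual, pred) {
  mean(abs((actual - pred) / actual))
}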
We first build a data frame covering the future dates:
pred.dat = build_date(20190701, 20200630)
head(pred.dat)
Within this data frame, we fill in the predicted tickets.
model = select.ls$model
pred.ls = forecast(model, h = nrow(pred.dat))
pred = as.numeric(pred.ls$mean)
pred.dat = pred.dat %>%
mutate(pred = pred)
head(pred.dat)
We define the distance between the actuals and the predictions (the relative residual), and then adjust for holidays.
tmp = cbind(train.dat, pred = as.numeric(fitted(model)))
tmp = tmp %>%
mutate(resid = (ticket - pred) / pred)
In order to capture floating holidays such as Easter, we add more information about holidays.
modify.ls = modify_pred(tmp,
                        vars = c("date", "resid"),
                        time.window = c(20190701, 20200630),
                        par = list(lambda = 1.5, n = 3),
                        holiday = holiday)
h.hat = modify.ls$h.mat$h.hat
h.hat[h.hat > 0] = 0
pred.dat = pred.dat %>%
mutate(h.hat = h.hat,
pred_modify = pred * (1 + h.hat))
pred.dat %>%
  ggplot(aes(x = date, y = pred_modify)) +
  geom_point(size = 2, alpha = 0.6) +
  geom_line() +
  scale_x_date(name = "Date",
               date_breaks = "1 month") +
  ylab("Tickets") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))
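As a quick sanity check (a sketch, not part of the package's output), we can inspect which future dates actually received a downward holiday adjustment:
# dates where the holiday effect h.hat lowered the forecast
pred.dat %>%
  filter(h.hat < 0) %>%
  head()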