Time Series Machine Learning
In timetk: A Tool Kit for Working with Time Series

knitr::opts_chunk$set(
    message = FALSE,
    warning = FALSE,
    fig.width = 8, 
    fig.height = 4.5,
    fig.align = 'center',
    out.width='95%', 
    dpi = 100
)

# devtools::load_all() # Travis CI fails on load_all()

This vignette covers Machine Learning for Forecasting using the time-series signature, a collection calendar features derived from the timestamps in the time series.

Introduction

The time series signature is a collection of useful features that describe the time series index of a time-based data set. It contains a wealth of features that can be used to forecast time series that contain patterns.

In this vignette, the user will learn methods to implement machine learning to predict future outcomes in a time-based data set. The vignette example uses a well known time series dataset, the Bike Sharing Dataset, from the UCI Machine Learning Repository. The vignette follows an example where we'll use timetk to build a basic Machine Learning model to predict future values using the time series signature. The objective is to build a model and predict the next six months of Bike Sharing daily counts.

Prerequisites

Before we get started, load the following packages.

library(dplyr)
library(timetk)
library(recipes)
library(parsnip)
library(workflows)
library(rsample)
# Used to convert plots from interactive to static
interactive = FALSE

Data

We'll be using the Bike Sharing Dataset from the UCI Machine Learning Repository.

Source: Fanaee-T, Hadi, and Gama, Joao, 'Event labeling combining ensemble detectors and background knowledge', Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg

# Read data
bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

bike_transactions_tbl

Next, visualize the dataset with the plot_time_series() function. Toggle .interactive = TRUE to get a plotly interactive plot. FALSE returns a ggplot2 static plot.

bike_transactions_tbl %>%
  plot_time_series(date, value, .interactive = interactive)

Train / Test

Next, use time_series_split() to make a train/test set.

Setting assess = "3 months" tells the function to use the last 3-months of data as the testing set.
Setting cumulative = TRUE tells the sampling to use all of the prior data as the training set.

splits <- bike_transactions_tbl %>%
  time_series_split(assess = "3 months", cumulative = TRUE)

Next, visualize the train/test split.

tk_time_series_cv_plan(): Converts the splits object to a data frame
plot_time_series_cv_plan(): Plots the time series sampling data using the "date" and "value" columns.

splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value, .interactive = interactive)

Modeling

Machine learning models are more complex than univariate models (e.g. ARIMA, Exponential Smoothing). This complexity typically requires a workflow (sometimes called a pipeline in other languages). The general process goes like this:

Create Preprocessing Recipe
Create Model Specifications
Use Workflow to combine Model Spec and Preprocessing, and Fit Model

Recipe Preprocessing Specification

The first step is to add the time series signature to the training set, which will be used this to learn the patterns. New in timetk 0.1.3 is integration with the recipes R package:

The recipes package allows us to add preprocessing steps that are applied sequentially as part of a data transformation pipeline.
The timetk has step_timeseries_signature(), which is used to add a number of features that can help machine learning models.

library(recipes)
# Add time series signature
recipe_spec_timeseries <- recipe(value ~ ., data = training(splits)) %>%
    step_timeseries_signature(date)

We can see what happens when we apply a prepared recipe prep() using the bake() function. Many new columns were added from the timestamp "date" feature. These are features we can use in our machine learning models.

bake(prep(recipe_spec_timeseries), new_data = training(splits))

Next, I apply various preprocessing steps to improve the modeling behavior. If you wish to learn more, I have an Advanced Time Series course that will help you learn these techniques.

recipe_spec_final <- recipe_spec_timeseries %>%
    step_fourier(date, period = 365, K = 5) %>%
    step_rm(date) %>%
    step_rm(contains("iso"), contains("minute"), contains("hour"),
            contains("am.pm"), contains("xts")) %>%
    step_normalize(contains("index.num"), date_year) %>%
    step_dummy(contains("lbl"), one_hot = TRUE) 

juice(prep(recipe_spec_final))

Model Specification

Next, let's create a model specification. We'll use a Elastic Net penalized regression via the glmnet package.

model_spec_lm <- linear_reg(
    mode = "regression", 
    penalty = 0.1
) %>%
    set_engine("glmnet")

Workflow

We can mary up the preprocessing recipe and the model using a workflow().

workflow_lm <- workflow() %>%
    add_recipe(recipe_spec_final) %>%
    add_model(model_spec_lm)

workflow_lm

Training

The workflow can be trained with the fit() function.

if (requireNamespace("glmnet")) {
  workflow_fit_lm <- workflow_lm %>% fit(data = training(splits))
}

Hyperparameter Tuning

Linear regression has no parameters. Therefore, this step is not needed. More complex models have hyperparameters that require tuning. Algorithms include:

Elastic Net
XGBoost
Random Forest
Support Vector Machine (SVM)
K-Nearest Neighbors
Multivariate Adaptive Regression Spines (MARS)

If you would like to learn how to tune these models for time series, then join the waitlist for my advanced Time Series Analysis & Forecasting Course.

Forecasting with Modeltime

The Modeltime Workflow is designed to speed up model evaluation and selection. Now that we have several time series models, let's analyze them and forecast the future with the modeltime package.

Modeltime Table

The Modeltime Table organizes the models with IDs and creates generic descriptions to help us keep track of our models. Let's add the models to a modeltime_table().

if (rlang::is_installed("modeltime")) {
  model_table <- modeltime::modeltime_table(
  workflow_fit_lm
) 

model_table
}

Calibration

Model Calibration is used to quantify error and estimate confidence intervals. We'll perform model calibration on the out-of-sample data (aka. the Testing Set) with the modeltime::modeltime_calibrate() function. Two new columns are generated (".type" and ".calibration_data"), the most important of which is the ".calibration_data". This includes the actual values, fitted values, and residuals for the testing set.

calibration_table <- model_table %>%
  modeltime::modeltime_calibrate(testing(splits))

calibration_table

Forecast (Testing Set)

With calibrated data, we can visualize the testing predictions (forecast).

Use modeltime::modeltime_forecast() to generate the forecast data for the testing set as a tibble.
Use modeltime::plot_modeltime_forecast() to visualize the results in interactive and static plot formats.

calibration_table %>%
  modeltime::modeltime_forecast(actual_data = bike_transactions_tbl) %>%
  modeltime::plot_modeltime_forecast(.interactive = interactive)

Accuracy (Testing Set)

Next, calculate the testing accuracy to compare the models.

Use modeltime::modeltime_accuracy() to generate the out-of-sample accuracy metrics as a tibble.
Use modeltime::table_modeltime_accuracy() to generate interactive and static

calibration_table %>%
  modeltime::modeltime_accuracy() %>%
  modeltime::table_modeltime_accuracy(.interactive = interactive)

Refit and Forecast Forward

Refitting is a best-practice before forecasting the future.

modeltime::modeltime_refit(): We re-train on full data (bike_transactions_tbl)
modeltime::modeltime_forecast(): For models that only depend on the "date" feature, we can use h (horizon) to forecast forward. Setting h = "12 months" forecasts then next 12-months of data.

calibration_table %>%
  modeltime::modeltime_refit(bike_transactions_tbl) %>%
  modeltime::modeltime_forecast(h = "12 months", actual_data = bike_transactions_tbl) %>%
  modeltime::plot_modeltime_forecast(.interactive = interactive)

Summary

Timetk is part of the amazing Modeltime Ecosystem for time series forecasting. But it can take a long time to learn:

Many algorithms
Ensembling and Resampling
Feature Engineering
Machine Learning
Deep Learning
Scalable Modeling: 10,000+ time series

Your probably thinking how am I ever going to learn time series forecasting. Here's the solution that will save you years of struggling.

Take the High-Performance Forecasting Course

Become the forecasting expert for your organization

High-Performance Time Series Course

Time Series is Changing

Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.

High-Performance Forecasting Systems will save companies by improving accuracy and scalability. Imagine what will happen to your career if you can provide your organization a "High-Performance Time Series Forecasting System" (HPTSF System).

How to Learn High-Performance Time Series Forecasting

I teach how to build a HPTFS System in my High-Performance Time Series Forecasting Course. You will learn:

Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
Deep Learning with GluonTS (Competition Winners)
Time Series Preprocessing, Noise Reduction, & Anomaly Detection
Feature engineering using lagged variables & external regressors
Hyperparameter Tuning
Time series cross-validation
Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
Scalable Forecasting - Forecast 1000+ time series in parallel
and more.

Become the Time Series Expert for your organization.

Take the High-Performance Time Series Forecasting Course

Any scripts or data that you put into this service are public.

timetk documentation built on Nov. 2, 2023, 6:18 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

timetk
A Tool Kit for Working with Time Series

Time Series Machine Learning
In timetk: A Tool Kit for Working with Time Series

Introduction

Prerequisites

Data

Train / Test

Modeling

Recipe Preprocessing Specification

Model Specification

Workflow

Training

Hyperparameter Tuning

Forecasting with Modeltime

Modeltime Table

Calibration

Forecast (Testing Set)

Accuracy (Testing Set)

Refit and Forecast Forward

Summary

Take the High-Performance Forecasting Course

Time Series is Changing

How to Learn High-Performance Time Series Forecasting

Try the timetk package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

timetk A Tool Kit for Working with Time Series

Time Series Machine Learning In timetk: A Tool Kit for Working with Time Series

Introduction

Prerequisites

Data

Train / Test

Modeling

Recipe Preprocessing Specification

Model Specification

Workflow

Training

Hyperparameter Tuning

Forecasting with Modeltime

Modeltime Table

Calibration

Forecast (Testing Set)

Accuracy (Testing Set)

Refit and Forecast Forward

Summary

Take the High-Performance Forecasting Course

Time Series is Changing

How to Learn High-Performance Time Series Forecasting

Try the timetk package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

timetk
A Tool Kit for Working with Time Series

Time Series Machine Learning
In timetk: A Tool Kit for Working with Time Series