knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  fig.width = 8,
  fig.height = 4.5,
  fig.align = 'center',
  out.width = '95%',
  dpi = 100
)

# devtools::load_all() # Travis CI fails on load_all()
Clustering is an important part of time series analysis that allows us to organize time series into groups by combining "tsfeatures" (summary matrices) with unsupervised techniques such as K-Means Clustering. In this short tutorial, we will cover the tk_tsfeatures() function, which computes a time series feature matrix of summarized information on one or more time series.
To get started, load the following libraries.
library(dplyr)
library(purrr)
library(timetk)
This tutorial will use the walmart_sales_weekly dataset:
walmart_sales_weekly
Using the tk_tsfeatures() function, we can quickly get the "tsfeatures" for each of the time series. A few important points:
- The .features parameter accepts function names from the tsfeatures R package (e.g. "lumpiness", "stl_features").
- We can supply any function that returns an aggregation (e.g. "mean" will apply the base::mean() function).
- You can supply custom functions by creating a function and providing its name (e.g. my_mean(), defined below).
# Custom Function
my_mean <- function(x, na.rm = TRUE) {
  mean(x, na.rm = na.rm)
}

tsfeature_tbl <- walmart_sales_weekly %>%
  group_by(id) %>%
  tk_tsfeatures(
    .date_var = Date,
    .value    = Weekly_Sales,
    .period   = 52,
    .features = c("frequency", "stl_features", "entropy", "acf_features", "my_mean"),
    .scale    = TRUE,
    .prefix   = "ts_"
  ) %>%
  ungroup()

tsfeature_tbl
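To make the custom-function route more concrete, here is a minimal base-R sketch of the shape such a function takes: it receives a numeric vector and returns a single number. The my_cv() helper and the toy sales_tbl data are hypothetical and exist only for illustration; they are not part of timetk or tsfeatures.

```r
# Hypothetical custom feature: coefficient of variation (sd / mean).
# Like my_mean(), it reduces a numeric vector to one number.
my_cv <- function(x, na.rm = TRUE) {
  stats::sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Toy data (made up for illustration): two short series identified by id.
sales_tbl <- data.frame(
  id    = rep(c("a", "b"), each = 4),
  value = c(10, 12, 11, 13, 100, 140, 90, 150)
)

# Apply the feature once per series, which is conceptually what a
# grouped feature calculation does for each id.
cv_by_id <- tapply(sales_tbl$value, sales_tbl$id, my_cv)
cv_by_id
```

A function like this could presumably be passed by name in .features, in the same way "my_mean" is passed above.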
We can quickly add cluster assignments with the kmeans() function and some tidyverse data wrangling.
set.seed(123)

cluster_tbl <- tibble(
  cluster = tsfeature_tbl %>%
    select(-id) %>%
    as.matrix() %>%
    kmeans(centers = 3, nstart = 100) %>%
    pluck("cluster")
) %>%
  bind_cols(
    tsfeature_tbl
  )

cluster_tbl
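K-means works on Euclidean distances, so feature columns measured on larger scales can dominate the result. One possible variation, sketched here on a made-up feature matrix rather than on tsfeature_tbl, is to drop constant columns and standardize the remaining ones with scale() before calling kmeans(). The column names and values below are assumptions for illustration only.

```r
set.seed(123)

# Made-up feature matrix: 6 series, 3 features on very different scales.
feature_mat <- cbind(
  ts_frequency = rep(52, 6),                              # constant column
  ts_trend     = c(0.10, 0.12, 0.11, 0.80, 0.85, 0.82),
  ts_my_mean   = c(1000, 1100, 1050, 9000, 9500, 9200)
)

# Drop zero-variance columns (scale() would turn them into NaN),
# then standardize each remaining column to mean 0 and sd 1.
keep       <- apply(feature_mat, 2, stats::sd) > 0
scaled_mat <- scale(feature_mat[, keep, drop = FALSE])

# Cluster the standardized features; `cluster` holds one label per row.
km <- stats::kmeans(scaled_mat, centers = 2, nstart = 10)
km$cluster
```

Whether standardizing helps depends on the features chosen, so it is worth comparing the cluster assignments with and without this step.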
Finally, we can visualize the cluster assignments by joining the cluster_tbl with the original walmart_sales_weekly and then plotting with plot_time_series().
cluster_tbl %>%
  select(cluster, id) %>%
  right_join(walmart_sales_weekly, by = "id") %>%
  group_by(id) %>%
  plot_time_series(
    Date, Weekly_Sales,
    .color_var   = cluster,
    .facet_ncol  = 2,
    .interactive = FALSE
  )
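The join is what spreads one label per series across every observation of that series. A small base-R sketch on made-up data shows the effect; cluster_lookup and series_tbl are hypothetical stand-ins, and merge() with all.y = TRUE is used here as a base-R analogue of the right join above.

```r
# Toy inputs (made up): one cluster label per id, many observations per id.
cluster_lookup <- data.frame(id = c("a", "b"), cluster = c(1L, 2L))
series_tbl <- data.frame(
  id    = rep(c("a", "b"), each = 3),
  value = c(10, 12, 11, 100, 140, 90)
)

# Keep every row of the series and attach its id's cluster label.
joined_tbl <- merge(cluster_lookup, series_tbl, by = "id", all.y = TRUE)

# Each series carries exactly one label, repeated on all of its rows.
table(joined_tbl$id, joined_tbl$cluster)
```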
My Talk on High-Performance Time Series Forecasting
Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.
High-Performance Forecasting Systems will save companies MILLIONS of dollars. Imagine what will happen to your career if you can provide your organization a "High-Performance Time Series Forecasting System" (HPTSF System).
I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. If you are interested in learning Scalable High-Performance Forecasting Strategies, then take my course. You will learn:
- Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
- GluonTS (Competition Winners)

Unlock the High-Performance Time Series Forecasting Course