knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
M5 Competition Data
You can install the package M5comp
with:
devtools::install_github("xqnwang/M5comp")
Number of M5 series per aggregation level.
| Level id | Aggregation level | Number of series | |:--------:|:---------------------------------------------------------------------|-----------------:| | 1 | Unit sales of all products, aggregated for all stores/states | 1 | | 2 | Unit sales of all products, aggregated for each State | 3 | | 3 | Unit sales of all products, aggregated for each store | 10 | | 4 | Unit sales of all products, aggregated for each category | 3 | | 5 | Unit sales of all products, aggregated for each department | 7 | | 6 | Unit sales of all products, aggregated for each State and category | 9 | | 7 | Unit sales of all products, aggregated for each State and department | 21 | | 8 | Unit sales of all products, aggregated for each store and category | 30 | | 9 | Unit sales of all products, aggregated for each store and department | 70 | | 10 | Unit sales of product x, aggregated for all stores/states | 3049 | | 11 | Unit sales of product x, aggregated for each State | 9147 | | 12 | Unit sales of product x, aggregated for each store | 30490 | | Total| - | 42840 |
The following table shows more simply and clearly how the aggregated time series are generated. For example, series of level 11 are aggregated by series with the same item.id
and state.id
.
| Level id | id | item.id | dept.id | cat.id | store.id | state.id | |:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| |(example) | (HOBBIES_1_001_CA_1) | (HOBBIES_1_001) | (HOBBIES_1) | (HOBBIES) | (CA_1) | (CA) | |1| | | | | | | |2| | | | | |$\checkmark$| |3| | | | |$\checkmark$| | |4| | | |$\checkmark$| | | |5| | |$\checkmark$| | | | |6| | | |$\checkmark$| |$\checkmark$| |7| | |$\checkmark$ | | |$\checkmark$ | |8| | | |$\checkmark$ |$\checkmark$ | | |9| | |$\checkmark$| |$\checkmark$| | |10| |$\checkmark$| | | | | |11| |$\checkmark$| | | |$\checkmark$| |12|$\checkmark$ | | | | | |
The package M5comp
contains two datasets: M5
and calendar
.
The M5
is a list of $42840$ series, including $30490$ bottom-level time series and $12350$ aggregated time series. Each element of this list is also a list with some components, like the time series of the training periods and the level id of the series. The components of the bottom-level series and the aggregated series are partially different.
# extract bottom-level series (level 12) library(M5comp) data(M5) M5_bottom <- Filter(function(l) l$level == 12, M5) # the structure of the bottom-level series str(M5_bottom[[1]]) # extract extract level 9 series M5_l9 <- Filter(function(l) l$level == 9, M5) # the structure of the aggregated series str(M5_l9[[1]]) # which levels the series is aggregated on M5_l9[[1]]$agg.by
The calendar data contains information about the dates the products are sold. The calendar
object is a dataframe with some components, such as date, weekday and event_name_1.
# the structure of the calendar data data(calendar) str(calendar)
# plot one of the series with date library(ggplot2) library(xts) library(magrittr) xts(M5[[1]]$x, order.by = calendar$date[1:M5[[1]]$n]) %>% autoplot() + xlab("Time") + ylab("Sales")
# help pages ?M5 ?calendar
This package is free and open source software, licensed under GPL-3
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.