M5: M5 Competition data


Description

The time series dataset from the M5 competition. The accompanying calendar information is available in the calendar dataset.

Usage

data(M5)

Format

M5 is a list of 42840 series, including 30490 bottom-level time series and 12350 aggregated time series.
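These totals can be cross-checked against the hierarchy described in the M5 Competitors’ Guide. The following is a minimal sketch in plain arithmetic; the per-level definitions are taken from the guide and are assumed here rather than read from the M5 object, and the variable names are illustrative only.

```r
# Building blocks of the hierarchy, as stated in the Format section.
n_items  <- 3049
n_stores <- 10
n_states <- 3
n_cats   <- 3
n_depts  <- 7

# Number of series at each of the 12 levels (definitions assumed from the
# M5 Competitors' Guide, listed in order from level 1 to level 12).
level_counts <- c(
  total      = 1,
  state      = n_states,
  store      = n_stores,
  category   = n_cats,
  department = n_depts,
  state_cat  = n_states * n_cats,
  state_dept = n_states * n_depts,
  store_cat  = n_stores * n_cats,
  store_dept = n_stores * n_depts,
  item       = n_items,
  item_state = n_items * n_states,
  item_store = n_items * n_stores   # bottom level
)

n_bottom     <- level_counts[["item_store"]]
n_total      <- sum(level_counts)
n_aggregated <- n_total - n_bottom

print(c(bottom = n_bottom, aggregated = n_aggregated, total = n_total))
```

The bottom level is simply every product in every store (3049 times 10), and the remaining eleven levels account for the aggregated series.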

1. Each bottom-level series within M5 is a list object with the following elements:

id

The id of the product of the store. For example "HOBBIES_1_001_CA_1" denotes the product "HOBBIES_1_001" in the store "CA_1".

level

The level id of the series. The M5 dataset consists of 12 levels.

item.id

The id of the product. The dataset involves the unit sales of 3049 products.

dept.id

The id of the department the product belongs to. The products are classified into 7 departments.

cat.id

The id of the category the product belongs to. Possible values are "HOBBIES", "FOODS", and "HOUSEHOLD".

store.id

The id of the store where the product is sold. The products are sold across 10 stores.

state.id

The state where the store is located. Possible values are "CA", "TX", and "WI".

n

The number of observations in the training time series.

h

The number of required forecasts.

x

A time series giving the number of units sold each day, starting from 2011-01-29 (the training data).

x.price

The price of the product for the given week/store in the training periods (from 2011-01-29). The price is provided per week (average across seven days). Note that NA means that the product was not sold during the examined week.

xx.price

The price of the product for the given week/store in the validation periods (from 2016-04-25). Note that NA means that the product was not sold during the examined week.

xxx.price

The price of the product for the given week/store in the testing periods (from 2016-05-23). Note that NA means that the product was not sold during the examined week.

2. Each aggregated series within M5 omits the id and price elements. item.id, dept.id, cat.id, store.id, and state.id are included only when they apply to the aggregated series. In addition, each aggregated series contains the following element:

agg.by

A named character vector indicating the levels over which the series is aggregated.
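To make the structure concrete, the sketch below builds a small stand-in list and queries it the same way one would query M5. Everything in `toy` is invented for illustration: the sales values are made up, and the names and values inside `agg.by` are a guess at a plausible layout, not the package's actual content. Only the element names follow the descriptions above.

```r
# Hypothetical stand-in for M5: two bottom-level series, one aggregated series.
toy <- list(
  list(id = "HOBBIES_1_001_CA_1", level = 12,
       item.id = "HOBBIES_1_001", dept.id = "HOBBIES_1", cat.id = "HOBBIES",
       store.id = "CA_1", state.id = "CA",
       n = 5, h = 2, x = c(0, 1, 0, 2, 1)),
  list(id = "FOODS_3_090_TX_2", level = 12,
       item.id = "FOODS_3_090", dept.id = "FOODS_3", cat.id = "FOODS",
       store.id = "TX_2", state.id = "TX",
       n = 5, h = 2, x = c(3, 0, 4, 1, 2)),
  # Aggregated series: no id or price elements, plus an agg.by vector
  # (the contents of agg.by here are assumed, not taken from the package).
  list(level = 9, dept.id = "HOBBIES_1", store.id = "CA_1",
       agg.by = c(store = "store.id", dept = "dept.id"),
       n = 5, h = 2, x = c(5, 7, 6, 9, 8))
)

# Level of every series in the list.
levels_found <- vapply(toy, function(s) s$level, numeric(1))

# Bottom-level series are those at level 12.
bottom <- Filter(function(s) s$level == 12, toy)

# For bottom-level series, the id is the item id followed by the store id.
id_ok <- vapply(
  bottom,
  function(s) identical(s$id, paste(s$item.id, s$store.id, sep = "_")),
  logical(1)
)

# Aggregated series can be recognised by the absence of an id element.
has_id <- vapply(toy, function(s) !is.null(s$id), logical(1))

print(table(levels_found))
print(id_ok)
print(has_id)
```

Checking for a missing `id` is one way to separate aggregated from bottom-level series without relying on the level number; filtering on `level` is the approach used in the Examples section.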

Note

The training data range from 2011-01-29 to 2016-04-24. The validation and test datasets each contain 28 days of sales data. Prices are constant within each week. The aggregation levels are described in the M5 Competitors’ Guide.
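The date ranges in this note determine the length of each window. As a small sketch using only base R date arithmetic (the variable names are illustrative, and a 28-day horizon is assumed for both the validation and test windows, as stated above):

```r
train_start <- as.Date("2011-01-29")
train_end   <- as.Date("2016-04-24")
h <- 28  # forecast horizon in days

# Number of daily observations in the training window (both ends inclusive).
n_train <- as.integer(train_end - train_start) + 1L

# Validation starts the day after training ends; test follows validation.
valid_dates <- seq(train_end + 1, by = "day", length.out = h)
test_dates  <- seq(max(valid_dates) + 1, by = "day", length.out = h)

print(n_train)
print(range(valid_dates))
print(range(test_dates))
```

The resulting start dates, 2016-04-25 for validation and 2016-05-23 for testing, agree with the dates given for xx.price and xxx.price in the Format section.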

Source

M5 Competition Dataset

References

M5 Competition Web

M5 Competitors’ Guide

See Also

calendar for the M5 calendar information.

Examples

data(M5)
names(M5[[1]])
# Extract the bottom-level series (level 12)
M5_bottom <- Filter(function(l) l$level == 12, M5)
# Extract the level 9 series
M5_l9 <- Filter(function(l) l$level == 9, M5)
# Time series plot with dates on the x-axis
library(ggplot2)
library(xts)
library(magrittr)
data(calendar)
# Named `series` rather than `ts` to avoid masking stats::ts()
series <- M5[[1]]
xts(series$x, order.by = calendar$date[seq_len(series$n)]) %>%
  autoplot() + ggtitle("Time series plot") + xlab("Time")

xqnwang/M5comp documentation built on April 2, 2020, 9:45 p.m.