The goal of cfsales is to make the Corporación Favorita Grocery Sales Forecasting data easy to access. The intention is to have a realistic dataset for proving the value of various forecast methods, pipelines, data-cleaning, etc. in retail-like settings.
Because the data exceed the size limits set by CRAN, they are distributed as data packages stored in a drat repository. The cfsalesdata1 and cfsalesdata2 packages need to be installed.
The data were also too large to store as a single GitHub repository, so the files had to be split into two data packages, cfsalesdata1 and cfsalesdata2. Both are available and easy to install through a drat repository on GitHub, as shown below. The cfsalesdata1 package contains the majority of the data tables. The cfsalesdata2 package is additionally required if the full training set needs to be loaded.
You can install the required packages with the following commands.
install.packages("cfsalesdata1",repos = "https://alexhallam.github.io/drat/", type = "source")
install.packages("cfsalesdata2",repos = "https://alexhallam.github.io/drat/", type = "source")
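If the data packages may already be present, a small guard avoids reinstalling them from source. The following is a minimal sketch and not part of cfsales: `missing_pkgs()` and `install_cfsales_data()` are hypothetical helper names, built only on base R's `requireNamespace()` and `install.packages()`. Note that `requireNamespace()` loads a package's namespace as a side effect when the package is found.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the data packages that are missing,
# using the drat repository URL given above.
install_cfsales_data <- function(pkgs = c("cfsalesdata1", "cfsalesdata2")) {
  to_install <- missing_pkgs(pkgs)
  for (pkg in to_install) {
    install.packages(pkg,
                     repos = "https://alexhallam.github.io/drat/",
                     type = "source")
  }
  invisible(to_install)
}
```

Calling `install_cfsales_data()` would then install only what is missing, and `install_cfsales_data("cfsalesdata1")` would skip the second package when the full training set is not needed.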
The cfsalesdata* packages provide the following data tables.
- cfsalesdata1::holidays_events: holidays and events, with metadata.
- cfsalesdata1::items: item metadata.
- cfsalesdata1::transactions: count of sales transactions for each store_nbr/date combination. Only included for the training data time frame.
- cfsalesdata1::oil: daily oil price.
- cfsalesdata1::stores: store metadata.
- train*: training set for the Kaggle competition. The target variable is unit_sales. The granularity is store-unit-day, where each store day has counts for many different units. Note: the training set had to be split into 4 sets to aid in the file compression process. These sets are cfsalesdata1::train1 (45.7 MB), cfsalesdata1::train2 (36.4 MB), cfsalesdata2::train3 (42.7 MB), and cfsalesdata2::train4 (28.9 MB). They may be bound together with rbind().
- cfsalesdata1::test: test set for the Kaggle competition. Note that some item ids are present in the test set that are not present in the training set. The forecaster is expected to generate predictions of unit sales for these new items.
- cfsalesdata1::store_day_train: one possible aggregation of the raw data to simulate a data set much like one would see in the wild. The target variable in this data table is transactions, which is different from the target variable in the train* data tables. The granularity is store-day.
- cfsalesdata1::store_day_test: the test set for final validation of forecast methods.

Once the cfsalesdata1 and cfsalesdata2 packages have been installed and loaded, data may be called with the data() function.
library(cfsalesdata1)
data("store_day_train")
str(store_day_train)
Since these data sources are time series, one may also be interested in converting a data source into a tsibble, which gives access to functions that report time gaps, make sliding window calculations easy, and support other useful time series calculations.
> library(tsibble)
> tsibble::as_tsibble(x = store_day_test, index = date, key = store_nbr)
# A tsibble: 864 x 47 [1D]
# Key: store_nbr [54]
# Groups: @ date [16]
date store_nbr dcoilwtico is_navidad is_carnaval is_new_year is_black_firday is_cyber_monday is_futbol
<date> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2017-08-16 1 46.8 0 0 0 0 0 0
2 2017-08-17 1 47.1 0 0 0 0 0 0
3 2017-08-18 1 48.6 0 0 0 0 0 0
4 2017-08-19 1 NA 0 0 0 0 0 0
5 2017-08-20 1 NA 0 0 0 0 0 0
6 2017-08-21 1 47.4 0 0 0 0 0 0
7 2017-08-22 1 47.6 0 0 0 0 0 0
8 2017-08-23 1 48.4 0 0 0 0 0 0
9 2017-08-24 1 47.2 0 0 0 0 0 0
10 2017-08-25 1 47.6 0 0 0 0 0 0
# ........
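The time-gap functions mentioned above can be sketched on a small toy table. This is only an illustration under stated assumptions: the `toy` data frame is invented (it borrows the store_nbr, date, and transactions column names but none of the real values), and the calls used are tsibble's `as_tsibble()`, `has_gaps()`, `count_gaps()`, and `fill_gaps()`.

```r
library(tsibble)

# Invented toy data: store 1 is missing 2017-08-18, store 2 is complete.
toy <- data.frame(
  date = as.Date(c("2017-08-16", "2017-08-17", "2017-08-19",
                   "2017-08-16", "2017-08-17", "2017-08-18")),
  store_nbr = c(1L, 1L, 1L, 2L, 2L, 2L),
  transactions = c(100, 120, 90, 200, 210, 190)
)

toy_ts <- as_tsibble(toy, index = date, key = store_nbr)

# One row per store, with a logical .gaps column flagging missing dates.
gaps <- has_gaps(toy_ts)

# Where each gap starts and ends, and how many observations it spans.
gap_ranges <- count_gaps(toy_ts)

# Make the gaps explicit as rows with NA in the measured columns.
filled <- fill_gaps(toy_ts)
```

The same calls should apply to a tsibble built from store_day_train or store_day_test, with store_nbr as the key and date as the index as in the call above.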
To use the full training data, use rbind() to join all of the training sets:
library(cfsalesdata1)
library(cfsalesdata2)
library(lobstr)
train <- rbind(train1, train2, train3, train4)
# be aware of the size of this object
lobstr::obj_size(train)
# 4,015,907,776 B (aka 4 gigabytes)
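When only part of the training data is needed, filtering each piece before binding keeps the combined object much smaller than the full 4 GB table. The following is a minimal sketch using invented toy data frames in place of train1 through train4. The store_nbr and unit_sales column names follow the descriptions above, while `keep_store()` and the `chunks` list are hypothetical names introduced for the illustration.

```r
# Invented stand-ins for train1..train4 (real tables are far larger).
chunks <- list(
  data.frame(date = as.Date("2017-01-01"), store_nbr = c(1L, 2L), unit_sales = c(3, 5)),
  data.frame(date = as.Date("2017-01-02"), store_nbr = c(1L, 3L), unit_sales = c(4, 1)),
  data.frame(date = as.Date("2017-01-03"), store_nbr = c(2L, 3L), unit_sales = c(2, 6)),
  data.frame(date = as.Date("2017-01-04"), store_nbr = c(1L, 2L), unit_sales = c(7, 8))
)

# Hypothetical helper: keep only the rows belonging to one store.
keep_store <- function(d, store) d[d$store_nbr == store, , drop = FALSE]

# Filter each chunk first, then bind the (much smaller) results.
train_store1 <- do.call(rbind, lapply(chunks, keep_store, store = 1L))
rownames(train_store1) <- NULL
```

With the real data, `chunks` would be `list(train1, train2, train3, train4)` after loading both data packages.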