README.md

cfsales

Travis build status Lifecycle: maturing

The goal of cfsales is to make the Corporación Favorita Grocery Sales Forecasting data easy to access. The intention is to provide a realistic dataset for evaluating forecasting methods, pipelines, and data-cleaning approaches in retail-like settings.

Because the data exceeds the size limits set by CRAN, it is distributed as data packages stored in a drat repository. The cfsalesdata1 and cfsalesdata2 packages need to be installed.

Installation

Because of its size, the data could not be stored in a single GitHub repository. It was split into two data packages, cfsalesdata1 and cfsalesdata2, which are hosted as drat repositories on GitHub and are easy to install as shown below. The cfsalesdata1 package contains the majority of the data tables. If the full training set needs to be loaded, the cfsalesdata2 package is also required.

You can install the required packages with the following commands.

install.packages("cfsalesdata1", repos = "https://alexhallam.github.io/drat/", type = "source")
install.packages("cfsalesdata2", repos = "https://alexhallam.github.io/drat/", type = "source")

The cfsalesdata* packages provide the following data tables.

Once the cfsalesdata1 and cfsalesdata2 packages have been installed and loaded, data may be called with the data() function.

library(cfsalesdata1)
data("store_day_train") 
str(store_day_train)
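
To see which tables an installed data package ships, base R's data() can list them. The sketch below wraps this in a hypothetical helper, list_tables(), which is not part of cfsales. It returns an empty character vector when the package is not installed.

```r
# Hypothetical helper (not part of cfsales): list the datasets a package provides.
list_tables <- function(pkg) {
  # Return an empty vector rather than erroring if the package is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  # data(package = ) returns a "packageIQR" object; its results matrix
  # has an "Item" column holding the dataset names
  unname(data(package = pkg)$results[, "Item"])
}

list_tables("cfsalesdata1")
```

Any name returned this way can then be passed to data() as in the example above.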

Since these data sources are time series, it may also be useful to convert a data source into a tsibble. A tsibble supports functions that report time gaps, easy calculation of sliding-window functions, and other useful time series calculations.

> library(tsibble)
> tsibble::as_tsibble(x = store_day_test, index = date, key = store_nbr)

# A tsibble: 864 x 47 [1D]
# Key:       store_nbr [54]
# Groups:    @ date [16]
   date       store_nbr dcoilwtico is_navidad is_carnaval is_new_year is_black_firday is_cyber_monday is_futbol
   <date>         <int>      <dbl>      <dbl>       <dbl>       <dbl>           <dbl>           <dbl>     <dbl>
 1 2017-08-16         1       46.8          0           0           0               0               0         0
 2 2017-08-17         1       47.1          0           0           0               0               0         0
 3 2017-08-18         1       48.6          0           0           0               0               0         0
 4 2017-08-19         1       NA            0           0           0               0               0         0
 5 2017-08-20         1       NA            0           0           0               0               0         0
 6 2017-08-21         1       47.4          0           0           0               0               0         0
 7 2017-08-22         1       47.6          0           0           0               0               0         0
 8 2017-08-23         1       48.4          0           0           0               0               0         0
 9 2017-08-24         1       47.2          0           0           0               0               0         0
10 2017-08-25         1       47.6          0           0           0               0               0         0
# ........
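
As a rough illustration of the gap functions mentioned above, the sketch below builds a small made-up table (not data from the cfsalesdata packages) with one missing day for store 1, then uses tsibble's has_gaps(), count_gaps(), and fill_gaps(). The toy column unit_sales and its values are assumptions for the example only.

```r
library(tsibble)

# Made-up toy data: store 1 is missing 2017-08-18, store 2 is complete
toy <- data.frame(
  date = as.Date(c("2017-08-16", "2017-08-17", "2017-08-19",
                   "2017-08-16", "2017-08-17", "2017-08-18")),
  store_nbr = c(1L, 1L, 1L, 2L, 2L, 2L),
  unit_sales = c(10, 12, 9, 20, 21, 19)
)

toy_ts <- as_tsibble(toy, index = date, key = store_nbr)

# One row per store, with a logical .gaps column
has_gaps(toy_ts)

# Start, end, and length of each gap, per store
gaps <- count_gaps(toy_ts)
gaps

# Insert the missing dates as explicit rows; measured columns become NA
filled <- fill_gaps(toy_ts)
filled
```

By default these functions look for gaps within each key's own time span, so a store that simply starts later or ends earlier than the others is not reported as having gaps.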

To use the full training data, combine all of the training sets with rbind().

library(cfsalesdata1)
library(cfsalesdata2)
library(lobstr)
train <- rbind(train1, train2, train3, train4)

# be aware of the size of this object
lobstr::obj_size(train)
# 4,015,907,776 B (aka 4 gigabytes)
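
If only part of the training data is needed, one option is to subset each piece before binding, so the full object is never built in memory. The sketch below uses small made-up data frames as stand-ins for train1 to train4 and assumes a store_nbr column. make_part() and keep_store() are hypothetical helpers, not part of cfsales.

```r
# Made-up stand-ins for train1..train4 (assumption: a store_nbr column exists)
make_part <- function(stores) {
  data.frame(store_nbr = stores, unit_sales = seq_along(stores))
}
parts <- list(
  make_part(c(1L, 2L)),
  make_part(c(1L, 3L)),
  make_part(c(2L, 3L)),
  make_part(c(1L, 1L))
)

# Hypothetical helper: keep only the rows for one store
keep_store <- function(d, store) {
  d[d$store_nbr == store, , drop = FALSE]
}

# Filter each piece first, then bind the much smaller results
store1 <- do.call(rbind, lapply(parts, keep_store, store = 1L))
nrow(store1)
```

With the real packages loaded, list(train1, train2, train3, train4) would take the place of parts, provided the column assumption holds.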


alexhallam/cfsales documentation built on Oct. 1, 2019, 2:50 a.m.