time_series_split: Simple Training/Test Set Splitting for Time Series


View source: R/rsample-time_series_split.R

Description

time_series_split creates resample splits using time_series_cv() but returns only a single split. This is useful when creating a single train/test split.

Usage

time_series_split(
  data,
  date_var = NULL,
  initial = 5,
  assess = 1,
  skip = 1,
  lag = 0,
  cumulative = FALSE,
  slice = 1,
  ...
)

Arguments

data

A data frame.

date_var

A date or date-time variable.

initial

The number of samples used for analysis/modeling in the initial resample.

assess

The number of samples used for each assessment resample.

skip

An integer indicating how many (if any) additional resamples to skip to thin the total number of data points in the analysis resample. See the example below.

lag

A value to include a lag between the assessment and analysis sets. This is useful if lagged predictors will be used during training and testing.

cumulative

A logical. Should the analysis resample grow beyond the size specified by initial at each resample?

slice

Which single slice from time_series_cv() to return. slice = 1 returns the most recent slice.

...

Not currently used.

Details

Time-Based Specification

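The initial, assess, skip, and lag arguments can be specified as:

Numeric: a count of rows, as in the defaults initial = 5 and assess = 1.

Time-based phrases: a period such as initial = "10 years" or assess = "3 years", as used in the examples. This form relies on the date or date-time variable in the data.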
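The two ways of sizing the windows can be compared side by side. The sketch below assumes the m4_monthly data shipped with timetk and uses rsample::training() and rsample::testing() to inspect the result. The row counts 120 and 36 are only an illustrative guess at what "10 years" and "3 years" correspond to on monthly data, so the sizes are printed rather than asserted.

```r
library(dplyr)
library(timetk)
library(rsample)

m750 <- m4_monthly %>% filter(id == "M750")

# Numeric specification: window sizes given as row counts
# (120 and 36 are illustrative values for a monthly series)
split_numeric <- time_series_split(
    m750,
    date_var = date,
    initial  = 120,
    assess   = 36
)

# Time-based specification: window sizes given as phrases
split_phrase <- time_series_split(
    m750,
    date_var = date,
    initial  = "10 years",
    assess   = "3 years"
)

# Inspect the resulting training/testing sizes for each specification
c(train = nrow(training(split_numeric)), test = nrow(testing(split_numeric)))
c(train = nrow(training(split_phrase)),  test = nrow(testing(split_phrase)))
```

Passing date_var explicitly avoids relying on automatic detection of the date column.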

Initial (Training Set) and Assess (Testing Set)

The main options, initial and assess, control the number of data points from the original data that are in the analysis (training set) and the assessment (testing set), respectively.

Skip

skip enables the function to not use every data point in the resamples. When skip = 1, the resampling data sets will increment by one position.

Example: Suppose that the rows of a data set are consecutive days. Using skip = 7 will make the analysis data set operate on weeks instead of days. The assessment set size is not affected by this option.

Lag

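The lag parameter creates an overlap between the training and testing sets: the testing set is extended back in time by the lag. This is needed when lagged predictors are used, so that the first testing rows have the history required to compute their lags.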
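One way to see the effect of lag is to compare the date ranges of the two sets. This is a minimal sketch, assuming the m4_monthly data from timetk and the rsample accessors; with a lag, the testing set is expected to start on or before the last training date.

```r
library(dplyr)
library(timetk)
library(rsample)

m750 <- m4_monthly %>% filter(id == "M750")

split_lagged <- time_series_split(
    m750,
    date_var = date,
    initial  = "10 years",
    assess   = "3 years",
    lag      = "1 year"
)

train_tbl <- training(split_lagged)
test_tbl  <- testing(split_lagged)

# Date ranges of each set; with a lag the ranges should overlap
range(train_tbl$date)
range(test_tbl$date)

# TRUE when the testing set reaches back into the training period
overlaps <- min(test_tbl$date) <= max(train_tbl$date)
overlaps
```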

Cumulative vs Sliding Window

When cumulative = TRUE, the initial parameter is ignored and the analysis (training) set will grow as resampling continues while the assessment (testing) set size will always remain static.

When cumulative = FALSE, the initial parameter fixes the analysis (training) set and resampling occurs over a fixed window.
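The difference between the two modes shows up in the size of the training set. The following sketch, which assumes the m4_monthly data from timetk and the rsample accessors, builds one split of each kind with the same assessment window and prints the training sizes.

```r
library(dplyr)
library(timetk)
library(rsample)

m750 <- m4_monthly %>% filter(id == "M750")

# Sliding window: training set is fixed by `initial`
split_sliding <- time_series_split(
    m750,
    date_var   = date,
    initial    = "10 years",
    assess     = "3 years",
    cumulative = FALSE
)

# Cumulative: training set uses all data before the assessment window
split_cumulative <- time_series_split(
    m750,
    date_var   = date,
    assess     = "3 years",
    cumulative = TRUE
)

n_sliding    <- nrow(training(split_sliding))
n_cumulative <- nrow(training(split_cumulative))

# The cumulative training set should be the larger of the two here,
# because the series is longer than 13 years
c(sliding = n_sliding, cumulative = n_cumulative)
```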

Slice

This controls which slice is returned. If slice = 1, only the most recent slice will be returned.

Value

An rsplit object that can be used with the training and testing functions to extract the data in each split.
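As a short illustration of working with the returned object, the sketch below extracts both data sets with rsample::training() and rsample::testing(). It assumes the m4_monthly data from timetk and that the rsample package is installed.

```r
library(dplyr)
library(timetk)
library(rsample)

m750 <- m4_monthly %>% filter(id == "M750")

splits <- time_series_split(
    m750,
    date_var = date,
    initial  = "10 years",
    assess   = "3 years"
)

# Extract the analysis (training) and assessment (testing) data frames
train_tbl <- training(splits)
test_tbl  <- testing(splits)

# Without a lag, every training date should precede every testing date
max(train_tbl$date) < min(test_tbl$date)
```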

Examples

library(tidyverse)
library(timetk)

# DATA ----
m750 <- m4_monthly %>% filter(id == "M750")

# Get the most recent 3 years as testing, and previous 10 years as training
m750 %>%
    time_series_split(initial = "10 years", assess = "3 years")

# Skip the most recent 3 years
m750 %>%
    time_series_split(
        initial = "10 years",
        assess  = "3 years",
        skip    = "3 years",
        slice   = 2          # <- Returns 2nd slice, 3-years back
    )

# Add 1 year lag for testing overlap
m750 %>%
    time_series_split(
        initial = "10 years",
        assess  = "3 years",
        skip    = "3 years",
        slice   = 2,
        lag     = "1 year"   # <- Overlaps training/testing by 1 year
    )

timetk documentation built on Nov. 16, 2021, 9:26 a.m.