create_lagged_df: Create model training and forecasting datasets with lagged,...


View source: R/lagged_df.R

Description

Create a list of datasets with lagged, grouped, dynamic, and static features to (a) train forecasting models for specified forecast horizons and (b) forecast into the future with a trained ML model.

Usage

create_lagged_df(
  data,
  type = c("train", "forecast"),
  method = c("direct", "multi_output"),
  outcome_col = 1,
  horizons,
  lookback = NULL,
  lookback_control = NULL,
  dates = NULL,
  frequency = NULL,
  dynamic_features = NULL,
  groups = NULL,
  static_features = NULL,
  predict_future = NULL,
  use_future = FALSE,
  keep_rows = FALSE
)

Arguments

data

A data.frame with (a) the target to be forecasted and (b) the features/predictors. An optional date column can be given in the dates argument (required for grouped time series). Note that forecastML only works with regularly spaced date/time intervals, and that missing rows (usually due to periods when no data was collected) will result in incorrect feature lags. Use fill_gaps to fill in any missing rows/data before running this function.

type

The type of dataset to return: (a) model training or (b) forecast prediction. The default is train.

method

The type of modeling dataset to create. direct returns one data.frame for each forecast horizon, and multi_output returns one data.frame for modeling all forecast horizons simultaneously. The default is direct.

outcome_col

The column index (an integer) of the target to be forecasted. If outcome_col != 1, the outcome column will be moved to position 1 and outcome_col will be set to 1 internally.
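A minimal base-R sketch of the reordering described above. The helper move_outcome_first() is hypothetical and only illustrates the documented behavior; it is not a forecastML function.

```r
# Hypothetical helper (not part of forecastML): move the outcome column to
# position 1, mirroring what create_lagged_df() does when outcome_col != 1.
move_outcome_first <- function(data, outcome_col) {
  other_cols <- setdiff(seq_along(data), outcome_col)
  data[, c(outcome_col, other_cols), drop = FALSE]
}

# Toy data with the outcome in column 2.
df <- data.frame(kms = c(1, 2), DriversKilled = c(3, 4), law = c(0, 0))
df_reordered <- move_outcome_first(df, outcome_col = 2)
names(df_reordered)  # "DriversKilled" "kms" "law"
```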

horizons

A numeric vector of one or more forecast horizons, h, measured in dataset rows. If dates are given, a horizon of 1, for example, would equal 1 * frequency in calendar time.

lookback

A numeric vector giving the lags (in dataset rows) for creating the lagged features. All non-grouping, non-static, and non-dynamic features in the input dataset, data, are lagged by the same values. The outcome is also lagged by default. Either lookback or lookback_control must be specified, but not both.
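A base-R sketch of how a lookback vector turns one column into lagged features under the direct method. The helpers lag_vector() and lagged_columns() are hypothetical. The rule that lags shorter than the forecast horizon are skipped is inferred from the Example 1 output, where the horizon-12 dataset contains only lags 12 through 15 even though lookback = 1:15.

```r
# Hypothetical helpers illustrating the idea; not forecastML's internals.
# Shift a vector down by k rows, padding the start with NA.
lag_vector <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA, k), head(x, -k))
}

# Build lagged columns for one feature, keeping only lags >= horizon
# (the pattern visible in the horizon-12 dataset of Example 1).
lagged_columns <- function(x, name, lookback, horizon) {
  lags <- lookback[lookback >= horizon]
  out <- lapply(lags, function(k) lag_vector(x, k))
  names(out) <- paste0(name, "_lag_", lags)
  as.data.frame(out)
}

x <- c(10, 20, 30, 40, 50)
res <- lagged_columns(x, "y", lookback = 1:3, horizon = 2)
names(res)  # "y_lag_2" "y_lag_3"; lag 1 is dropped because it is < horizon
```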

lookback_control

A list of numeric vectors specifying potentially unique lags for each feature. The length of the list should equal ncol(data), and its elements should be ordered the same as the columns in data. Lag values for any grouping, static, or dynamic feature columns are automatically coerced to 0, so those columns are not lagged. Elements set to NULL (e.g., with lookback_control[i] <- list(NULL)) drop the corresponding columns from the input dataset. Either lookback or lookback_control must be specified, but not both.
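A base-R sketch of applying a per-column lookback_control. The helper apply_lookback_control() is hypothetical, not forecastML code. It assumes, from the description above, that a lag of 0 leaves a column unlagged and that a NULL element drops the column. For brevity it also omits the unlagged outcome column that the real training data keeps.

```r
# Hypothetical sketch (not forecastML code) of a per-column lookback_control.
# Assumptions: 0 keeps a column unlagged; a NULL element drops the column.
lag_vector <- function(x, k) if (k == 0) x else c(rep(NA, k), head(x, -k))

apply_lookback_control <- function(data, lookback_control) {
  stopifnot(length(lookback_control) == ncol(data))  # one element per column
  pieces <- list()
  for (i in seq_along(data)) {
    lags <- lookback_control[[i]]
    if (is.null(lags)) next                           # NULL: drop this column
    name <- names(data)[i]
    for (k in lags) {
      col_name <- if (k == 0) name else paste0(name, "_lag_", k)
      pieces[[col_name]] <- lag_vector(data[[i]], k)
    }
  }
  as.data.frame(pieces)
}

df <- data.frame(y = 1:4, x = 5:8, z = 9:12)
lookback_control <- list(c(1, 2), 1, 0)
lookback_control[2] <- list(NULL)                     # keeps length 3; drops x
res <- apply_lookback_control(df, lookback_control)
names(res)  # "y_lag_1" "y_lag_2" "z"
```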

dates

A vector or 1-column data.frame of dates/times with class 'Date' or 'POSIXt'. The length of dates should equal nrow(data). Required if groups are given.

frequency

Date/time frequency. Required if dates are given. A string taking the same input as base::seq.Date(..., by = "frequency") or base::seq.POSIXt(..., by = "frequency"), e.g., '1 hour', '1 month', '7 days', or '10 years'. The highest frequency currently supported is '1 sec'.
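A short sketch of how horizons measured in rows map to calendar time through frequency. It calls base seq() on a Date, which dispatches to seq.Date() and accepts the same by strings; the last observed date is a toy value.

```r
# Toy values: the last observed date and a monthly frequency string.
last_date <- as.Date("1984-12-01")
frequency <- "1 month"
horizons <- c(1, 12)

# seq.Date() accepts the same 'by' strings as the frequency argument.
# Drop the first element (the last observed date) to keep only future dates.
future_dates <- seq(last_date, by = frequency, length.out = max(horizons) + 1)[-1]

# Calendar date reached by each horizon (horizon h = h * frequency).
future_dates[horizons]  # "1985-01-01" "1985-12-01"
```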

dynamic_features

A character vector of column names that identify features that change through time but which are not lagged (e.g., weekday or year). If type = "forecast" and method = "direct", these features will receive NA values, although they can be filled in by the user after running this function.
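A sketch of filling a dynamic feature by hand after it comes back as NA. The toy forecast_df stands in for one element of a type = "forecast" result with a date-based index; the column name month is a hypothetical dynamic feature.

```r
# Toy stand-in for one horizon-specific forecast data.frame; 'month' is a
# hypothetical dynamic feature that was returned as NA.
forecast_df <- data.frame(
  index = seq(as.Date("1985-01-01"), by = "1 month", length.out = 3),
  horizon = 1:3,
  month = NA_integer_
)

# Fill the dynamic feature from the forecast dates in the index column.
forecast_df$month <- as.integer(format(forecast_df$index, "%m"))
forecast_df$month  # 1 2 3
```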

groups

A character vector of column names that identify the groups/hierarchies when multiple time series are present. These columns are used as model features but are not lagged. Note that combining feature lags with grouped time series will result in NA values throughout the data, for example wherever a lag looks back past the first row of a group's series.
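A toy base-R illustration of why lagging grouped series introduces NA values, assuming lags are computed within each group: the first rows of every group have nothing to look back on. ave() is used here only for illustration; it is not how forecastML computes lags.

```r
# Toy grouped series with two groups, A and B.
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  y = c(1, 2, 3, 10, 20, 30)
)

# Lag y by 1 row within each group (illustration only).
df$y_lag_1 <- ave(df$y, df$group, FUN = function(v) c(NA, head(v, -1)))
df$y_lag_1  # NA 1 2 NA 10 20
```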

static_features

For grouped time series only. A character vector of column names that identify features that do not change through time. These columns are not lagged. If type = "forecast", these features will be filled forward using the most recent value for the group.

predict_future

When type = "forecast", a function for predicting the future values of any dynamic features. This function takes data and dates as positional arguments and returns a data.frame with (a) one or more rows, (b) an "index" column of future dates, (c) group columns if needed, and (d) one or more columns with name(s) in dynamic_features.
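A hypothetical predict_future function for a dynamic feature named month. It assumes that dates is the same historical date vector passed to create_lagged_df(), so the function builds its own future dates; the monthly frequency and 12-period horizon are also assumptions. The returned data.frame has the required index column and one column named after the dynamic feature.

```r
# Hypothetical predict_future function for a dynamic feature named "month".
# Assumptions: 'dates' holds the historical dates, the data are monthly, and
# 12 future periods are needed.
predict_month <- function(data, dates) {
  future_dates <- seq(max(dates), by = "1 month", length.out = 13)[-1]
  data.frame(index = future_dates, month = as.integer(format(future_dates, "%m")))
}

# Called positionally, as the argument description states.
hist_dates <- seq(as.Date("1984-01-01"), by = "1 month", length.out = 12)
future_month <- predict_month(NULL, hist_dates)
head(future_month, 2)
```

Under these assumptions it would be supplied as predict_future = predict_month together with dynamic_features = "month" and type = "forecast".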

use_future

Boolean. If TRUE, the future.apply package is used to create the lagged data.frames. Multisession or multicore futures are especially useful for (a) grouped time series with many groups and (b) high-dimensional datasets with many lags per feature. Run future::plan(future::multiprocess) before calling this function to set up multisession or multicore parallel dataset creation. Note that multiprocess has since been deprecated in the future package; future::multisession or future::multicore can be passed to plan() instead.

keep_rows

Boolean. For non-grouped time series, if TRUE, keep rows 1:max(lookback) at the beginning of the time series; with the default, FALSE, these rows are dropped. These rows contain missing values for lagged features that "look back" before the start of the dataset.
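A toy base-R illustration of what keep_rows = FALSE removes: with lookback = 1:3, rows 1 to 3 contain at least one NA lag, and dropping rows 1:max(lookback) leaves complete rows whose row names start at max(lookback) + 1. This matches the Example 1 training output, whose rows start at 16 with lookback = 1:15. The code is illustrative, not forecastML's implementation.

```r
# Toy series and lags 1 to 3, built with base R (illustration only).
y <- c(5, 6, 7, 8, 9, 10)
lookback <- 1:3
lags <- sapply(lookback, function(k) c(rep(NA, k), head(y, -k)))
colnames(lags) <- paste0("y_lag_", lookback)
train <- data.frame(y = y, lags)

# keep_rows = FALSE (the default) corresponds to dropping rows 1:max(lookback).
train_dropped <- train[-seq_len(max(lookback)), ]
rownames(train_dropped)  # "4" "5" "6"
```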

Value

An S3 object of class 'lagged_df' or 'grouped_lagged_df': a list of data.frames with new columns for the lagged and non-lagged features. For method = "direct", the length of the returned list equals the number of forecast horizons, and its elements are in the order of the horizons supplied to the horizons argument. Horizon-specific datasets can be accessed with my_lagged_df$horizon_h, where 'h' is the forecast horizon. For method = "multi_output", the returned list has length 1, and its single dataset can be accessed with my_lagged_df$horizon_1_3_5, where "1_3_5" represents the forecast horizons passed in horizons.
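The element names described above follow a simple pattern. This sketch reproduces that pattern with paste0(); it shows the naming convention only and does not call forecastML.

```r
# Element names implied by the documented naming convention.
horizons <- c(1, 3, 5)
direct_names <- paste0("horizon_", horizons)
multi_output_name <- paste0("horizon_", paste(horizons, collapse = "_"))
direct_names       # "horizon_1" "horizon_3" "horizon_5"
multi_output_name  # "horizon_1_3_5"
```

By the same convention, data_train$horizon_12 in Example 1 would refer to the same element as data_train[[2]].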

The contents of the returned data.frames are as follows:

type = 'train', non-grouped:

A data.frame with the outcome and lagged/dynamic features.

type = 'train', grouped:

A data.frame with the outcome and unlagged grouping columns followed by lagged, dynamic, and static features.

type = 'forecast', non-grouped:

(1) An 'index' column giving the row index or date of the forecast periods (e.g., for a 100-row, non-date-based training dataset, the index would start at 101). (2) A 'horizon' column that indicates the forecast period, from 1:max(horizons). (3) Lagged features identical to those in the 'train', non-grouped dataset.
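A toy reconstruction of the index and horizon columns for a non-date-based forecast dataset, written in base R rather than with forecastML. The 192 training rows are inferred from the Example 1 forecast output, whose index starts at 193.

```r
# Toy reconstruction of the 'index' and 'horizon' columns (not forecastML code).
n_train <- 192          # training rows implied by the Example 1 output (index 193)
horizons <- c(1, 12)
h <- seq_len(max(horizons))
forecast_frame <- data.frame(index = n_train + h, horizon = h)
head(forecast_frame, 3)
```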

type = 'forecast', grouped:

(1) An 'index' column giving the date of the forecast periods. The first forecast date for each group is the maximum date from the dates argument + 1 * frequency, where frequency is the user-supplied date/time frequency. (2) A 'horizon' column that indicates the forecast period, from 1:max(horizons). (3) Lagged, static, and dynamic features identical to those in the 'train', grouped dataset.
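A toy sketch of the grouped layout, reading the text above as saying that every group shares the same first forecast date, max(dates) + 1 * frequency. The dates, group labels, and one-day frequency are illustrative assumptions, and the column set is reduced to index, horizon, and the group.

```r
# Toy grouped data: group B has the latest date, 2020-01-03.
dates <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03"))
groups <- c("A", "A", "B", "B")
frequency <- "1 day"
horizons <- c(1, 2)

# Future dates start one frequency step after the overall maximum date.
future_dates <- seq(max(dates), by = frequency, length.out = max(horizons) + 1)[-1]

# One row per group and forecast period.
group_ids <- unique(groups)
h <- seq_len(max(horizons))
forecast_frame <- data.frame(
  index = rep(future_dates, times = length(group_ids)),
  horizon = rep(h, times = length(group_ids)),
  group = rep(group_ids, each = length(h))
)
forecast_frame
```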

Attributes

Methods and related functions

The output of create_lagged_df() is passed into

and has the following generic S3 methods

Examples

# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")
#------------------------------------------------------------------------------
# Example 1 - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15

data <- data_seatbelts

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               horizons = horizons, lookback = lookback)
head(data_train[[length(horizons)]])

# Example 1 - Forecasting dataset
# The most recent rows of data_seatbelts are used automatically to build the lagged features for the forecast periods.
data_forecast <- create_lagged_df(data_seatbelts, type = "forecast", outcome_col = 1,
                                  horizons = horizons, lookback = lookback)
head(data_forecast[[length(horizons)]])

#------------------------------------------------------------------------------
# Example 2 - Training data for one 3-month horizon model w/ unique lags per predictor.
horizons <- 3
# One lag vector per column of data_seatbelts: DriversKilled, kms, PetrolPrice, law.
lookback_control <- list(c(3, 6, 9, 12), 4:12, 6:15, 8)

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               horizons = horizons, lookback_control = lookback_control)
head(data_train[[length(horizons)]])

Example output

Loading required package: dplyr

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Example 1, training data (horizon 12):

   DriversKilled DriversKilled_lag_12 DriversKilled_lag_13 DriversKilled_lag_14
16           102                   87                  102                   97
17           103                  119                   87                  102
18           111                  106                  119                   87
19           120                  110                  106                  119
20           129                  106                  110                  106
21           122                  107                  106                  110
   DriversKilled_lag_15 kms_lag_12 kms_lag_13 kms_lag_14 kms_lag_15
16                  107      10955       9963       7685       9059
17                   97      11823      10955       9963       7685
18                  102      12391      11823      10955       9963
19                   87      13460      12391      11823      10955
20                  119      14055      13460      12391      11823
21                  106      12106      14055      13460      12391
   PetrolPrice_lag_12 PetrolPrice_lag_13 PetrolPrice_lag_14 PetrolPrice_lag_15
16          0.1008733          0.1020625          0.1023630          0.1029718
17          0.1010197          0.1008733          0.1020625          0.1023630
18          0.1005812          0.1010197          0.1008733          0.1020625
19          0.1037740          0.1005812          0.1010197          0.1008733
20          0.1040764          0.1037740          0.1005812          0.1010197
21          0.1037740          0.1040764          0.1037740          0.1005812
   law_lag_12 law_lag_13 law_lag_14 law_lag_15
16          0          0          0          0
17          0          0          0          0
18          0          0          0          0
19          0          0          0          0
20          0          0          0          0
21          0          0          0          0

Example 1, forecasting data (horizon 12):

  index horizon DriversKilled_lag_12 DriversKilled_lag_13 DriversKilled_lag_14
1   193       1                   92                  118                  122
2   194       2                   86                   92                  118
3   195       3                   81                   86                   92
4   196       4                   84                   81                   86
5   197       5                   87                   84                   81
6   198       6                   90                   87                   84
  DriversKilled_lag_15 kms_lag_12 kms_lag_13 kms_lag_14 kms_lag_15
1                  126      16224      16591      17504      19240
2                  122      16670      16224      16591      17504
3                  118      18539      16670      16224      16591
4                   92      19759      18539      16670      16224
5                   86      19584      19759      18539      16670
6                   81      19976      19584      19759      18539
  PetrolPrice_lag_12 PetrolPrice_lag_13 PetrolPrice_lag_14 PetrolPrice_lag_15
1          0.1177761          0.1177066          0.1180166          0.1184624
2          0.1147970          0.1177761          0.1177066          0.1180166
3          0.1157353          0.1147970          0.1177761          0.1177066
4          0.1153563          0.1157353          0.1147970          0.1177761
5          0.1148154          0.1153563          0.1157353          0.1147970
6          0.1147775          0.1148154          0.1153563          0.1157353
  law_lag_12 law_lag_13 law_lag_14 law_lag_15
1          1          1          1          1
2          1          1          1          1
3          1          1          1          1
4          1          1          1          1
5          1          1          1          1
6          1          1          1          1

Example 2, training data (horizon 3):

   DriversKilled DriversKilled_lag_3 DriversKilled_lag_6 DriversKilled_lag_9
16           102                 125                 134                 110
17           103                 134                 147                 106
18           111                 110                 180                 107
19           120                 102                 125                 134
20           129                 103                 134                 147
21           122                 111                 110                 180
   DriversKilled_lag_12 kms_lag_4 kms_lag_5 kms_lag_6 kms_lag_7 kms_lag_8
16                   87      9267      9834     11372     12106     14055
17                  119      9130      9267      9834     11372     12106
18                  106      8933      9130      9267      9834     11372
19                  110     11000      8933      9130      9267      9834
20                  106     10733     11000      8933      9130      9267
21                  107     12912     10733     11000      8933      9130
   kms_lag_9 kms_lag_10 kms_lag_11 kms_lag_12 PetrolPrice_lag_6
16     13460      12391      11823      10955         0.1030264
17     14055      13460      12391      11823         0.1027301
18     12106      14055      13460      12391         0.1019972
19     11372      12106      14055      13460         0.1012746
20      9834      11372      12106      14055         0.1007040
21      9267       9834      11372      12106         0.1001396
   PetrolPrice_lag_7 PetrolPrice_lag_8 PetrolPrice_lag_9 PetrolPrice_lag_10
16         0.1037740         0.1040764         0.1037740          0.1005812
17         0.1030264         0.1037740         0.1040764          0.1037740
18         0.1027301         0.1030264         0.1037740          0.1040764
19         0.1019972         0.1027301         0.1030264          0.1037740
20         0.1012746         0.1019972         0.1027301          0.1030264
21         0.1007040         0.1012746         0.1019972          0.1027301
   PetrolPrice_lag_11 PetrolPrice_lag_12 PetrolPrice_lag_13 PetrolPrice_lag_14
16          0.1010197          0.1008733          0.1020625          0.1023630
17          0.1005812          0.1010197          0.1008733          0.1020625
18          0.1037740          0.1005812          0.1010197          0.1008733
19          0.1040764          0.1037740          0.1005812          0.1010197
20          0.1037740          0.1040764          0.1037740          0.1005812
21          0.1030264          0.1037740          0.1040764          0.1037740
   PetrolPrice_lag_15 law_lag_8
16          0.1029718         0
17          0.1023630         0
18          0.1020625         0
19          0.1008733         0
20          0.1010197         0
21          0.1005812         0

forecastML documentation built on July 8, 2020, 7:27 p.m.