Create a list of datasets with lagged, grouped, dynamic, and static features to (a) train forecasting models for specified forecast horizons and (b) forecast into the future with a trained ML model.
create_lagged_df(
data,
type = c("train", "forecast"),
method = c("direct", "multi_output"),
outcome_col = 1,
horizons,
lookback = NULL,
lookback_control = NULL,
dates = NULL,
frequency = NULL,
dynamic_features = NULL,
groups = NULL,
static_features = NULL,
predict_future = NULL,
use_future = FALSE,
keep_rows = FALSE
)
data: A data.frame with the (a) target to be forecasted and (b) features/predictors. An optional date column can be given in the
type: The type of dataset to return–(a) model training or (b) forecast prediction. The default is
method: The type of modeling dataset to create.
outcome_col: The column index–an integer–of the target to be forecasted. If
horizons: A numeric vector of one or more forecast horizons, h, measured in dataset rows. If
lookback: A numeric vector giving the lags–in dataset rows–for creating the lagged features. All non-grouping, non-static, and non-dynamic features in the input dataset,
lookback_control: A list of numeric vectors, specifying potentially unique lags for each feature. The length of the list should equal
dates: A vector or 1-column data.frame of dates/times with class 'Date' or 'POSIXt'. The length of
frequency: Date/time frequency. Required if
dynamic_features: A character vector of column names that identify features that change through time but which are not lagged (e.g., weekday or year). If
groups: A character vector of column names that identify the groups/hierarchies when multiple time series are present. These columns are used as model features but are not lagged. Note that combining feature lags with grouped time series will result in
static_features: For grouped time series only. A character vector of column names that identify features that do not change through time. These columns are not lagged. If
predict_future: When
use_future: Boolean. If
keep_rows: Boolean. For non-grouped time series, keep the
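To make the lookback argument concrete, the following base-R sketch builds lagged columns by hand. It is an illustration only and not forecastML's internal code: lag_vec() and make_lagged() are hypothetical helpers, and the rule that only lags greater than or equal to the horizon are kept is inferred from the example output on this page (a 12-step horizon with lookback = 1:15 yields lags 12 to 15).

```r
# Base-R sketch of lagged-feature construction; NOT forecastML's internal code.
# lag_vec() and make_lagged() are hypothetical helpers for illustration.
lag_vec <- function(x, k) {
  # Shift x down by k rows, padding the first k positions with NA.
  c(rep(NA, k), head(x, -k))
}

make_lagged <- function(data, outcome_col = 1, horizon, lookback) {
  # Assumption: for a direct h-step-ahead model, only lags >= h are usable,
  # because shorter lags would not yet be observed at forecast time.
  lags <- lookback[lookback >= horizon]
  out <- data[, outcome_col, drop = FALSE]
  for (col in names(data)) {
    for (k in lags) {
      out[[paste0(col, "_lag_", k)]] <- lag_vec(data[[col]], k)
    }
  }
  # Drop the leading rows that contain NA in the longest lag.
  out[-seq_len(max(lags)), , drop = FALSE]
}

toy <- data.frame(y = 1:20, x = 101:120)
lagged <- make_lagged(toy, outcome_col = 1, horizon = 3, lookback = 1:4)
names(lagged)
head(lagged)
```

The resulting column names follow the feature_lag_k pattern seen in the example output, with the unlagged outcome in the first column.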
An S3 object of class 'lagged_df' or 'grouped_lagged_df': A list of data.frames with new columns for the lagged/non-lagged features.
For method = "direct", the length of the returned list is equal to the number of forecast horizons and is in the order of horizons supplied to the horizons argument. Horizon-specific datasets can be accessed with my_lagged_df$horizon_h where 'h' gives the forecast horizon.
For method = "multi_output", the length of the returned list is 1. Horizon-specific datasets can be accessed with my_lagged_df$horizon_1_3_5 where "1_3_5" represents the forecast horizons passed in horizons.
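The naming pattern of the returned list can be mimicked in base R as follows. This is a sketch under the assumption that the names are formed exactly as the text above describes; horizon_names() is a hypothetical helper, and the list holds empty placeholder data.frames rather than real package output.

```r
# Hypothetical helper that reproduces the list-naming pattern described above.
horizon_names <- function(horizons, method = c("direct", "multi_output")) {
  method <- match.arg(method)
  if (method == "direct") {
    # One list element per horizon, in the order supplied.
    paste0("horizon_", horizons)
  } else {
    # A single list element whose name joins all horizons.
    paste0("horizon_", paste(horizons, collapse = "_"))
  }
}

horizons <- c(1, 3, 5)

# Stand-in for a returned object: one placeholder data.frame per element.
direct_names <- horizon_names(horizons, "direct")
my_lagged_df <- setNames(lapply(direct_names, function(n) data.frame()), direct_names)
names(my_lagged_df)

multi_name <- horizon_names(horizons, "multi_output")
multi_name
```

With a real object, my_lagged_df$horizon_3 and my_lagged_df[["horizon_3"]] are equivalent ways to pull out one horizon's dataset.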
The contents of the returned data.frames are as follows:
type = 'train', non-grouped: A data.frame with the outcome and lagged/dynamic features.
type = 'train', grouped: A data.frame with the outcome and unlagged grouping columns followed by lagged, dynamic, and static features.
type = 'forecast', non-grouped: (1) An 'index' column giving the row index or date of the forecast periods (e.g., a 100 row non-date-based training dataset would start with an index of 101). (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged features identical to the 'train', non-grouped dataset.
type = 'forecast', grouped: (1) An 'index' column giving the date of the forecast periods. The first forecast date for each group is the maximum date from the dates argument + 1 * frequency, where frequency is the user-supplied date/time frequency. (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged, static, and dynamic features identical to the 'train', grouped dataset.
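The 'index' and 'horizon' columns of a forecast dataset can be sketched in base R as below. This only illustrates the arithmetic described above and is not package code. The "1 month" frequency string is an assumption made for the example, chosen because base::seq() accepts it for Date vectors.

```r
# Non-date-based case: the index continues from the last training row.
n_train <- 100
horizons <- c(1, 6, 12)
forecast_index <- data.frame(
  index = n_train + seq_len(max(horizons)),
  horizon = seq_len(max(horizons))
)
head(forecast_index, 3)

# Date-based case: the first forecast date is max(dates) + 1 * frequency.
dates <- seq(as.Date("2020-01-01"), by = "1 month", length.out = 24)
frequency <- "1 month"  # assumed format, for illustration only
forecast_dates <- seq(max(dates), by = frequency,
                      length.out = max(horizons) + 1)[-1]
forecast_dates[1]
```

Dropping the first element of the date sequence removes max(dates) itself, so the remaining values start one period after the last observed date.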
names: The horizon-specific datasets that can be accessed with my_lagged_df$horizon_h.
type: Training ('train') or forecasting ('forecast') dataset(s).
method: 'direct' or 'multi_output'.
horizons: Forecast horizons measured in dataset rows.
outcome_col: The column index of the target being forecasted.
outcome_cols: If method = "multi_output", the column indices of the multiple outputs in the transformed dataset.
outcome_name: The name of the target being forecasted.
outcome_names: If method = "multi_output", the column names of the multiple outputs in the transformed dataset. The names take the form "outcome_name_h" where 'h' is a horizon passed in horizons.
predictor_names: The predictor or feature names from the input dataset.
row_indices: The row.names() of the output dataset. For non-grouped datasets, the first lookback + 1 rows are removed from the beginning of the dataset to remove NA values in the lagged features.
date_indices: If dates are given, the vector of dates.
frequency: If dates are given, the date/time frequency.
data_start: min(row_indices) or min(date_indices).
data_stop: max(row_indices) or max(date_indices).
groups: If groups are given, a vector of group names.
class: grouped_lagged_df, lagged_df, list
The output of create_lagged_df() is passed into create_windows and has the following generic S3 methods: summary, plot.
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")
#------------------------------------------------------------------------------
# Example 1 - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15
data <- data_seatbelts
data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
horizons = horizons, lookback = lookback)
head(data_train[[length(horizons)]])
# Example 1 - Forecasting dataset
# The last 'nrow(data_seatbelts) - horizon' rows are automatically used from data_seatbelts.
data_forecast <- create_lagged_df(data_seatbelts, type = "forecast", outcome_col = 1,
horizons = horizons, lookback = lookback)
head(data_forecast[[length(horizons)]])
#------------------------------------------------------------------------------
# Example 2 - Training data for one 3-month horizon model w/ unique lags per predictor.
horizons <- 3
lookback <- list(c(3, 6, 9, 12), c(4:12), c(6:15), c(8))
data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
horizons = horizons, lookback_control = lookback)
head(data_train[[length(horizons)]])
DriversKilled DriversKilled_lag_12 DriversKilled_lag_13 DriversKilled_lag_14
16 102 87 102 97
17 103 119 87 102
18 111 106 119 87
19 120 110 106 119
20 129 106 110 106
21 122 107 106 110
DriversKilled_lag_15 kms_lag_12 kms_lag_13 kms_lag_14 kms_lag_15
16 107 10955 9963 7685 9059
17 97 11823 10955 9963 7685
18 102 12391 11823 10955 9963
19 87 13460 12391 11823 10955
20 119 14055 13460 12391 11823
21 106 12106 14055 13460 12391
PetrolPrice_lag_12 PetrolPrice_lag_13 PetrolPrice_lag_14 PetrolPrice_lag_15
16 0.1008733 0.1020625 0.1023630 0.1029718
17 0.1010197 0.1008733 0.1020625 0.1023630
18 0.1005812 0.1010197 0.1008733 0.1020625
19 0.1037740 0.1005812 0.1010197 0.1008733
20 0.1040764 0.1037740 0.1005812 0.1010197
21 0.1037740 0.1040764 0.1037740 0.1005812
law_lag_12 law_lag_13 law_lag_14 law_lag_15
16 0 0 0 0
17 0 0 0 0
18 0 0 0 0
19 0 0 0 0
20 0 0 0 0
21 0 0 0 0
index horizon DriversKilled_lag_12 DriversKilled_lag_13 DriversKilled_lag_14
1 193 1 92 118 122
2 194 2 86 92 118
3 195 3 81 86 92
4 196 4 84 81 86
5 197 5 87 84 81
6 198 6 90 87 84
DriversKilled_lag_15 kms_lag_12 kms_lag_13 kms_lag_14 kms_lag_15
1 126 16224 16591 17504 19240
2 122 16670 16224 16591 17504
3 118 18539 16670 16224 16591
4 92 19759 18539 16670 16224
5 86 19584 19759 18539 16670
6 81 19976 19584 19759 18539
PetrolPrice_lag_12 PetrolPrice_lag_13 PetrolPrice_lag_14 PetrolPrice_lag_15
1 0.1177761 0.1177066 0.1180166 0.1184624
2 0.1147970 0.1177761 0.1177066 0.1180166
3 0.1157353 0.1147970 0.1177761 0.1177066
4 0.1153563 0.1157353 0.1147970 0.1177761
5 0.1148154 0.1153563 0.1157353 0.1147970
6 0.1147775 0.1148154 0.1153563 0.1157353
law_lag_12 law_lag_13 law_lag_14 law_lag_15
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 1 1 1 1
5 1 1 1 1
6 1 1 1 1
DriversKilled DriversKilled_lag_3 DriversKilled_lag_6 DriversKilled_lag_9
16 102 125 134 110
17 103 134 147 106
18 111 110 180 107
19 120 102 125 134
20 129 103 134 147
21 122 111 110 180
DriversKilled_lag_12 kms_lag_4 kms_lag_5 kms_lag_6 kms_lag_7 kms_lag_8
16 87 9267 9834 11372 12106 14055
17 119 9130 9267 9834 11372 12106
18 106 8933 9130 9267 9834 11372
19 110 11000 8933 9130 9267 9834
20 106 10733 11000 8933 9130 9267
21 107 12912 10733 11000 8933 9130
kms_lag_9 kms_lag_10 kms_lag_11 kms_lag_12 PetrolPrice_lag_6
16 13460 12391 11823 10955 0.1030264
17 14055 13460 12391 11823 0.1027301
18 12106 14055 13460 12391 0.1019972
19 11372 12106 14055 13460 0.1012746
20 9834 11372 12106 14055 0.1007040
21 9267 9834 11372 12106 0.1001396
PetrolPrice_lag_7 PetrolPrice_lag_8 PetrolPrice_lag_9 PetrolPrice_lag_10
16 0.1037740 0.1040764 0.1037740 0.1005812
17 0.1030264 0.1037740 0.1040764 0.1037740
18 0.1027301 0.1030264 0.1037740 0.1040764
19 0.1019972 0.1027301 0.1030264 0.1037740
20 0.1012746 0.1019972 0.1027301 0.1030264
21 0.1007040 0.1012746 0.1019972 0.1027301
PetrolPrice_lag_11 PetrolPrice_lag_12 PetrolPrice_lag_13 PetrolPrice_lag_14
16 0.1010197 0.1008733 0.1020625 0.1023630
17 0.1005812 0.1010197 0.1008733 0.1020625
18 0.1037740 0.1005812 0.1010197 0.1008733
19 0.1040764 0.1037740 0.1005812 0.1010197
20 0.1037740 0.1040764 0.1037740 0.1005812
21 0.1030264 0.1037740 0.1040764 0.1037740
PetrolPrice_lag_15 law_lag_8
16 0.1029718 0
17 0.1023630 0
18 0.1020625 0
19 0.1008733 0
20 0.1010197 0
21 0.1005812 0