Description Usage Arguments Value Attributes Methods and related functions Examples
Create a list of datasets with lagged, grouped, dynamic, and static features to (a) train forecasting models for specified forecast horizons and (b) forecast into the future with a trained ML model.
create_lagged_df(
  data,
  type = c("train", "forecast"),
  method = c("direct", "multi_output"),
  outcome_col = 1,
  horizons,
  lookback = NULL,
  lookback_control = NULL,
  dates = NULL,
  frequency = NULL,
  dynamic_features = NULL,
  groups = NULL,
  static_features = NULL,
  predict_future = NULL,
  use_future = FALSE,
  keep_rows = FALSE
)
data: A data.frame with the (a) target to be forecasted and (b) features/predictors. An optional date column can be given in the
type: The type of dataset to return: (a) model training or (b) forecast prediction. The default is
method: The type of modeling dataset to create.
outcome_col: The column index, an integer, of the target to be forecasted. If
horizons: A numeric vector of one or more forecast horizons, h, measured in dataset rows. If
lookback: A numeric vector giving the lags, in dataset rows, for creating the lagged features. All non-grouping, non-static, and non-dynamic features in the input dataset,
lookback_control: A list of numeric vectors, specifying potentially unique lags for each feature. The length of the list should equal
dates: A vector or 1-column data.frame of dates/times with class 'Date' or 'POSIXt'. The length of
frequency: Date/time frequency. Required if
dynamic_features: A character vector of column names that identify features that change through time but which are not lagged (e.g., weekday or year). If
groups: A character vector of column names that identify the groups/hierarchies when multiple time series are present. These columns are used as model features but are not lagged. Note that combining feature lags with grouped time series will result in
static_features: For grouped time series only. A character vector of column names that identify features that do not change through time. These columns are not lagged. If
predict_future: When
use_future: Boolean. If
keep_rows: Boolean. For non-grouped time series, keep the
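The lookback and lookback_control arguments both describe lags measured in dataset rows. As a rough illustration of what a row-based lag is, and not forecastML's internal implementation, the base R sketch below builds lagged columns by hand. The helpers lag_rows() and make_lagged_features(), the toy data, the "_lag_" column naming, and the use of a named list for per-feature lags are all assumptions made for this example.

```r
# Toy data: an outcome and one predictor, 6 rows (hypothetical values).
toy <- data.frame(y = c(10, 11, 12, 13, 14, 15),
                  x = c(1, 2, 3, 4, 5, 6))

# Shift a vector down by k rows, padding the start with NA.
lag_rows <- function(v, k) c(rep(NA, k), v)[seq_along(v)]

# Build one lagged column per (feature, lag) pair.
# 'lags' is a named list with one numeric vector of lags per feature.
make_lagged_features <- function(data, lags) {
  cols <- list()
  for (feature in names(lags)) {
    for (k in lags[[feature]]) {
      cols[[paste0(feature, "_lag_", k)]] <- lag_rows(data[[feature]], k)
    }
  }
  as.data.frame(cols)
}

# The same lags for every feature, in the spirit of lookback = 1:2.
common <- make_lagged_features(toy, list(y = 1:2, x = 1:2))

# Feature-specific lags, in the spirit of a lookback_control list.
custom <- make_lagged_features(toy, list(y = c(1, 3), x = 2))

print(common)
print(custom)
```

The leading NA values in each lagged column show why the earliest rows of a lagged training dataset carry incomplete feature histories.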
An S3 object of class 'lagged_df' or 'grouped_lagged_df': a list of data.frames with new columns for the lagged/non-lagged features.
For method = "direct", the length of the returned list is equal to the number of forecast horizons and is in the order of horizons supplied to the horizons argument. Horizon-specific datasets can be accessed with my_lagged_df$horizon_h where 'h' gives the forecast horizon.
For method = "multi_output", the length of the returned list is 1. Horizon-specific datasets can be accessed with my_lagged_df$horizon_1_3_5 where "1_3_5" represents the forecast horizons passed in horizons.
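The naming convention for the returned list can be sketched in base R. The lists below are stand-ins holding empty data.frames, not real create_lagged_df() output; the object names direct_like and multi_like are hypothetical and exist only to show name-based access.

```r
horizons <- c(1, 3, 5)

# method = "direct": one list element per horizon, named horizon_h.
direct_names <- paste0("horizon_", horizons)

# method = "multi_output": a single element whose name joins all horizons.
multi_output_name <- paste0("horizon_", paste(horizons, collapse = "_"))

# Stand-in lists of empty data.frames, used only to demonstrate access by name.
direct_like <- setNames(
  replicate(length(horizons), data.frame(), simplify = FALSE),
  direct_names
)
multi_like <- setNames(list(data.frame()), multi_output_name)

print(names(direct_like))
print(names(multi_like))
print(is.data.frame(direct_like$horizon_3))
```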
The contents of the returned data.frames are as follows:
type = 'train', non-grouped: A data.frame with the outcome and lagged/dynamic features.
type = 'train', grouped: A data.frame with the outcome and unlagged grouping columns followed by lagged, dynamic, and static features.
type = 'forecast', non-grouped: (1) An 'index' column giving the row index or date of the forecast periods (e.g., a 100 row non-date-based training dataset would start with an index of 101). (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged features identical to the 'train', non-grouped dataset.
type = 'forecast', grouped: (1) An 'index' column giving the date of the forecast periods. The first forecast date for each group is the maximum date from the dates argument + 1 * frequency, where frequency is the user-supplied date/time frequency. (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged, static, and dynamic features identical to the 'train', grouped dataset.
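The 'index' and 'horizon' columns described above can be worked through in base R. This is a sketch of the arithmetic only, assuming a 100-row training dataset for the row-based case and hypothetical monthly dates with a "1 month" frequency for the date-based case; it does not call forecastML.

```r
horizons <- c(1, 3)

# Row-based index: a 100-row training dataset forecasts rows 101, 102, 103.
n_train <- 100
index_rows <- n_train + seq_len(max(horizons))

# Date-based index: hypothetical monthly data ending 1984-12-01.
dates <- seq(as.Date("1984-01-01"), as.Date("1984-12-01"), by = "1 month")
frequency <- "1 month"

# seq() includes its start date, so generate one extra value and drop the
# first; the index then begins at max(dates) + 1 * frequency.
index_dates <- seq(max(dates), by = frequency,
                   length.out = max(horizons) + 1)[-1]

forecast_index <- data.frame(index = index_dates,
                             horizon = seq_len(max(horizons)))
print(index_rows)
print(forecast_index)
```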
names: The horizon-specific datasets that can be accessed with my_lagged_df$horizon_h.
type: Training, train, or forecasting, forecast, dataset(s).
method: direct or multi_output.
horizons: Forecast horizons measured in dataset rows.
outcome_col: The column index of the target being forecasted.
outcome_cols: If method = multi_output, the column indices of the multiple outputs in the transformed dataset.
outcome_name: The name of the target being forecasted.
outcome_names: If method = multi_output, the column names of the multiple outputs in the transformed dataset. The names take the form "outcome_name_h" where 'h' is a horizon passed in horizons.
predictor_names: The predictor or feature names from the input dataset.
row_indices: The row.names() of the output dataset. For non-grouped datasets, the first lookback + 1 rows are removed from the beginning of the dataset to remove NA values in the lagged features.
date_indices: If dates are given, the vector of dates.
frequency: If dates are given, the date/time frequency.
data_start: min(row_indices) or min(date_indices).
data_stop: max(row_indices) or max(date_indices).
groups: If groups are given, a vector of group names.
class: grouped_lagged_df, lagged_df, list
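Attributes like these are read in R with attr() and attributes(). The sketch below uses a toy stand-in built with structure(), not a real create_lagged_df() result; the object name toy_lagged and all attribute values are hypothetical, and only a few of the attributes listed above are included.

```r
# Toy stand-in: a list of empty data.frames carrying a few attributes.
toy_lagged <- structure(
  list(horizon_1 = data.frame(), horizon_12 = data.frame()),
  type = "train",
  method = "direct",
  horizons = c(1, 12),
  outcome_col = 1,
  class = c("lagged_df", "list")
)

# Read individual attributes by name.
toy_horizons <- attr(toy_lagged, "horizons")
toy_method <- attr(toy_lagged, "method")

# Check the S3 class and list every attribute that is set.
is_lagged <- inherits(toy_lagged, "lagged_df")
attribute_names <- names(attributes(toy_lagged))

print(toy_horizons)
print(toy_method)
print(is_lagged)
print(attribute_names)
```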
The output of create_lagged_df() is passed into create_windows and has the following generic S3 methods: summary, plot.
# Sampled Seatbelts data from the R package datasets.
library(forecastML)
data("data_seatbelts", package = "forecastML")
#------------------------------------------------------------------------------
# Example 1 - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15
data <- data_seatbelts
data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               horizons = horizons, lookback = lookback)
# Inspect the dataset for the last horizon supplied (here, horizon 12).
head(data_train[[length(horizons)]])
# Example 1 - Forecasting dataset
# The last 'nrow(data_seatbelts) - horizon' rows are automatically used from data_seatbelts.
data_forecast <- create_lagged_df(data_seatbelts, type = "forecast", outcome_col = 1,
                                  horizons = horizons, lookback = lookback)
head(data_forecast[[length(horizons)]])
#------------------------------------------------------------------------------
# Example 2 - Training data for one 3-month horizon model w/ unique lags per predictor.
horizons <- 3
# One vector of lags per column of data_seatbelts, in column order.
lookback_control <- list(c(3, 6, 9, 12), c(4:12), c(6:15), c(8))
data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               horizons = horizons, lookback_control = lookback_control)
head(data_train[[length(horizons)]])