In order to create a modeling dataset with feature lags that are temporally correct, the entry function in forecastML, create_lagged_df(), needs evenly spaced time series with no gaps in data collection. fill_gaps() can help here.

This function takes a data.frame with (a) dates, (b) the outcome being forecasted, and, optionally, (c) dynamic features that change through time, (d) group columns for multiple time series modeling, and (e) static or non-dynamic features for multiple time series modeling. It returns a data.frame with rows evenly spaced in time. Specifically, this function adds rows to the input dataset while filling in the dates, the grouping information, and the static features. The outcome and the dynamic features will be NA for any missing time periods; these NA values can be left as-is, imputed by the user, or removed from modeling in the user-supplied modeling wrapper function for train_model().
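The core idea of gap filling can be sketched for a single series in base R: build the complete, evenly spaced date grid and left-join the observed data onto it, so that missing periods appear as rows with NA outcomes. This is a minimal illustration under the assumption of one daily series; the data are hypothetical and this is not forecastML's internal implementation.

```r
# Hypothetical daily series with one missing day (2020-01-03)
df <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-04")),
  outcome = c(1.5, 2.0, 3.1)
)

# Complete, evenly spaced date grid from the first to the last observed date
all_dates <- data.frame(date = seq(min(df$date), max(df$date), by = "1 day"))

# Left join: every grid date is kept; dates absent from df get NA outcomes
df_filled <- merge(all_dates, df, by = "date", all.x = TRUE)
df_filled <- df_filled[order(df_filled$date), ]

df_filled
```

The column set is unchanged; only rows are added, which mirrors the behavior described for fill_gaps().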
data: A data.frame or object coercible to a data.frame with, minimally, dates and the outcome being forecasted.
date_col: The column index (an integer) of the date column. This column should have class 'Date' or 'POSIXt'.
frequency: Date/time frequency. A string taking the same input as the by argument of base::seq() for Date or POSIXt sequences, e.g., '1 day'.
groups: Optional. A character vector of column names that identify the unique time series (i.e., groups/hierarchies) when multiple time series are present.
static_features: Optional. For grouped time series only. A character vector of column names that identify features that do not change through time. These columns are expected to be used as model features but are not lagged (e.g., a ZIP code column). When new rows are added to the dataset, the most recent value of each static feature within each group is used to fill in the resulting missing static-feature values.
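The static-feature rule can be sketched in base R. fill_static() below is a hypothetical helper, not part of forecastML: for each group it takes the value observed on the most recent date and uses it to fill only the missing entries of that static column. The grouped data are hypothetical and assume new rows have already been added.

```r
# Hypothetical grouped data after gap rows were added: static 'lat' is NA on new rows
df <- data.frame(
  buoy_id = c("a", "a", "a", "b", "b"),
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                   "2020-01-01", "2020-01-02")),
  lat = c(40.1, NA, 40.2, 35.0, NA)
)

# Hypothetical helper: fill missing static values with each group's most recent value
fill_static <- function(data, group_col, date_col, static_cols) {
  for (g in unique(data[[group_col]])) {
    rows <- which(data[[group_col]] == g)
    for (col in static_cols) {
      observed <- rows[!is.na(data[[col]][rows])]
      if (length(observed) == 0) next  # nothing to copy from in this group
      # Row holding the value observed on the latest date within the group
      latest <- observed[which.max(as.numeric(data[[date_col]][observed]))]
      missing <- rows[is.na(data[[col]][rows])]
      data[[col]][missing] <- data[[col]][latest]
    }
  }
  data
}

df_static <- fill_static(df, "buoy_id", "date", "lat")
df_static$lat
```

Observed values are left untouched; only the NA entries created by gap filling are overwritten.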
Returns an object of class 'data.frame'. The returned data.frame has the same number of columns and the same column order as the input, but with additional rows to account for gaps in data collection. For grouped data, any new rows added to the returned data.frame will appear between the minimum (oldest) date for that group and the maximum (most recent) date across all groups. If the user-supplied forecasting algorithm(s) cannot handle missing outcome values or missing dynamic features, these should either be imputed before running create_lagged_df() or filtered out in the user-supplied modeling function for train_model().
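The date-range rule for grouped data (each group runs from its own oldest date to the most recent date across all groups) can be sketched in base R as follows. The data and column names are hypothetical, and the sketch only illustrates the documented rule; it is not how forecastML implements it.

```r
# Hypothetical grouped data: group "a" starts earlier, group "b" ends later
df <- data.frame(
  group = c("a", "a", "b", "b"),
  date = as.Date(c("2020-01-01", "2020-01-03", "2020-01-02", "2020-01-05")),
  outcome = c(1, 2, 3, 4)
)

# Most recent date across all groups
global_max <- max(df$date)

# Per-group grid: from that group's oldest date to the global maximum date
grid <- do.call(rbind, lapply(split(df, df$group), function(d) {
  data.frame(group = d$group[1],
             date = seq(min(d$date), global_max, by = "1 day"))
}))

# Left join the observations onto the grid; added rows get NA outcomes
df_filled <- merge(grid, df, by = c("group", "date"), all.x = TRUE)
df_filled
```

Group "a" is extended forward to 2020-01-05 even though its last observation is on 2020-01-03, while group "b" still starts at its own first date, 2020-01-02.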
The output of fill_gaps() is passed into create_lagged_df().
library(forecastML)

# NOAA buoy dataset with gaps in data collection
data("data_buoy_gaps", package = "forecastML")

data_buoy_no_gaps <- fill_gaps(data_buoy_gaps, date_col = 1, frequency = '1 day',
                               groups = 'buoy_id', static_features = c('lat', 'lon'))

# The returned data.frame has the same number of columns, but the time series
# are now evenly spaced at 1 day apart. Additionally, the unchanging grouping
# columns and static feature columns have been filled in for the newly created rows.
dim(data_buoy_gaps)
dim(data_buoy_no_gaps)

# Running create_lagged_df() is the next step in the forecastML forecasting
# process. If there are long gaps in data collection, like in this buoy dataset,
# and the user-supplied modeling algorithm cannot handle missing outcome data,
# the best option is to filter these rows out in the user-supplied modeling
# function for train_model().
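Filtering rows with missing values inside the modeling function can be sketched as below. model_fun() is a hypothetical function of the kind a user might supply for train_model(); its single data argument and the convention that the outcome sits in the first column are assumptions for illustration, not forecastML's documented interface. The toy training data are also hypothetical.

```r
# Hypothetical user-supplied model function.
# Assumption: the outcome is the first column of 'data'.
model_fun <- function(data) {
  outcome_name <- names(data)[1]
  # Drop rows with a missing outcome or missing features (e.g., rows added by gap filling)
  data <- data[complete.cases(data), , drop = FALSE]
  lm(as.formula(paste(outcome_name, "~ .")), data = data)
}

# Toy data: row 3 has a missing outcome, row 4 a missing feature
train <- data.frame(y = c(1, 2, NA, 4, 5), x = c(1, 2, 3, NA, 5))
fit <- model_fun(train)
nobs(fit)
```

lm() would also drop incomplete rows under the default na.action (na.omit), but filtering explicitly keeps the behavior the same regardless of that option and carries over to algorithms that reject NA values outright.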