Description Usage Arguments Value
View source: R/prepare_ml_data.R
Prepares data for modeling
data_prep_func(
data,
outcome_var,
negative_to_zero = FALSE,
fix_gap_size = FALSE,
max_gap_size,
remove_one_obs = FALSE,
trailing_zero = FALSE,
transformation = "none",
use_holidays = FALSE,
holidays_to_use_1 = NULL,
holidays_to_use_2 = NULL,
use_seasonal_lag = TRUE,
seasonal_frequency,
horizon,
clean = FALSE,
drop_na_values = TRUE,
use_holiday_to_clean = FALSE,
holiday_for_clean = NULL,
use_abc_category = FALSE,
pacf_threshold = 0.2,
no_fourier_terms = 5,
fourier_k = 5,
slidify_period = c(4, 8),
use_own_fourier = FALSE,
fourier_terms,
intermittent = 0.3,
recursive_data = FALSE,
no_recursive_lag,
xreg = NULL,
fill_na_with_zero = TRUE,
anomaly = FALSE
)
data | Data frame with data in the correct date format (e.g. daily, weekly, monthly) and a column named 'id'
outcome_var | Name of the outcome variable. The function renames the outcome variable to 'outcome'
negative_to_zero | Recodes negative values as zero. Defaults to FALSE
max_gap_size | The maximum length of an interval in which the outcome can be zero. If a zero interval is longer than max_gap_size, only data after that interval is used
trailing_zero | Extends all time series, backward and forward, so they have the same length. Defaults to FALSE
transformation | Should the series be transformed, e.g. log or log1p. Defaults to "none"
use_holidays | Should national holidays be included. For now, they must be supplied by the user as a data frame through the holidays_to_use_1 or holidays_to_use_2 argument
holidays_to_use_1 | Data frame of dummy holidays. Output of the create_holiday() function: fridagar_tbl
holidays_to_use_2 | Data frame of holidays with one variable
use_seasonal_lag | Should a lag of the outcome variable, equal to the seasonality, be used. Defaults to TRUE
seasonal_frequency | The frequency of the data, e.g. 52 for weekly data
horizon | The forecast horizon
clean | Should the data be cleaned for outliers. Defaults to FALSE
drop_na_values | When creating data_prepared_tbl, should NAs be dropped. Defaults to TRUE
use_holiday_to_clean | Uses fridagar_one_var from the create_holiday() function to revert the series to its original value if it was cleaned
pacf_threshold | Threshold at which to cut the PACF when choosing terms for the Fourier calculation
no_fourier_terms | Number of Fourier terms. Defaults to 5
fourier_k | The Fourier term order. Defaults to 5
slidify_period | The window size. Defaults to c(4, 8)
use_own_fourier | Should you use your own Fourier terms. Defaults to FALSE
fourier_terms | The Fourier terms to include
intermittent | Intermittency threshold at which the anomaly label is removed. Defaults to 0.3
recursive_data | Should the data be prepared for recursive forecasting. Defaults to FALSE
no_recursive_lag | The number of lags to be used
xreg | External regressors to add
fill_na_with_zero | Used when drop_na_values = FALSE to fill missing values with zero instead of dropping them
anomaly | Should an anomaly detection variable be added. Defaults to FALSE
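The rule described for max_gap_size can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name trim_after_long_gap and the sample vector are hypothetical, and the sketch assumes that "only use data after the interval" means keeping everything after the last run of zeros longer than max_gap_size.

```r
# Hypothetical helper (not part of the package): keep only the observations
# that follow the last run of zeros longer than max_gap_size.
trim_after_long_gap <- function(x, max_gap_size) {
  runs <- rle(x == 0)                   # consecutive runs of zero / non-zero
  ends <- cumsum(runs$lengths)          # last index of each run
  long_gap <- which(runs$values & runs$lengths > max_gap_size)
  if (length(long_gap) == 0) {
    return(x)                           # no gap exceeds the limit
  }
  start <- ends[max(long_gap)] + 1      # first index after the last long gap
  if (start > length(x)) {
    return(x[0])                        # the series ends inside a long gap
  }
  x[start:length(x)]
}

sales <- c(5, 3, 0, 0, 0, 0, 2, 4, 0, 6)
trimmed <- trim_after_long_gap(sales, max_gap_size = 3)
print(trimmed)
```

Here the four consecutive zeros exceed the limit of 3, so only the observations after them are kept; the single zero later in the series is shorter than the limit and stays.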
Returns a list with data_prepared, future_data, train_data, splits and horizon
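To make the shape of such a list more concrete, here is a toy base R sketch that builds a seasonal lag (as use_seasonal_lag describes) and a horizon-sized future frame. It is an illustration under stated assumptions, not data_prep_func itself: toy_prep, the column names t and seasonal_lag, and the choice to return only three of the five elements are all hypothetical.

```r
# Hypothetical toy function; it only mimics the idea of a prepared data set,
# a future data set and the horizon. It is not the package's code.
toy_prep <- function(y, seasonal_frequency, horizon) {
  stopifnot(horizon <= seasonal_frequency, length(y) > seasonal_frequency)
  n <- length(y)

  # Seasonal lag: the value observed seasonal_frequency periods earlier.
  seasonal_lag <- c(rep(NA_real_, seasonal_frequency),
                    y[seq_len(n - seasonal_frequency)])
  data_prepared <- data.frame(t = seq_len(n), outcome = y,
                              seasonal_lag = seasonal_lag)
  # Drop rows without a lag value (similar in spirit to drop_na_values = TRUE).
  data_prepared <- data_prepared[!is.na(data_prepared$seasonal_lag), ]

  # Future rows: the outcome is unknown, but the seasonal lag is already
  # observed because horizon <= seasonal_frequency.
  future_t <- n + seq_len(horizon)
  future_data <- data.frame(t = future_t, outcome = NA_real_,
                            seasonal_lag = y[future_t - seasonal_frequency])

  list(data_prepared = data_prepared, future_data = future_data,
       horizon = horizon)
}

prepared <- toy_prep(y = as.numeric(1:10), seasonal_frequency = 4, horizon = 2)
str(prepared)
```

The elements are accessed by name, e.g. prepared$future_data, in the same way the elements of the list returned by the real function would be.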