data_prep_func: Prepares data for modeling

View source: R/prepare_ml_data.R

Description

Prepares data for modeling

Usage

data_prep_func(
  data,
  outcome_var,
  negative_to_zero = FALSE,
  fix_gap_size = FALSE,
  max_gap_size,
  remove_one_obs = FALSE,
  trailing_zero = FALSE,
  transformation = "none",
  use_holidays = FALSE,
  holidays_to_use_1 = NULL,
  holidays_to_use_2 = NULL,
  use_seasonal_lag = TRUE,
  seasonal_frequency,
  horizon,
  clean = FALSE,
  drop_na_values = TRUE,
  use_holiday_to_clean = FALSE,
  holiday_for_clean = NULL,
  use_abc_category = FALSE,
  pacf_threshold = 0.2,
  no_fourier_terms = 5,
  fourier_k = 5,
  slidify_period = c(4, 8),
  use_own_fourier = FALSE,
  fourier_terms,
  intermittent = 0.3,
  recursive_data = FALSE,
  no_recursive_lag,
  xreg = NULL,
  fill_na_with_zero = TRUE,
  anomaly = FALSE
)

Arguments

data

Data frame with data in the correct date format (e.g. daily, weekly or monthly) and a column named 'id'.

outcome_var

Name of the outcome variable. The function will change the outcome variable name to 'outcome'

negative_to_zero

Recodes negative values as zero. Defaults to FALSE.

max_gap_size

The maximum number of consecutive periods for which the outcome can be zero. If a run of zeros is longer than max_gap_size, only the data after that run is used.
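
As a rough illustration of the gap rule described above, the following base R sketch keeps only the observations after the last run of zeros that is longer than max_gap_size. The helper name trim_after_gap and the sample vector are hypothetical. This is one plausible reading of the description, not the package's implementation.

```r
# Hypothetical helper (not from the sumots source): keep only the
# observations after the last run of zeros longer than max_gap_size.
trim_after_gap <- function(x, max_gap_size) {
  runs <- rle(x == 0)                      # runs of zero / non-zero values
  ends <- cumsum(runs$lengths)             # index where each run ends
  long_gap <- which(runs$values & runs$lengths > max_gap_size)
  if (length(long_gap) == 0) return(x)     # no gap exceeds the limit
  last_end <- ends[max(long_gap)]          # end of the last long gap
  if (last_end >= length(x)) return(x[0])  # series ends inside the gap
  x[(last_end + 1):length(x)]
}

sales <- c(5, 3, 0, 0, 0, 0, 2, 4, 0, 6)
trimmed <- trim_after_gap(sales, max_gap_size = 3)
```

Here the run of four zeros exceeds the limit of three, so only the values after it are kept. The single zero later in the series is shorter than the limit and stays in place.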

trailing_zero

Extends all time series, back and forward, so they will be the same length. Defaults to FALSE

transformation

Should the series be transformed, e.g. "log" or "log1p"? Defaults to "none".
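
The difference between the two transformations mentioned above matters for series that contain zeros. The sketch below uses only base R and made-up values. It assumes negative values are clamped first (as negative_to_zero would do) and is not taken from the package source.

```r
# Illustrative values only; not taken from the sumots source.
raw <- c(-2, 0, 9, 99)

y <- pmax(raw, 0)            # clamp negatives to zero before transforming
y_log1p <- log1p(y)          # log(1 + y), defined at y = 0
y_back  <- expm1(y_log1p)    # inverse of log1p, back to the original scale
```

A plain log() would return -Inf for the zero entries, which is why log1p is often preferred for count-like or intermittent series.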

use_holidays

Should national holidays be included? For now, the holidays must be supplied by the user as a data frame through the holidays_to_use_1 or holidays_to_use_2 argument.

holidays_to_use_1

Data frame of dummy holidays. Output of the create_holiday() function: fridagar_tbl

holidays_to_use_2

Data frame of holidays, one variable.

use_seasonal_lag

Should a lag of the outcome variable, equal to the seasonality, be used? Defaults to TRUE.

seasonal_frequency

The frequency of the data. E.g. 52 for weekly data
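
To make the idea of a seasonal lag concrete, the sketch below shifts a series back by one full season using base R. The helper lag_by and the quarterly sample data are hypothetical, and how the package itself builds the lag (for example, how it interacts with horizon) is not shown here.

```r
# Hypothetical helper: shift a series back by `n` periods, padding with NA.
lag_by <- function(x, n) c(rep(NA, n), head(x, -n))

seasonal_frequency <- 4                          # e.g. quarterly data
outcome <- c(10, 20, 30, 40, 12, 22, 32, 42)

# Each value is paired with the value from the same quarter one year earlier.
seasonal_lag <- lag_by(outcome, seasonal_frequency)
```

The first seasonal_frequency rows have no earlier season to look back on, so they are NA. Whether such rows are dropped or filled is governed by arguments such as drop_na_values.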

horizon

The forecast horizon

clean

Should the data be cleaned for outliers? Defaults to FALSE.

drop_na_values

When creating data_prepared_tbl, should NAs be dropped? Defaults to TRUE.

use_holiday_to_clean

Uses fridagar_one_var from the create_holiday() function to revert the series to its original value if cleaned.

pacf_threshold

Threshold at which to cut the PACF when choosing terms for the Fourier calculation. Defaults to 0.2.

no_fourier_terms

Number of Fourier terms. Defaults to 5.

fourier_k

The Fourier term order. Defaults to 5.
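
For readers unfamiliar with Fourier features, the sketch below shows the general technique: sine and cosine pairs for harmonics 1 to K of a chosen period. The helper fourier_features and its arguments are hypothetical and written in base R. The package may compute its terms differently, for example by choosing the periods from the PACF.

```r
# Hypothetical helper: sine/cosine pairs for harmonics 1..K of one period.
fourier_features <- function(t, period, K) {
  out <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  out <- do.call(cbind, out)
  colnames(out) <- paste0(rep(c("sin", "cos"), times = K),
                          "_k", rep(seq_len(K), each = 2))
  out
}

# Two years of weekly time indices, yearly period, order K = 2.
ff <- fourier_features(t = 1:104, period = 52, K = 2)
```

Each harmonic adds two columns, so a higher order captures sharper seasonal shapes at the cost of more features.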

slidify_period

The window size, defaults to c(4, 8)
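
The default c(4, 8) suggests one rolling feature per window size. The sketch below computes trailing rolling means in base R. The helper roll_mean is hypothetical, and the use of the mean as the summary function is an assumption, since the description above does not say which statistic is computed.

```r
# Hypothetical helper: trailing rolling mean over `window` observations.
# The first (window - 1) values are NA because the window is incomplete.
roll_mean <- function(x, window) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

x <- c(2, 4, 6, 8, 10, 12, 14, 16)
roll_4 <- roll_mean(x, 4)   # window of 4 periods
roll_8 <- roll_mean(x, 8)   # window of 8 periods
```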

use_own_fourier

Should you use your own fourier terms? Defaults to FALSE

fourier_terms

The fourier terms to include.

intermittent

Intermittency threshold at which the anomaly label is removed. Defaults to 0.3.

recursive_data

Should the data be prepared for recursive forecasting? Defaults to FALSE.

no_recursive_lag

The number of lags to use for recursive forecasting.

xreg

External regressors to add

fill_na_with_zero

Used when drop_na_values = FALSE to fill missing values with zero instead of dropping them. Defaults to TRUE.

anomaly

Should an anomaly detection variable be added? Defaults to FALSE.

Value

List with data_prepared, future_data, train_data, splits and horizon


vidarsumo/sumots documentation built on June 29, 2021, 4:23 a.m.