View source: R/recipes-step_ts_clean.R
step_ts_clean | R Documentation |
step_ts_clean() creates a specification of a recipe step that will clean outliers and impute time series data.
step_ts_clean(
  recipe,
  ...,
  period = 1,
  lambda = "auto",
  role = NA,
  trained = FALSE,
  lambdas_trained = NULL,
  skip = FALSE,
  id = rand_id("ts_clean")
)
## S3 method for class 'step_ts_clean'
tidy(x, ...)
recipe |
A |
... |
One or more selector functions to choose which
variables are affected by the step. See |
period |
A seasonal period to use during the transformation. If |
lambda |
A Box Cox transformation parameter. If set to |
role |
Not used by this step since no new variables are created. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
lambdas_trained |
A named numeric vector of lambdas. This is |
skip |
A logical. Should the step be skipped when the recipe
is baked by |
id |
A character string that is unique to this step to identify it. |
x |
A |
The step_ts_clean() function is designed specifically to handle time series using seasonal outlier detection methods implemented in the forecast R package.
Cleaning Outliers
Outliers are replaced with missing values using the following methods:
Non-Seasonal (period = 1): Uses stats::supsmu()
Seasonal (period > 1): Uses forecast::mstl() with robust = TRUE (robust STL decomposition) for seasonal series.
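The non-seasonal case can be sketched in base R. This is an illustration rather than the forecast package's own code: the simulated series, the injected outlier, and the rule of flagging residuals more than 3 interquartile ranges outside the quartiles are assumptions made for the example.

```r
# Simulated non-seasonal series with one injected outlier (assumption)
set.seed(123)
n  <- 100
tt <- seq_len(n)
x  <- sin(tt / 10) + rnorm(n, sd = 0.1)
x[50] <- 10

# Smooth the series with Friedman's super smoother and take residuals
fit   <- stats::supsmu(tt, x)
resid <- x - fit$y

# Flag residuals far outside the quartiles (3 * IQR is an assumed threshold)
q      <- stats::quantile(resid, probs = c(0.25, 0.75))
limits <- q + 3 * diff(q) * c(-1, 1)
outlier_idx <- which(resid < limits[1] | resid > limits[2])

# Replace flagged points with NA so they can be imputed afterwards
x_flagged <- x
x_flagged[outlier_idx] <- NA
```

Points next to a large outlier may also be flagged, because the smoother is pulled toward the outlier.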
Imputation using Linear Interpolation
Three circumstances cause strictly linear interpolation:
Period is 1: With period = 1, seasonality cannot be interpreted, so linear interpolation is used.
Number of Non-Missing Values is less than 2 Periods: Insufficient values exist to detect seasonality.
Number of Total Values is less than 3 Periods: Insufficient values exist to detect seasonality.
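Strictly linear interpolation can be illustrated with stats::approx() from base R. The vector below is made up for the example, and the sketch only shows the interpolation idea, not the code path the step actually runs.

```r
# A short series with gaps (made-up values)
x       <- c(1, 2, NA, NA, 5, 6, NA, 8)
idx     <- seq_along(x)
missing <- is.na(x)

# Interpolate linearly between the nearest observed neighbours
x_filled <- x
x_filled[missing] <- stats::approx(
  x    = idx[!missing],
  y    = x[!missing],
  xout = idx[missing]
)$y

x_filled
# 1 2 3 4 5 6 7 8
```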
Seasonal Imputation using Linear Interpolation
For seasonal series with period > 1, a robust Seasonal Trend Loess (STL) decomposition is first computed. Then a linear interpolation is applied to the seasonally adjusted data, and the seasonal component is added back.
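The seasonal procedure can be approximated in base R. This is a simplified sketch under stated assumptions: it uses stats::stl() in place of forecast::mstl(), it uses a rough linear fill so the decomposition can run on complete data, and linear_fill() is a hypothetical helper defined here, not a timetk or forecast function.

```r
# Simulated monthly-style series with trend and seasonality (assumption)
set.seed(42)
period <- 12
n      <- 8 * period
tt     <- seq_len(n)
x      <- 10 + 0.05 * tt + 3 * sin(2 * pi * tt / period) + rnorm(n, sd = 0.2)

missing <- c(30, 31, 55)
x_na    <- x
x_na[missing] <- NA

# Hypothetical helper: linear interpolation over NA positions
linear_fill <- function(v) {
  idx <- seq_along(v)
  ok  <- !is.na(v)
  v[!ok] <- stats::approx(idx[ok], v[ok], xout = idx[!ok])$y
  v
}

# Rough initial fill (assumption) so stl() receives a complete series
x_init <- stats::ts(linear_fill(x_na), frequency = period)

# Robust STL decomposition
fit      <- stats::stl(x_init, s.window = "periodic", robust = TRUE)
seasonal <- as.numeric(fit$time.series[, "seasonal"])

# Interpolate the seasonally adjusted data, then add the seasonal part back
seas_adj <- as.numeric(x_init) - seasonal
seas_adj[missing] <- NA
x_imputed <- linear_fill(seas_adj) + seasonal
```

Because the interpolation runs on the seasonally adjusted values, the imputed points follow the seasonal pattern instead of cutting straight across it.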
Box Cox Transformation
In many circumstances, a Box Cox transformation can help, especially if the series is multiplicative, meaning the variance grows exponentially. A Box Cox transformation can be automated by setting lambda = "auto", or it can be specified by setting lambda to a numeric value.
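For reference, the Box Cox transformation and its inverse take only a few lines of base R. The helpers box_cox() and inv_box_cox() are hypothetical names introduced for this sketch. They are not the functions the step calls internally, and the example values are made up.

```r
# Hypothetical helper: Box Cox transform (log when lambda is 0)
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Hypothetical helper: inverse Box Cox transform
inv_box_cox <- function(y, lambda) {
  if (lambda == 0) exp(y) else (lambda * y + 1)^(1 / lambda)
}

# A series whose spread grows with its level (made-up values)
x <- c(1, 10, 100, 1000)

# Transform, then back-transform to recover the original scale
y_log  <- box_cox(x, lambda = 0)
y_sqrt <- box_cox(x, lambda = 0.5)

x_back_log  <- inv_box_cox(y_log,  lambda = 0)
x_back_sqrt <- inv_box_cox(y_sqrt, lambda = 0.5)
```

With a numeric lambda, cleaning and imputation happen on the transformed scale and the result is transformed back, which is why the inverse is shown alongside the forward transform.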
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected) and value (the lambda estimate).
Time Series Analysis:
Engineered Features: step_timeseries_signature(), step_holiday_signature(), step_fourier()
Diffs & Lags: step_diff(), recipes::step_lag()
Smoothing: step_slidify(), step_smooth()
Variance Reduction: step_box_cox()
Imputation: step_ts_impute(), step_ts_clean()
Padding: step_ts_pad()
library(dplyr)
library(tidyr)
library(recipes)
library(timetk)  # provides FANG, pad_by_time(), and step_ts_clean()

# Widen FANG to one column per symbol; pad_by_time() adds rows for
# missing dates, which introduces missing values
FANG_wide <- FANG %>%
  select(symbol, date, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted) %>%
  pad_by_time()

FANG_wide

# Clean outliers and impute the missing values
recipe_box_cox <- recipe(~ ., data = FANG_wide) %>%
  step_ts_clean(FB, AMZN, NFLX, GOOG, period = 252) %>%
  prep()

recipe_box_cox %>% bake(FANG_wide)

# Lambda parameter used during the cleaning and imputation process
recipe_box_cox %>% tidy(1)