tk_tsfeatures: Time series feature matrix (Tidy)
In timetk: A Tool Kit for Working with Time Series

View source: R/diagnostiscs-tsfeatures.R

Description

tk_tsfeatures() is a tidyverse-compliant wrapper for tsfeatures::tsfeatures(). It computes a matrix of features that describe each time series, and is designed for groupwise analysis using dplyr groups.

Usage

tk_tsfeatures(
  .data,
  .date_var,
  .value,
  .period = "auto",
  .features = c("frequency", "stl_features", "entropy", "acf_features"),
  .scale = TRUE,
  .trim = FALSE,
  .trim_amount = 0.1,
  .parallel = FALSE,
  .na_action = na.pass,
  .prefix = "ts_",
  .silent = TRUE,
  ...
)

Arguments

`.data`	A `tibble` or `data.frame` with a time-based column
`.date_var`	A column containing either date or date-time values
`.value`	A column containing numeric values
`.period`	The periodicity (frequency) of the time series data. Values can be provided as follows: "auto" (default): calculates the frequency using `tk_get_frequency()`. A time-based phrase such as "2 weeks": calculates the median number of observations in a 2-week window. A number such as 7: sets the `ts` frequency to 7 observations per cycle (common for daily data with a weekly cycle).
`.features`	Passed to `features` in the underlying `tsfeatures()` function. A character vector of function names, each representing a feature aggregation function. Options: use a function name from the `tsfeatures` R package (e.g. "lumpiness", "stl_features"); use the name of an existing function (e.g. "mean" or "median"); or create your own function and provide its name.
`.scale`	If `TRUE`, time series are scaled to mean 0 and sd 1 before features are computed.
`.trim`	If `TRUE`, time series are trimmed by `.trim_amount` before features are computed. Values larger than `.trim_amount` in absolute value are set to `NA`.
`.trim_amount`	Level of trimming applied when `.trim = TRUE`. Default: 0.1.
`.parallel`	If `TRUE`, multiple cores (or multiple sessions) are used. This only speeds things up when there is a large number of time series. When `.parallel = TRUE`, `multiprocess` defaults to `future::multisession`; this can be adjusted by setting the `multiprocess` parameter. See the `tsfeatures::tsfeatures()` function for more details.
`.na_action`	A function to handle missing values. Default: `na.pass`. Use `na.interp` to estimate missing values.
`.prefix`	A prefix added to the feature column names. Default: `"ts_"`.
`.silent`	Whether to suppress messages and warnings. Default: `TRUE`.
`...`	Other arguments get passed to the feature functions.

Details

The timetk::tk_tsfeatures() function uses the tsfeatures package to compute an aggregated feature matrix for time series. Such a matrix is useful in many types of analysis, such as clustering time series.

The timetk version ports the tsfeatures::tsfeatures() function to a tidyverse-compliant format that uses a tidy data frame containing grouping columns (optional), a date column, and a value column. Other columns are ignored.

It then becomes easy to summarize each time series by group-wise application of .features, which are simply functions that evaluate a time series and return a single aggregated value. For example, "mean" returns the mean of the time series (note that, when .scale = TRUE, values are first scaled to mean 0 and sd 1).
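The idea of a feature function can be sketched in base R alone. This is a minimal illustration, not the package's implementation: the toy values and the helper `value_range()` are hypothetical, and the scaling step assumes plain z-scoring (subtract the mean, divide by the standard deviation).

```r
# Toy quarterly series; the values are made up for illustration.
x <- ts(c(12, 15, 11, 18, 20, 17, 22, 25), frequency = 4)

# Hypothetical custom feature: takes a series, returns a single number.
value_range <- function(x) {
  max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
}

# Assumed scaling step: z-score the series (mean 0, sd 1),
# keeping the original frequency.
x_scaled <- ts(as.numeric(scale(x)), frequency = frequency(x))

# Apply feature functions to the scaled series.
feature_mean  <- mean(x_scaled)        # essentially zero after scaling
feature_sd    <- sd(x_scaled)          # essentially one after scaling
feature_range <- value_range(x_scaled) # still varies from series to series
```

Because scaling forces the mean to 0 and the standard deviation to 1, features such as "mean" carry little information when scaling is on; shape-based features such as the range above remain informative.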

Function Internals:

Internally, the time series are converted to ts class using tk_ts(.period), where the period is the frequency of the time series. Values can be provided for .period, which will be used prior to conversion to ts class.
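To show what a numeric period means for a ts object, here is a base-R analogue of the conversion step. It uses stats::ts() rather than tk_ts(), so it is only a sketch of the concept; the data frame `daily` and its values are made up for illustration.

```r
# Toy daily series covering four weeks; the values are made up.
daily <- data.frame(
  date  = seq(as.Date("2023-01-02"), by = "day", length.out = 28),
  value = rep(c(10, 12, 11, 13, 15, 30, 28), times = 4)
)

# A numeric period becomes the ts frequency, i.e. the number of
# observations per seasonal cycle (7 days per week here).
x <- ts(daily$value, frequency = 7)

# Inspect the result: observations per cycle, and each observation's
# position within its cycle.
freq      <- frequency(x)
positions <- as.integer(cycle(x))
```

Note that a base ts object stores only the values and the frequency, not the date column, which is why the period has to be settled before the conversion.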

The function then leverages tsfeatures::tsfeatures() to compute the feature matrix of summarized feature values.

Value

A tibble or data.frame with aggregated features that describe each time series.

References

Rob Hyndman, Yanfei Kang, Pablo Montero-Manso, Thiyanga Talagala, Earo Wang, Yangzhuoran Yang, Mitchell O'Hara-Wild: tsfeatures R package

Examples

library(dplyr)
library(timetk)

walmart_sales_weekly %>%
    group_by(id) %>%
    tk_tsfeatures(
      .date_var = Date,
      .value    = Weekly_Sales,
      .period   = 52,
      .features = c("frequency", "stl_features", "entropy", "acf_features", "mean"),
      .scale    = TRUE,
      .prefix   = "ts_"
    )

timetk documentation built on Nov. 2, 2023, 6:18 p.m.