knitr::opts_chunk$set( # message = FALSE, # warning = FALSE, fig.width = 8, fig.height = 4.5, fig.align = 'center', out.width='95%', dpi = 100 ) # devtools::load_all() # Travis CI fails on load_all()
This vignette covers time series class conversion to and from the many time series classes in R including the general data frame (or tibble) and the various time series classes (xts
, zoo
, and ts
).
The time series landscape in R is vast, deep, and complex causing many inconsistencies in data attributes and formats ultimately making it difficult to coerce between the different data structures. The zoo
and xts
packages solved a number of the issues in dealing with the various classes (ts
, zoo
, xts
, irts
, msts
, and the list goes on...). However, because these packages deal in classes other than data frame, the issues with conversion between tbl
and other time series object classes are still present.
The timetk
package provides tools that solve the issues with conversion, maximizing attribute extensibility (the required data attributes are retained during the conversion to each of the primary time series classes). The following tools are available to coerce and retrieve key information:
Conversion functions: tk_tbl
, tk_ts
, tk_xts
, tk_zoo
, and tk_zooreg
. These functions coerce time-based tibbles tbl
to and from each of the main time-series data types xts
, zoo
, zooreg
, ts
, maintaining the time-based index.
Index function: tk_index
returns the index. When the argument, timetk_idx = TRUE
, A time-based index (non-regularized index) of forecast
objects, models, and ts
objects is returned if present. Refer to tk_ts()
to learn about non-regularized index persistence during the conversion process.
This vignette includes a brief case study on conversion issues and then a detailed explanation of timetk
function conversion between time-based tbl
objects and several primary time series classes (xts
, zoo
, zooreg
and ts
).
Before we get started, load the following packages.
library(dplyr) library(timetk)
We'll use the "Q10" dataset - The first ID from a sample a quarterly datasets (see m4_quarterly
) from the M4 Competition. The return structure is a tibble
, which is not conducive to many of the popular time series analysis packages including quantmod
, TTR
, forecast
and many others.
q10_quarterly <- m4_quarterly %>% dplyr::filter(id == "Q10") q10_quarterly
The ts
object class has roots in the stats
package and many popular packages use this time series data structure including the popular forecast
package. With that said, the ts
data structure is the most difficult to coerce back and forth because by default it does not contain a time-based index. Rather it uses a regularized index computed using the start
and frequency
arguments. Conversion to ts
is done using the ts()
function from the stats
library, which results in various problems.
First, only numeric columns get coerced. If the user forgets to add the [,"pct"]
to drop the "date" column, ts()
returns dates in numeric format which is not what the user wants.
# date column gets coerced to numeric ts(q10_quarterly, start = c(2000, 1), freq = 4) %>% head()
The correct method is to call the specific column desired. However, this presents a new issue. The date index is lost, and a different "regularized" index is built using the start
and frequency
attributes.
q10_quarterly_ts <- ts(q10_quarterly$value, start = c(2000, 1), freq = 4) q10_quarterly_ts
We can see from the structure (using the str()
function) that the regularized time series is present, but there is no date index retained.
# No date index attribute str(q10_quarterly_ts)
We can get the index using the index()
function from the zoo
package. The index retained is a regular sequence of numeric values. In many cases, the regularized values cannot be coerced back to the original time-base because the date and date time data contains significantly more information (i.e. year-month-day, hour-minute-second, and timezone attributes) and the data may not be on a regularized interval (frequency).
# Regularized numeric sequence zoo::index(q10_quarterly_ts)
The timetk
package contains a new function, tk_ts()
, that enables maintaining the original date index as an attribute. When we repeat the tbl
to ts
conversion process using the new function, tk_ts()
, we can see a few differences.
First, only numeric columns get coerced, which prevents unintended consequences due to R conversion rules (e.g. dates getting unintentionally converted or characters causing the homogeneous data structure converting all numeric values to character). If a column is dropped, the user gets a warning.
# date automatically dropped and user is warned q10_quarterly_ts_timetk <- tk_ts(q10_quarterly, start = 2000, freq = 4) q10_quarterly_ts_timetk
Second, the data returned has a few additional attributes. The most important of which is a numeric attribute, "index", which contains the original date information as a number. The ts()
function will not preserve this index while tk_ts()
will preserve the index in numeric form along with the time zone and class.
# More attributes including time index, time class, time zone str(q10_quarterly_ts_timetk)
Since we used the tk_ts()
during conversion, we can extract the original index in date format using tk_index(timetk_idx = TRUE)
(the default is timetk_idx = FALSE
which returns the default regularized index).
# Can now retrieve the original date index timetk_index <- q10_quarterly_ts_timetk %>% tk_index(timetk_idx = TRUE) head(timetk_index) class(timetk_index)
Next, the tk_tbl()
function has an argument timetk_idx
also which can be used to select which index to return. First, we show conversion using the default index. Notice that the index returned is "regularized" meaning its actually a numeric index rather than a time-based index.
# Conversion back to tibble using the default index (regularized) q10_quarterly_ts_timetk %>% tk_tbl(index_rename = "date", timetk_idx = FALSE)
We can now get the original date index using the tk_tbl()
argument timetk_idx = TRUE
.
# Conversion back to tibble now using the timetk index (date / date-time) q10_quarterly_timetk <- q10_quarterly_ts_timetk %>% tk_tbl(timetk_idx = TRUE) %>% rename(date = index) q10_quarterly_timetk
We can see that in this case (and in most cases) you can get the same data frame you began with.
# Comparing the coerced tibble with the original tibble identical(q10_quarterly_timetk, q10_quarterly %>% select(-id))
Using the q10_quarterly
, we'll go through the various conversion methods using tk_tbl
, tk_xts
, tk_zoo
, tk_zooreg
, and tk_ts
.
The starting point is the q10_quarterly
. We will coerce this into xts
, zoo
, zooreg
and ts
classes.
# Start:
q10_quarterly
Use tk_xts()
. By default "date" is used as the date index and the "date" column is dropped from the output. Only numeric columns are coerced to avoid unintentional conversion issues.
# End q10_quarterly_xts <- tk_xts(q10_quarterly) head(q10_quarterly_xts)
Use the select
argument to specify which columns to drop. Use the date_var
argument to specify which column to use as the date index. Notice the message and warning are no longer present.
# End - Using `select` and `date_var` args tk_xts(q10_quarterly, select = -(id:date), date_var = date) %>% head()
Also, as an alternative, we can set silent = TRUE
to bypass the warnings since the default dropping of the "date" column is what is desired. Notice no warnings or messages.
# End - Using `silent` to silence warnings tk_xts(q10_quarterly, silent = TRUE) %>% head()
Use tk_zoo()
. Same as when coercing to xts, the non-numeric "date" column is automatically dropped and the index is automatically selected as the date column.
# End q10_quarterly_zoo <- tk_zoo(q10_quarterly, silent = TRUE) head(q10_quarterly_zoo)
Use tk_zooreg()
. Same as when coercing to xts, the non-numeric "date" column is automatically dropped. The regularized index is built from the function arguments start
and freq
.
# End q10_quarterly_zooreg <- tk_zooreg(q10_quarterly, start = 2000, freq = 4, silent = TRUE) head(q10_quarterly_zooreg)
The original time-based index is retained and can be accessed using tk_index(timetk_idx = TRUE)
.
# Retrieve original time-based index tk_index(q10_quarterly_zooreg, timetk_idx = TRUE) %>% str()
Use tk_ts()
. The non-numeric "date" column is automatically dropped. The regularized index is built from the function arguments.
# End q10_quarterly_ts <- tk_ts(q10_quarterly, start = 2000, freq = 4, silent = TRUE) q10_quarterly_ts
The original time-based index is retained and can be accessed using tk_index(timetk_idx = TRUE)
.
# Retrieve original time-based index tk_index(q10_quarterly_ts, timetk_idx = TRUE) %>% str()
Going back to tibble is just as easy using tk_tbl()
.
# Start head(q10_quarterly_xts)
Notice no loss of data going back to tbl
.
# End tk_tbl(q10_quarterly_xts)
# Start head(q10_quarterly_zoo)
Notice no loss of data going back to tbl
.
# End tk_tbl(q10_quarterly_zoo)
# Start head(q10_quarterly_zooreg)
Notice that the index is a regularized numeric sequence by default.
# End - with default regularized index tk_tbl(q10_quarterly_zooreg)
With timetk_idx = TRUE
the index is the original date sequence. The result is the original tbl
that we started with!
# End - with timetk index that is the same as original time-based index tk_tbl(q10_quarterly_zooreg, timetk_idx = TRUE)
# Start
q10_quarterly_ts
Notice that the index is a regularized numeric sequence by default.
# End - with default regularized index tk_tbl(q10_quarterly_ts)
With timetk_idx = TRUE
the index is the original date sequence. The result is the original tbl
that we started with!
# End - with timetk index tk_tbl(q10_quarterly_ts, timetk_idx = TRUE)
The function has_timetk_idx()
can be used to test whether toggling the timetk_idx
argument in the tk_index()
and tk_tbl()
functions will have an effect on the output. Here are several examples using the ten year treasury data used in the case study:
The tk_ts()
function returns an object with the "timetk index" attribute.
# Data coerced with tk_ts() has timetk index has_timetk_idx(q10_quarterly_ts)
If we toggle timetk_idx = TRUE
when retrieving the index with tk_index()
, we get the index of dates rather than the regularized time series.
tk_index(q10_quarterly_ts, timetk_idx = TRUE)
If we toggle timetk_idx = TRUE
during conversion to tbl
using tk_tbl()
, we get the index of dates rather than the regularized index in the returned tbl
.
tk_tbl(q10_quarterly_ts, timetk_idx = TRUE)
The timetk_idx
argument will only have an effect on objects that use regularized time series. Therefore, has_timetk_idx()
returns FALSE
for other object types (e.g. tbl
, xts
, zoo
) since toggling the argument has no effect on these classes.
has_timetk_idx(q10_quarterly_xts)
Toggling the timetk_idx
argument has no effect on the output. Output with timetk_idx = TRUE
is the same as with timetk_idx = FALSE
.
tk_index(q10_quarterly_xts, timetk_idx = TRUE)
The zoo
package has the yearmon
and yearqtr
classes for working with regularized monthly and quarterly data, respectively. The "timetk index" tracks the format during conversion. Here's and example with yearqtr
.
yearqtr_tbl <- q10_quarterly %>% mutate(date = zoo::as.yearqtr(date)) yearqtr_tbl
We can coerce to xts
and the yearqtr
class is intact.
yearqtr_xts <- tk_xts(yearqtr_tbl) yearqtr_xts %>% head()
We can coerce to ts
and, although the "timetk index" is hidden, the yearqtr
class is intact.
yearqtr_ts <- tk_ts(yearqtr_xts, start = 1997, freq = 4) yearqtr_ts %>% head()
Coercing from ts
to tbl
using timetk_idx = TRUE
shows that the original index was maintained through each of the conversion steps.
yearqtr_ts %>% tk_tbl(timetk_idx = TRUE)
My Talk on High-Performance Time Series Forecasting
Time series is changing. Businesses now need 10,000+ time series forecasts every day.
High-Performance Forecasting Systems will save companies MILLIONS of dollars. Imagine what will happen to your career if you can provide your organization a "High-Performance Time Series Forecasting System" (HPTSF System).
I teach how to build a HPTFS System in my High-Performance Time Series Forecasting Course. If interested in learning Scalable High-Performance Forecasting Strategies then take my course. You will learn:
Modeltime
- 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)GluonTS
(Competition Winners)Unlock the High-Performance Time Series Forecasting Course
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.