Time Series Coercion Using timekit


Simplified and extensible time series coercion tools

The time series landscape in R is vast, deep, and complex. This causes many inconsistencies in data attributes and formats, which ultimately makes it difficult to coerce between the different data structures. The zoo and xts packages solved a number of the issues in dealing with the various classes (ts, zoo, xts, irts, msts, and the list goes on...). However, because these packages work with classes other than data frames, issues remain when coercing between tbl objects and the other time series classes.

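The timekit package provides tools that solve these coercion issues while maximizing attribute extensibility (the required data attributes are retained during coercion to each of the primary time series classes). The following tools are available to coerce and retrieve key information:

  1. Coercion functions: tk_tbl(), tk_xts(), tk_zoo(), tk_zooreg() and tk_ts() coerce between time-based tibbles and the primary time series classes.
  2. Index function: tk_index() returns the time index, including the original time-based "timekit index" when timekit_idx = TRUE.
  3. Testing function: has_timekit_idx() tests whether an object carries a "timekit index".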

This vignette includes a brief case study on coercion issues. It then gives a detailed explanation of coercion with timekit functions between time-based tbl objects and several primary time series classes (xts, zoo, zooreg and ts).

Prerequisites

Before we get started, load the following packages.

library(tidyquant)
library(timekit)

Data

We'll use the ten-year treasury rate available from the FRED database under the code "DGS10". We'll retrieve the data set using tq_get(get = "economic.data"). The returned structure is a tibble (a "tidy" data frame), which is not directly compatible with many of the popular time series analysis packages, including quantmod, TTR, forecast and many others.

ten_year_treasury_rate_tbl <- tq_get("DGS10", 
                                     get  = "economic.data", 
                                     from = "1997-01-01", 
                                     to   = "2016-12-31") %>%
    rename(pct = price) %>%
    mutate(pct = pct / 100)
ten_year_treasury_rate_tbl

For the purposes of the case study, we'll convert to a quarterly periodicity using tq_transmute() from the tidyquant package. Note that NA values are automatically removed from the data (message not shown).

ten_year_treasury_rate_tbl <- ten_year_treasury_rate_tbl %>%
    tq_transmute(pct, mutate_fun = to.period, period = "quarters")
ten_year_treasury_rate_tbl
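Here tq_transmute() with to.period does the resampling. As a rough base R analogue, the sketch below drops missing values and then keeps the last remaining observation in each calendar quarter. The daily values are made up for illustration. This is not how to.period() is implemented internally.

```r
# Made-up daily values for illustration only (not the FRED data)
daily <- data.frame(
  date = as.Date(c("1997-01-02", "1997-03-31", "1997-04-01", "1997-06-30", "1997-07-01")),
  pct  = c(0.0654, NA, 0.0689, 0.0650, 0.0648)
)

# Drop missing values first, mirroring the NA removal mentioned above
daily <- daily[!is.na(daily$pct), ]

# Label each row with its quarter, e.g. "1997 Q2"
qtr <- paste(format(daily$date, "%Y"), quarters(daily$date))

# Row position of the last observation within each quarter
last_idx  <- as.integer(tapply(seq_len(nrow(daily)), qtr, max))
quarterly <- daily[last_idx, ]
quarterly
```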

Case Study: Coercion issues with ts()

The ts object class has its roots in the stats package, and many popular packages use this time series data structure, including the popular forecast package. However, the ts data structure is the most difficult to coerce back and forth because it does not contain a time-based index. Instead, it uses a regularized index computed from the start and frequency arguments. Coercion to ts is done using the ts() function from the stats package, which results in various problems.

Problems

First, every column gets coerced to numeric. If the user forgets to add [,"pct"] to drop the "date" column, ts() returns the dates in numeric format (days since 1970-01-01), which is not what the user wants.
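The same behaviour can be reproduced with base R alone. ts() passes a data frame through data.matrix(), which turns a Date column into plain numbers. The small data frame below uses made-up values as a stand-in for the treasury tibble.

```r
# Made-up quarterly values for illustration only (not the FRED data)
df <- data.frame(
  date = as.Date(c("1997-03-31", "1997-06-30", "1997-09-30", "1997-12-31")),
  pct  = c(0.0692, 0.0650, 0.0612, 0.0575)
)

# ts() converts the data frame with data.matrix(), so the Date column
# silently becomes its numeric value (days since 1970-01-01)
m <- ts(df, start = 1997, frequency = 4)
m[1, "date"]  # 9951, a number rather than a date
```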

# date column gets coerced to numeric
ts(ten_year_treasury_rate_tbl, start = 1997, freq = 4) %>%
    head()

The correct method is to select the specific column desired. However, this presents a new issue: the date index is lost, and a different "regularized" index is built from the start and frequency attributes.

ten_year_treasury_rate_ts_stats <- ts(ten_year_treasury_rate_tbl[,"pct"], 
                                      start = 1997, 
                                      freq  = 4)
ten_year_treasury_rate_ts_stats

We can see from the structure (using the str() function) that the regularized time series is present, but no date index is retained.

# No date index attribute
str(ten_year_treasury_rate_ts_stats)

We can get the index using the index() function from the zoo package. The retained index is a regular sequence of numeric values. In many cases, the regularized values cannot be coerced back to the original time base. Date and date-time data contain significantly more information (e.g. year-month-day, hour-minute-second, and time zone attributes), and the data may not be on a regular interval (frequency).
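The regularized index is simple arithmetic. The i-th observation sits at start + (i - 1) / frequency. A minimal base R check uses time() and made-up values. It shows that the index is a plain numeric sequence with no date class attached.

```r
# Made-up quarterly values for illustration only
pct <- c(0.0692, 0.0650, 0.0612, 0.0575)
x   <- ts(pct, start = 1997, frequency = 4)

# Regularized index reported by base R
idx <- as.numeric(time(x))

# The same index computed by hand: start + (i - 1) / frequency
manual <- 1997 + (seq_along(pct) - 1) / 4

idx       # 1997.00 1997.25 1997.50 1997.75
tsp(x)    # start, end, frequency
```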

# Regularized numeric sequence
index(ten_year_treasury_rate_ts_stats)

Solution

The timekit package contains a new function, tk_ts(), that maintains the original date index as an attribute. When we repeat the tbl-to-ts coercion using tk_ts(), we can see a few differences.

First, only numeric columns get coerced. This prevents unintended consequences of R's coercion rules, e.g. dates being converted to numbers, or a character column forcing the homogeneous matrix structure to convert all numeric values to character. If a column is dropped, the user gets a warning.
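The "keep only numeric columns and warn about the rest" behaviour can be sketched in base R. The helper keep_numeric() is hypothetical and written only for this illustration. It is not timekit's implementation, and its warning text is made up.

```r
# Hypothetical helper: keep numeric columns, warn about any that are dropped
keep_numeric <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))  # is.numeric() is FALSE for Date
  if (any(!is_num)) {
    warning("Non-numeric columns dropped: ",
            paste(names(df)[!is_num], collapse = ", "))
  }
  df[is_num]
}

# Made-up values for illustration only
df <- data.frame(
  date = as.Date(c("1997-03-31", "1997-06-30")),
  pct  = c(0.0692, 0.0650)
)

num_df <- keep_numeric(df)  # warns that "date" was dropped
x <- ts(num_df, start = 1997, frequency = 4)
x
```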

# date automatically dropped and user is warned
ten_year_treasury_rate_ts_timekit <- tk_ts(ten_year_treasury_rate_tbl, 
                                         start = 1997, 
                                         freq  = 4)
ten_year_treasury_rate_ts_timekit

Second, the returned data has a few additional attributes. The most important is a numeric attribute, "index", which contains the original date information as a number. The ts() function does not preserve this index, while tk_ts() preserves it in numeric form along with the time zone and class.
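A number on its own is not enough to rebuild a date-time. The class and time zone must be kept as well. The base R sketch below stores a POSIXct value as a number and restores it twice, once with the original time zone and once with UTC. The variable names are illustrative only and do not reflect how timekit stores its attributes.

```r
# A date-time in a specific time zone
t <- as.POSIXct("2016-01-01 09:30:00", tz = "America/New_York")

# What a numeric index attribute could hold, plus the metadata needed to rebuild it
stored_num   <- as.numeric(t)        # seconds since 1970-01-01 UTC
stored_tzone <- attr(t, "tzone")     # "America/New_York"

# Restoring with the stored time zone recovers the original clock time
restored <- as.POSIXct(stored_num, origin = "1970-01-01", tz = stored_tzone)
format(restored, "%Y-%m-%d %H:%M:%S")  # "2016-01-01 09:30:00"

# Restoring with the wrong time zone shifts the displayed clock time
wrong_tz <- as.POSIXct(stored_num, origin = "1970-01-01", tz = "UTC")
format(wrong_tz, "%Y-%m-%d %H:%M:%S")  # "2016-01-01 14:30:00"

# Dates are stored as days since 1970-01-01 and need the Date class to come back
d <- as.Date("1997-03-31")
restored_date <- as.Date(as.numeric(d), origin = "1970-01-01")
```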

# More attributes including time index, time class, time zone
str(ten_year_treasury_rate_ts_timekit)

Advantages of coercion with tk_tbl()

Since we used tk_ts() during coercion, we can extract the original index in date format using tk_index(timekit_idx = TRUE). The default, timekit_idx = FALSE, returns the regularized index.

# Can now retrieve the original date index
timekit_index <- tk_index(ten_year_treasury_rate_ts_timekit, timekit_idx = TRUE)
head(timekit_index)
class(timekit_index)

Next, the tk_tbl() function also has a timekit_idx argument, which can be used to select which index to return. First, we show coercion using the default index. Notice that the returned index is "regularized", meaning it's actually a numeric index rather than a time-based index.

# Coercion back to tibble using the default index (regularized)
ten_year_treasury_rate_ts_timekit %>%
    tk_tbl(index_rename = "date", timekit_idx = FALSE)

We can now get the original date index using the tk_tbl() argument timekit_idx = TRUE.

# Coercion back to tibble now using the timekit index (date / date-time)
ten_year_treasury_rate_tbl_timekit <- ten_year_treasury_rate_ts_timekit %>%
    tk_tbl(index_rename = "date", timekit_idx = TRUE)
ten_year_treasury_rate_tbl_timekit

In this case (and in most cases), we get back the same data frame we began with.
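The round trip works because the original dates travel with the ts object as an attribute. A minimal base R sketch of that idea uses two hypothetical helpers, to_ts_keep_dates() and from_ts_keep_dates(), and a hypothetical attribute name, "date_index". These are illustrative assumptions, not timekit's code, which the text above says stores a numeric "index" attribute. If the attribute is missing, the sketch warns and falls back to the regularized index, which loosely mirrors the behaviour described for tk_tbl() and tk_index().

```r
# Hypothetical helper: build a ts and carry the original dates as an attribute
to_ts_keep_dates <- function(df, date_col, value_col, start, frequency) {
  x <- ts(df[[value_col]], start = start, frequency = frequency)
  attr(x, "date_index") <- df[[date_col]]
  x
}

# Hypothetical helper: rebuild a data frame, preferring the stored dates
from_ts_keep_dates <- function(x, date_col, value_col) {
  dates <- attr(x, "date_index")
  if (is.null(dates)) {
    warning("No stored date index; using the regularized time() values")
    dates <- as.numeric(time(x))
  }
  out <- data.frame(dates, as.numeric(x))
  names(out) <- c(date_col, value_col)
  out
}

# Made-up quarterly values for illustration only
df <- data.frame(
  date = as.Date(c("1997-03-31", "1997-06-30", "1997-09-30", "1997-12-31")),
  pct  = c(0.0692, 0.0650, 0.0612, 0.0575)
)

x    <- to_ts_keep_dates(df, "date", "pct", start = 1997, frequency = 4)
back <- from_ts_keep_dates(x, "date", "pct")
identical(back, df)  # TRUE: the original data frame is recovered

# A plain ts() object has no stored dates, so the helper warns
fallback <- tryCatch(
  from_ts_keep_dates(ts(1:4, start = 1997, frequency = 4), "date", "pct"),
  warning = function(w) "warned"
)
```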

# Comparing the coerced tibble with the original tibble
identical(ten_year_treasury_rate_tbl_timekit, ten_year_treasury_rate_tbl)

Coercion Methods

Using the ten_year_treasury_rate_tbl, we'll go through the various coercion methods using tk_tbl, tk_xts, tk_zoo, tk_zooreg, and tk_ts.

From tbl

The starting point is the ten_year_treasury_rate_tbl. We will coerce this into xts, zoo, zooreg and ts classes.

# Start:
ten_year_treasury_rate_tbl

to xts

Use tk_xts(). By default "date" is used as the date index and the "date" column is dropped from the output. Only numeric columns are coerced to avoid unintentional coercion issues.

# End
ten_year_treasury_rate_xts <- tk_xts(ten_year_treasury_rate_tbl) 
head(ten_year_treasury_rate_xts)

Use the select argument to specify which columns to keep (here, -date drops the "date" column). Use the date_var argument to specify which column to use as the date index. Notice that the message and warning are no longer present.

# End - Using `select` and `date_var` args
tk_xts(ten_year_treasury_rate_tbl, select = -date, date_var = date) %>%
    head()

Alternatively, we can set silent = TRUE to suppress the warnings, since dropping the "date" column by default is what we want. Notice that there are no warnings or messages.

# End - Using `silent` to silence warnings
tk_xts(ten_year_treasury_rate_tbl, silent = TRUE) %>%
    head()

to zoo

Use tk_zoo(). As with xts, the non-numeric "date" column is automatically dropped and automatically selected as the index.

# End
ten_year_treasury_rate_zoo <- tk_zoo(ten_year_treasury_rate_tbl, silent = TRUE) 
head(ten_year_treasury_rate_zoo)

to zooreg

Use tk_zooreg(). As with xts, the non-numeric "date" column is automatically dropped. The regularized index is built from the start and freq arguments.

# End
ten_year_treasury_rate_zooreg <- tk_zooreg(ten_year_treasury_rate_tbl, 
                                           start  = 1997, 
                                           freq   = 4,
                                           silent = TRUE) 
head(ten_year_treasury_rate_zooreg)

The original time-based index is retained and can be accessed using tk_index(timekit_idx = TRUE).

# Retrieve original time-based index
tk_index(ten_year_treasury_rate_zooreg, timekit_idx = TRUE) %>%
    str()

to ts

Use tk_ts(). The non-numeric "date" column is automatically dropped. The regularized index is built from the start and freq arguments.

# End
ten_year_treasury_rate_ts <- tk_ts(ten_year_treasury_rate_tbl, 
                                   start  = 1997, 
                                   freq   = 4,
                                   silent = TRUE) 
ten_year_treasury_rate_ts

The original time-based index is retained and can be accessed using tk_index(timekit_idx = TRUE).

# Retrieve original time-based index
tk_index(ten_year_treasury_rate_ts, timekit_idx = TRUE) %>%
    str()

To tbl

Going back to tibble is just as easy using tk_tbl().

From xts

# Start
head(ten_year_treasury_rate_xts)

Notice that there is no loss of data going back to tbl.

# End
tk_tbl(ten_year_treasury_rate_xts)

From zoo

# Start
head(ten_year_treasury_rate_zoo)

Notice that there is no loss of data going back to tbl.

# End
tk_tbl(ten_year_treasury_rate_zoo)

From zooreg

# Start
head(ten_year_treasury_rate_zooreg)

Notice that the index is a regularized numeric sequence by default.

# End - with default regularized index
tk_tbl(ten_year_treasury_rate_zooreg)

With timekit_idx = TRUE the index is the original date sequence. The result is the original tbl that we started with!

# End - with timekit index that is the same as original time-based index
tk_tbl(ten_year_treasury_rate_zooreg, timekit_idx = TRUE)

From ts

# Start
ten_year_treasury_rate_ts

Notice that the index is a regularized numeric sequence by default.

# End - with default regularized index
tk_tbl(ten_year_treasury_rate_ts)

With timekit_idx = TRUE the index is the original date sequence. The result is the original tbl that we started with!

# End - with timekit index 
tk_tbl(ten_year_treasury_rate_ts, timekit_idx = TRUE)

Additional Concepts

This section covers additional concepts that the user may find useful when working with time series.

Testing if an object has a timekit index

The function has_timekit_idx() can be used to test whether toggling the timekit_idx argument in the tk_index() and tk_tbl() functions will have an effect on the output. Here are several examples using the ten-year treasury data from the case study:

Testing ts()

There's no "timekit index" if the ts() function is used. The solution is to use tk_ts() to coerce to ts.

# Data coerced with stats::ts() has no timekit index
has_timekit_idx(ten_year_treasury_rate_ts_stats)

If we try to toggle timekit_idx = TRUE when retrieving the index with tk_index(), we get a warning and the default regularized index is returned.

tk_index(ten_year_treasury_rate_ts_stats, timekit_idx = TRUE)

If we try to toggle timekit_idx = TRUE during coercion to tbl using tk_tbl(), we get a warning and the default regularized index is used.

tk_tbl(ten_year_treasury_rate_ts_stats, timekit_idx = TRUE)

Testing tk_ts()

The tk_ts() function returns an object with the "timekit index" attribute.

# Data coerced with tk_ts() has timekit index
has_timekit_idx(ten_year_treasury_rate_ts_timekit)

If we toggle timekit_idx = TRUE when retrieving the index with tk_index(), we get the index of dates rather than the regularized index.

tk_index(ten_year_treasury_rate_ts_timekit, timekit_idx = TRUE)

If we toggle timekit_idx = TRUE during coercion to tbl using tk_tbl(), we get the index of dates rather than the regularized index in the returned tbl.

tk_tbl(ten_year_treasury_rate_ts_timekit, timekit_idx = TRUE)

Testing other data types

The timekit_idx argument will only have an effect on objects that use regularized time series. Therefore, has_timekit_idx() returns FALSE for other object types (e.g. tbl, xts, zoo) since toggling the argument has no effect on these classes.

has_timekit_idx(ten_year_treasury_rate_xts)

Toggling the timekit_idx argument has no effect on the output. Output with timekit_idx = TRUE is the same as with timekit_idx = FALSE.

tk_index(ten_year_treasury_rate_xts, timekit_idx = TRUE)
tk_index(ten_year_treasury_rate_xts, timekit_idx = FALSE)

Coercing ts to xts and zoo

It's common to need to coerce data stored in a data frame, or in another structure with a time base, to ts to perform some analysis. It's also common to need to coerce it back from the regularized structure to a time-based structure such as xts or zoo to continue the workflow. Traditionally, coercing a ts object to an xts or zoo object was difficult or impossible. The ts object does not maintain a time-based index, and xts and zoo objects require the order.by argument to specify one. The zoo package contains some regularized index classes (yearmon and yearqtr) that can be converted to dates. Until now, however, there was no easy way to coerce ts objects with frequencies such as daily. The general process is as follows:

  1. Begin with an object with a time-based index in date or date-time format. Typically this would be a data frame (tbl) or xts object.
  2. Coerce to ts using the tk_ts() function setting the start and frequency parameters for regularization. This generates a regularized ts object as normal, but using the tk_ts() function also maintains the time-based "timekit index".
  3. Coerce to xts or zoo using tk_xts() or tk_zoo() respectively.

Here's a quick example. Our starting point is a tibble (tbl) but it could be another time-based object such as xts or zoo.

# Start with a date or date-time indexed data frame
data_tbl <- tibble::tibble(
    date = seq.Date(as.Date("2016-01-01"), by = 1, length.out = 5),
    x    = cumsum(11:15) * rnorm(1))
data_tbl

Coerce to ts class using the tk_ts() function. Note that the non-numeric column "date" is being dropped, and silent = TRUE hides the message.

# Coerce to ts 
data_ts <- tk_ts(data_tbl, start = 2016, freq = 365, silent = TRUE)
data_ts
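Daily data shows why the regularized index alone is not enough. With start = 2016 and freq = 365, the index assumes 365 days per year, but 2016 is a leap year. A minimal base R sketch uses a longer, made-up daily series. Its 366th observation (2016-12-31) lands at 2017.0 on the regularized scale.

```r
# 367 consecutive days starting 2016-01-01 (values are just 1, 2, 3, ...)
dates <- seq(as.Date("2016-01-01"), by = "day", length.out = 367)
x     <- ts(seq_along(dates), start = 2016, frequency = 365)

# Regularized index: 2016 + (i - 1) / 365
reg_idx <- as.numeric(time(x))

dates[366]    # 2016-12-31
reg_idx[366]  # 2017, already reads as the following year
```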

Coercion to xts normally requires a date or datetime index to be passed to the order.by argument. However, when coercing ts objects created with tk_ts(), the tk_xts function automatically uses the "timekit index" if present.

# Inspect timekit index
has_timekit_idx(data_ts)

If the "timekit index" is present, the user can simply pass the ts object to the coercion function (tk_xts()), which will automatically use the "timekit index" to order by.

# No need to specify order.by arg
data_xts <- tk_xts(data_ts)
data_xts

We can see that the xts structure is maintained.

str(data_xts)

The same process can be used to coerce from ts to zoo class using tk_zoo.

# No need to specify order.by arg
data_zoo <- tk_zoo(data_ts)
data_zoo

We can see that the zoo structure is maintained.

str(data_zoo)

Note that tk_tbl() requires the timekit_idx = TRUE argument to use the non-regularized index.

tk_tbl(data_ts, timekit_idx = TRUE)

Working with yearmon and yearqtr index

The zoo package has the yearmon and yearqtr classes for working with regularized monthly and quarterly data, respectively. The "timekit index" tracks the format during coercion. Here's an example with yearqtr.
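A yearqtr value is stored internally as the year plus 0, 1/4, 2/4 or 3/4 for quarters 1 to 4. That is why it lines up with a quarterly ts index. The base R sketch below computes the same numeric values and labels without zoo and compares them with time() of a quarterly ts. The dates and values are made up for illustration.

```r
# Quarter-end dates (made up for illustration)
d  <- as.Date(c("1997-03-31", "1997-06-30", "1997-09-30", "1997-12-31"))
lt <- as.POSIXlt(d)

# year + (quarter - 1) / 4, the numeric representation used by yearqtr
qtr_num <- (lt$year + 1900) + (lt$mon %/% 3) / 4

# Labels in the "1997 Q1" style that yearqtr prints
qtr_label <- paste0(lt$year + 1900, " Q", lt$mon %/% 3 + 1)

# A quarterly ts starting in 1997 has exactly the same regularized index
x <- ts(c(0.0692, 0.0650, 0.0612, 0.0575), start = 1997, frequency = 4)
cbind(qtr_num, ts_index = as.numeric(time(x)))
```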

yearqtr_tbl <- ten_year_treasury_rate_tbl %>%
    mutate(date = as.yearqtr(date))
yearqtr_tbl

We can coerce to xts and the yearqtr class is intact.

yearqtr_xts <- tk_xts(yearqtr_tbl)
yearqtr_xts %>%
    head()

We can coerce to ts and, although the "timekit index" is hidden, the yearqtr class is intact.

yearqtr_ts <- tk_ts(yearqtr_xts, start = 1997, freq = 4)
yearqtr_ts %>%
    head()

Coercing from ts to tbl using timekit_idx = TRUE shows that the original index was maintained through each of the coercion steps.

yearqtr_ts %>%
    tk_tbl(timekit_idx = TRUE)

Getting the index of other time-based objects (e.g. models)

It can be important to retrieve the index from models and other objects that use an underlying time series data set. We'll go through an example retrieving the time index from an ARIMA model using tk_index().

library(forecast)
fit_arima <- ten_year_treasury_rate_ts %>%
    auto.arima()

We can get the time index from the ARIMA model.

tk_index(fit_arima)

We can also get the original index from the ARIMA model by setting timekit_idx = TRUE.

tk_index(fit_arima, timekit_idx = TRUE)
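A fitted model usually carries only the regularized time attributes of its input, not the dates. The base R sketch below swaps in stats::arima() for forecast::auto.arima() and simulates a made-up series. The residuals keep the same start, end and frequency (tsp) as the input, but no date information.

```r
set.seed(42)

# Simulated AR(1) values placed on a quarterly index starting in 1997
x <- ts(as.numeric(arima.sim(model = list(ar = 0.5), n = 40)),
        start = 1997, frequency = 4)

# stats::arima() used here in place of forecast::auto.arima()
fit <- arima(x, order = c(1, 0, 0))
res <- fit$residuals

tsp(res)          # same start, end and frequency as tsp(x)
head(time(res))   # regularized numeric index: 1997.00, 1997.25, ...
```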



timekit documentation built on July 4, 2017, 9:45 a.m.