In FlukeAndFeather/tsrf: Time Series Random Forest Modeling

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

tsrf

tsrf provides time series classification tools based on the Time Series Random Forest algorithm by Deng et al. (2013).

Installation

You can install the development version of tsrf from GitHub with:

# install.packages("devtools")
devtools::install_github("FlukeAndFeather/tsrf")

Example

This example demonstrates how tsrf extracts features, fits a model, and makes predictions. First, generate 10 time series from two different underlying classes

library(tsrf)
set.seed(1733)
tsnum <- 20
tslbl <- factor(rep(c("A", "B"), each = tsnum))
tslen <- 250
Aval <- replicate(tsnum, rnorm(tslen))
Bval <- replicate(tsnum, c(rnorm(tslen / 2), cumsum(rnorm(tslen / 2, 0.5, 0.2))))
tsdat <- tibble::as_tibble(data.frame(
  id = rep(1:(2 * tsnum), each = tslen),
  val = c(as.numeric(Aval), as.numeric(Bval))
))
tibble::as_tibble(tsdat)

Time series from the first class (red) have no trend but the second half of time series from the second class trend upwards.

plot(seq(tslen), tsdat$val[tsdat$id == 1],
     type = "l", col = "#FF000088", ylim = range(tsdat$val),
     xlab = "index", ylab = "value")
for (i in 2:(tsnum * 2)) {
  lines(seq(tslen), tsdat$val[tsdat$id == i],
        col = if (i <= tsnum) "#FF000088" else "#0000FF88")
}

The features extracted from each time series are simple summary statistics (mean, sd, slope) of randomly sampled intervals. Points closer to the middle of the time series will end up in more intervals. extract_features() converts long format time series data to wide format, with one row per time series and a column for each feature.

tsints <- sample_intervals(tslen)
tsfeat <- extract_features(tsdat, "id", tsints)

prevalence <- sapply(
  seq(tslen),
  function(i) sum(i >= tsints[, 1] & i <= tsints[, 2]) / nrow(tsints)
)
plot(seq(tslen), prevalence, type = "l", xlab = "index")

tibble::as_tibble(tsfeat)

After sampling intervals and extracting features, fit the model and make predictions.

# Split data
train_index <- sample(seq(nrow(tsfeat)), 0.9 * nrow(tsfeat))
train_feat <- tsfeat[train_index, ]
test_feat <- tsfeat[-train_index, ]
train_lbl <- tslbl[train_index]
test_lbl <- tslbl[-train_index]

# Train model
tsrf <- train_tsrf(train_feat, train_lbl, "id")

# Test predictions
pred_feat <- predict(tsrf, test_feat)
caret::confusionMatrix(pred_feat, test_lbl)