knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)

caret.ts: time series models for the "caret" package

Build Status CRAN_Status_Badge Coverage Status

caret.ts provides various functions for machine learning with time series data. While the "caret" package is common in various tasks related to machine learning; its naive version does yet ship dedicated time series models. This implementation thus extends the "caret" package and offers additional models, including ARMA or ARIMA. Additionally, it customizes the "train" function to accept time series data.

Overview

The most important functions in caret.ts are:

To see examples of these functions in use, check out the help pages or the examples in this README file.

Installation

Using the devtools package, you can easily install the latest development version of caret.ts with

install.packages("devtools")

# Option 1: download and install latest version from ‘GitHub’
devtools::install_github("sfeuerriegel/caret.ts")

# Option 2: install directly from bundled archive
# devtoos::install_local("caret.ts_0.1.0.tar.gz")

Notes:

Usage of models

This section shows the basic functionality of how to perform machine learning with time seris models inside caret. First, load the corresponding package caret.ts.

library(caret.ts)

The examples below how to insert the models inside the train() function from caret. Additionally, this package also implements an additional variant of the train() function that accepts time series objects (see below).

ARMA model

Auto-regressive moving-average (ARMA) models can be faciliated both with and without exogeneous variables. By using arma_model(p, q), one can construct an ARMA model of a fixed, pre-defined order. Alternatively, one can let the train() function pick the order that fits the training data best. For the latter purpose, use auto_arma_model() (with optional arguments for the maximum order).

Example without exogenous variables:

library(forecast)
data(WWWusage) # from package "forecast"
df <- data.frame(y = as.numeric(WWWusage))

arma <- train(y ~ 1, data = df, method = arma_model(1, 1), trControl = trainDirectFit())
summary(arma)

predict(arma, df)
RMSE(predict(arma, df), df)

Example with exogenous variables:

library(vars)
data(Canada)

arma <- train(x = Canada[, -2], y = Canada[, 2], 
              method = arma_model(2, 0), trControl = trainDirectFit())

summary(arma)
arimaorder(arma$finalModel) # order of best model

predict(arma, Canada[, -2]) # in-sample predictions
RMSE(predict(arma, Canada[, -2]), Canada[, 2]) # in-sample RMSE

Predictions and testing:

ARIMA model

Auto-regressive moving-average models with differneces (ARIMA) also support exogeneous variables if desired. By using arima_model(p, d, q), one can construct an ARIMA model of a fixed, pre-defined order:

Alternatively, one can let the train() function pick the order that fits the training data best. For the latter purpose, use auto_arima_model() (with optional arguments for the maximum order).

Example without exogenous variables:

library(forecast)

data(WWWusage) # from package "forecast"
df <- data.frame(y = as.numeric(WWWusage))

library(caret)

# ARIMA model of order (1, 1, 1)

arima1 <- train(y ~ 1, data = df, method = arima_model(1, 1, 1), trControl = trainDirectFit())

summary(arima1)

predict(arima1, df) # in-sample predictions
RMSE(predict(arima1, df), df) # in-sample RMSE

# ARIMA model of order (1, 2, 1)

arima2 <- train(y ~ 1, data = df, method = arima_model(1, 2, 1), trControl = trainDirectFit())

summary(arima2)

predict(arima2, df) # in-sample predictions
RMSE(predict(arima2, df), df) # in-sample RMSE

# Auto-tuned ARIMA model

auto_arima <- train(y ~ 1, data = df, method = auto_arima_model(), trControl = trainDirectFit())

summary(auto_arima)
arimaorder(auto_arima$finalModel) # order of best model

predict(auto_arima, df) # in-sample predictions
RMSE(predict(auto_arima, df), df) # in-sample RMSE

Example with exogenous variables:

library(vars)
data(Canada)

arima <- train(x = Canada[, -2], y = Canada[, 2], 
               method = auto_arima_model(), trControl = trainDirectFit())

summary(arima)
arimaorder(arima$finalModel) # order of best model

predict(arima, Canada[, -2]) # in-sample predictions
RMSE(predict(arima, Canada[, -2]), Canada[, 2]) # in-sample RMSE

ETS

ets_model() integrates a so-called exponential smoothing state space (ETS) model. Note that the ETS model does not contain any exogenous predictors; however, we need to supply some sample data to work with caret, but which is ignored.

data_train <- WWWusage[1:80]
data_test <- WWWusage[81:100]

lm <- train(data_train, method = "lm", trControl = trainDirectFit())
RMSE(predict(lm, data_test), data_test)

ets <- train(data_train, method = ets_model(), trControl = trainDirectFit())
summary(ets)
RMSE(predict(ets, data_test), data_test)

Evaluation

Alternative interface for training with time series objects

trainDirectFit()

This package automatically extends the train() function to successfully work with time series objects of class ts.

Example with uni-variate time series:

class(WWWusage)
str(WWWusage)

# Example with uni-variate time series and no predictors

arima <- train(WWWusage, method = auto_arima_model(), trControl = trainDirectFit())

summary(arima)
arimaorder(arima$finalModel) # order of best model

predict(arima, WWWusage) # in-sample RMSE
RMSE(predict(arima, WWWusage), WWWusage) # in-sample RMSE

Example for time series with exogenous predictors:

class(Canada)
str(Canada)

# Variant with explicit x and y

arima <- train(x = Canada[, -2], y = Canada[, 2], 
               method = auto_arima_model(), trControl = trainDirectFit())

summary(arima)
arimaorder(arima$finalModel) # order of best model

predict(arima, Canada[, -2]) # in-sample predictions
RMSE(predict(arima, Canada[, -2]), Canada[, 2]) # in-sample RMSE

# Variant with formula

arima <- train(form = prod ~ ., data = Canada, 
               method = auto_arima_model(), trControl = trainDirectFit())

summary(arima)
arimaorder(arima$finalModel) # order of best model

predict(arima, Canada) # in-sample predictions
RMSE(predict(arima, Canada), Canada[, "prod"]) # in-sample RMSE

Plotting of forecasts

library(ggplot2)

canada_train <- Canada[1:60, ]
canada_test <- Canada[61:84, ]

canada_arima <- train(form = prod ~ ., data = canada_train, 
                      method = auto_arima_model(), trControl = trainDirectFit())

canada_rf <- train(form = prod ~ ., data = canada_train, 
                      method = "rf")

pred_arima <- predict(canada_arima, canada_test)
pred_rf <- predict(canada_rf, canada_test)

df <- data.frame(n = 1:84, Canada, 
                 pred_arima = c(rep(NA, 60), pred_arima),
                 pred_rf = c(rep(NA, 60), pred_rf))

ggplot(df) +
  geom_line(aes(x = n, y = prod, color = "Actual")) +
  geom_line(aes(x = n, y = pred_arima, color = "Predicted (ARIMA)")) +
  geom_line(aes(x = n, y = pred_rf, color = "Predicted (RF)")) +
  geom_vline(xintercept = 60) +
  scale_color_manual("Time series", values = c("Actual" = "darkblue", "Predicted (ARIMA)" = "darkred", "Predicted (RF)" = "darkgray")) +
  theme_bw()

Variable importance

Variable importance metrics as provided by caret are supported accordingly.

Example:

absCoef <- varImp(arima, scale = FALSE) # variable importance (= absolute value of coefficient)
absCoef

plot(absCoef)

Time series cross-validation

TODO

License

caret.ts is released under the MIT License

Copyright (c) 2016 Stefan Feuerriegel



sfeuerriegel/caret.ts documentation built on May 29, 2019, 8:01 p.m.