knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
caret.ts provides various functions for machine learning with time series data. While the "caret" package is common in various tasks related to machine learning; its naive version does yet ship dedicated time series models. This implementation thus extends the "caret" package and offers additional models, including ARMA or ARIMA. Additionally, it customizes the "train" function to accept time series data.
The most important functions in caret.ts are:
train()
inside caret now works with time series object (i.e. those of class ts
).
arma_model()
and arima_model()
construct a time series model of a pre-defined order to be used inside train()
.
auto_arma_model()
and auto_arima_model()
find the order that fits the training data best.
ets_model()
implements an ETS model, optionally with auto-search for the best model specification.
To see examples of these functions in use, check out the help pages or the examples in this README file.
Using the devtools package, you can easily install the latest development version of caret.ts with
install.packages("devtools") # Option 1: download and install latest version from ‘GitHub’ devtools::install_github("sfeuerriegel/caret.ts") # Option 2: install directly from bundled archive # devtoos::install_local("caret.ts_0.1.0.tar.gz")
Notes:
In the case of option 2, you have to specify the path either to the directory of caret.ts or to the bundled archive caret.ts_0.1.0.tar.gz
A CRAN version has not yet been released.
This section shows the basic functionality of how to perform machine learning with time seris models inside caret. First, load the corresponding package caret.ts.
library(caret.ts)
The examples below how to insert the models inside the train()
function from caret. Additionally, this package also implements an additional variant of the train()
function that accepts time series objects (see below).
Auto-regressive moving-average (ARMA) models can be faciliated both with and without exogeneous variables. By using arma_model(p, q)
, one can construct an ARMA model of a fixed, pre-defined order. Alternatively, one can let the train()
function pick the order that fits the training data best. For the latter purpose, use auto_arma_model()
(with optional arguments for the maximum order).
Example without exogenous variables:
library(forecast) data(WWWusage) # from package "forecast" df <- data.frame(y = as.numeric(WWWusage)) arma <- train(y ~ 1, data = df, method = arma_model(1, 1), trControl = trainDirectFit()) summary(arma) predict(arma, df) RMSE(predict(arma, df), df)
Example with exogenous variables:
library(vars) data(Canada) arma <- train(x = Canada[, -2], y = Canada[, 2], method = arma_model(2, 0), trControl = trainDirectFit()) summary(arma) arimaorder(arma$finalModel) # order of best model predict(arma, Canada[, -2]) # in-sample predictions RMSE(predict(arma, Canada[, -2]), Canada[, 2]) # in-sample RMSE
Predictions and testing:
Auto-regressive moving-average models with differneces (ARIMA) also support exogeneous variables if desired. By using arima_model(p, d, q)
, one can construct an ARIMA model of a fixed, pre-defined order:
p
: Order of auto-regressive (AR) terms
d
: Degree of differencing
q
: Order of moving-average (MA) terms
Alternatively, one can let the train()
function pick the order that fits the training data best. For the latter purpose, use auto_arima_model()
(with optional arguments for the maximum order).
Example without exogenous variables:
library(forecast) data(WWWusage) # from package "forecast" df <- data.frame(y = as.numeric(WWWusage)) library(caret) # ARIMA model of order (1, 1, 1) arima1 <- train(y ~ 1, data = df, method = arima_model(1, 1, 1), trControl = trainDirectFit()) summary(arima1) predict(arima1, df) # in-sample predictions RMSE(predict(arima1, df), df) # in-sample RMSE # ARIMA model of order (1, 2, 1) arima2 <- train(y ~ 1, data = df, method = arima_model(1, 2, 1), trControl = trainDirectFit()) summary(arima2) predict(arima2, df) # in-sample predictions RMSE(predict(arima2, df), df) # in-sample RMSE # Auto-tuned ARIMA model auto_arima <- train(y ~ 1, data = df, method = auto_arima_model(), trControl = trainDirectFit()) summary(auto_arima) arimaorder(auto_arima$finalModel) # order of best model predict(auto_arima, df) # in-sample predictions RMSE(predict(auto_arima, df), df) # in-sample RMSE
Example with exogenous variables:
library(vars) data(Canada) arima <- train(x = Canada[, -2], y = Canada[, 2], method = auto_arima_model(), trControl = trainDirectFit()) summary(arima) arimaorder(arima$finalModel) # order of best model predict(arima, Canada[, -2]) # in-sample predictions RMSE(predict(arima, Canada[, -2]), Canada[, 2]) # in-sample RMSE
ets_model()
integrates a so-called exponential smoothing state space (ETS) model. Note that the ETS model does not contain any exogenous predictors; however, we need to supply some sample data to work with caret, but which is ignored.
data_train <- WWWusage[1:80] data_test <- WWWusage[81:100] lm <- train(data_train, method = "lm", trControl = trainDirectFit()) RMSE(predict(lm, data_test), data_test) ets <- train(data_train, method = ets_model(), trControl = trainDirectFit()) summary(ets) RMSE(predict(ets, data_test), data_test)
trainDirectFit()
This package automatically extends the train()
function to successfully work with time series objects of class ts
.
Example with uni-variate time series:
class(WWWusage) str(WWWusage) # Example with uni-variate time series and no predictors arima <- train(WWWusage, method = auto_arima_model(), trControl = trainDirectFit()) summary(arima) arimaorder(arima$finalModel) # order of best model predict(arima, WWWusage) # in-sample RMSE RMSE(predict(arima, WWWusage), WWWusage) # in-sample RMSE
Example for time series with exogenous predictors:
class(Canada) str(Canada) # Variant with explicit x and y arima <- train(x = Canada[, -2], y = Canada[, 2], method = auto_arima_model(), trControl = trainDirectFit()) summary(arima) arimaorder(arima$finalModel) # order of best model predict(arima, Canada[, -2]) # in-sample predictions RMSE(predict(arima, Canada[, -2]), Canada[, 2]) # in-sample RMSE # Variant with formula arima <- train(form = prod ~ ., data = Canada, method = auto_arima_model(), trControl = trainDirectFit()) summary(arima) arimaorder(arima$finalModel) # order of best model predict(arima, Canada) # in-sample predictions RMSE(predict(arima, Canada), Canada[, "prod"]) # in-sample RMSE
library(ggplot2) canada_train <- Canada[1:60, ] canada_test <- Canada[61:84, ] canada_arima <- train(form = prod ~ ., data = canada_train, method = auto_arima_model(), trControl = trainDirectFit()) canada_rf <- train(form = prod ~ ., data = canada_train, method = "rf") pred_arima <- predict(canada_arima, canada_test) pred_rf <- predict(canada_rf, canada_test) df <- data.frame(n = 1:84, Canada, pred_arima = c(rep(NA, 60), pred_arima), pred_rf = c(rep(NA, 60), pred_rf)) ggplot(df) + geom_line(aes(x = n, y = prod, color = "Actual")) + geom_line(aes(x = n, y = pred_arima, color = "Predicted (ARIMA)")) + geom_line(aes(x = n, y = pred_rf, color = "Predicted (RF)")) + geom_vline(xintercept = 60) + scale_color_manual("Time series", values = c("Actual" = "darkblue", "Predicted (ARIMA)" = "darkred", "Predicted (RF)" = "darkgray")) + theme_bw()
Variable importance metrics as provided by caret are supported accordingly.
Example:
absCoef <- varImp(arima, scale = FALSE) # variable importance (= absolute value of coefficient) absCoef plot(absCoef)
TODO
caret.ts is released under the MIT License
Copyright (c) 2016 Stefan Feuerriegel
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.