The goal of bigtime
is to sparsely estimate large time series models
such as the Vector AutoRegressive (VAR) Model, the Vector AutoRegressive
with Exogenous Variables (VARX) Model, and the Vector AutoRegressive
Moving Average (VARMA) Model. The univariate cases are also supported.
You can install bigtime from github a follows:
# install.packages("devtools")
devtools::install_github("ineswilms/bigtime")
We will use the time series contained in the example
data set. The
first ten columns in our dataset are used as endogenous time series in
the VAR and VARMA models, and the last five columns are used as
exogenous time series in the VARX model. Note that we remove the last
observation from our dataset as we will use this one to illustrate how
to evaluate prediction performance. We start by making a plot of our
data.
library(bigtime)
data(example)
Y <- example[-nrow(example), 1:10] # endogenous time series
Ytest <- example[nrow(example), 1:10]
X <- example[-nrow(example), 11:15] # exogenous time series
par(mfrow=c(5,2), mar=c(2.5, 2.5, 1, 1))
for(i in 1:ncol(Y)){
plot(Y[,i], type="l", ylab="", xlab="Time", main=paste("Endogenous Time series", i))
}
par(mfrow=c(3,2), mar=c(2.5, 2.5, 1, 1))
for(i in 1:ncol(X)){
plot(X[,i], type="l", ylab="", xlab="Time", main=paste("Exogenous Time series", i))
}
To make estimation of large VAR models feasible, one could start by
using an L1-penalty (lasso penalty) on the autoregressive coefficients.
To this end, set the VARpen
argument in the sparseVAR
function equal
to L1. A time-series cross-validation procedure is used to select the
sparsity parameters. The default is to use a cross-validation score
based on one-step ahead predictions but you can change the default
forecast horizon under the argument h
. The function lagmatrix
returns the lagmatrix of the estimated autoregressive coefficients. If
entry (i, j) = x, this means that the sparse estimator indicates
the effect of time series j on time series i to last for x
periods.
VARL1 <- sparseVAR(Y=Y, VARpen="L1") # default forecast horizon is h=1
par(mfrow=c(1, 1))
LhatL1 <- lagmatrix(fit=VARL1, model="VAR", returnplot=T)
The lag matrix is typically sparse as it contains soms empty (i.e., zero) cells. However, VAR models estimated with a standard L1-penalty are typically not easily interpretable as they select many high lag order coefficients (i.e., large values in the lagmatrix).
To circumvent this problem, we advise using a lag-based hierarchically
sparse estimation procedure, which boils down to using the default
option HLag for the VARpen
argument. This estimation procedures
encourages low maximum lag orders, often results in sparser lagmatrices,
and hence more interpretable models:
VARHLag <- sparseVAR(Y=Y) # VARpen="HLag" is the default
par(mfrow=c(1, 1))
LhatHLag <- lagmatrix(fit=VARHLag, model="VAR", returnplot=T)
Often practitioners are interested in incorparating the impact of
unmodeled exogenous variables (X) into the VAR model. To do this, you
can use the sparseVARX
function which has an argument X
where you
can enter the data matrix of exogenous time series. When applying the
lagmatrix
function to an estimated sparse VARX model, the lag matrices
of both the endogenous and exogenous autoregressive coefficients are
returned.
VARXfit <- sparseVARX(Y=Y, X=X)
par(mfrow=c(1, 2))
LhatVARX <- lagmatrix(fit=VARXfit, model="VARX", returnplot=T)
VARMA models generalized VAR models and often allow for more
parsimonious representations of the data generating process. To estimate
a VARMA model to a multivariate time series data set, use the function
sparseVARMA
. Now lag matrices are obtained for the autoregressive (AR)
coefficients and the moving average (MAs) coefficients.
VARMAfit <- sparseVARMA(Y=Y)
par(mfrow=c(1, 2))
LhatVARMA <- lagmatrix(fit=VARMAfit, model="VARMA", returnplot=T)
To obtain forecasts from the estimated models, you can use the
directforecast
function. The default forecast horizon (argument h
)
is set to one such that one-step ahead forecasts are obtained, but you
can specify your desired forecast horizon. Finally, we compare the
forecast accuracy of the different models by comparing their forecasts
to the actual time series values. In this example, the VARMA model has
the best forecast performance (i.e., lowest mean squared prediction
error). This is no surprise as the multivariate time series Y was
generated from a VARMA model.
VARf <- directforecast(VARHLag, model="VAR") # default is h=1
VARXf <- directforecast(VARXfit, model="VARX")
VARMAf <- directforecast(VARMAfit, model="VARMA")
mean((VARf-Ytest)^2)
#> [1] 2.448348
mean((VARXf-Ytest)^2)
#> [1] 2.491267
mean((VARMAf-Ytest)^2) # lowest=best
#> [1] 2.114305
The functions sparseVAR
, sparseVARX
, sparseVARMA
can also be used
for the univariate setting where the response time series Y is
univariate. Below we illustrate the usefulness of the sparse estimation
procedure as automatic lag selection procedures.
We start by generating a time series of length n = 50 from a
stationary AR model and by plotting it. The sparseVAR
function can
also be used in the univariate case as it allows the argument Y
to be
a vector. The lagmatrix
function gives the selected autoregressive
order of the sparse AR model. The true order is one.
n <- 50
y <- rep(NA, n)
y[1] <- rnorm(1, sd=0.1)
for(i in 2:n){
y[i] <- 0.5*y[i-1] + rnorm(1, sd=0.1)
}
par(mfrow=c(1,1))
plot(y, type="l", xlab="Time", ylab="")
ARfit <- sparseVAR(Y=y)
lagmatrix(fit=ARfit, model="VAR")
#> $LPhi
#> [,1]
#> [1,] 1
We start by generating a time series of length n = 50 from a
stationary ARX model and by plotting it. The sparseVARX
function can
also be used in the univariate case as it allows the arguments Y
and
X
to be vectors. The lagmatrix
function gives the selected
endogenous (under LPhi
) and exogenous autoregressive (under LB
)
orders of the sparse ARX model. The true orders are one.
n <- 50
y <- x <- rep(NA, n)
x[1] <- rnorm(1, sd=0.1)
y[1] <- rnorm(1, sd=0.1)
for(i in 2:n){
x[i] <- 0.8*x[i-1] + rnorm(1, sd=0.1)
y[i] <- 0.5*y[i-1] + 0.2*x[i-1] + rnorm(1, sd=0.1)
}
par(mfrow=c(1,1))
plot(y, type="l", xlab="Time", ylab="")
ARXfit <- sparseVARX(Y=y, X=x)
lagmatrix(fit=ARXfit, model="VARX")
#> $LPhi
#> [,1]
#> [1,] 1
#>
#> $LB
#> [,1]
#> [1,] 10
We start by generating a time series of length n = 50 from a
stationary ARMA model and by plotting it. The sparseVARMA
function can
also be used in the univariate case as it allows the argument Y
to be
a vector. The lagmatrix
function gives the selected autoregressive
(under LPhi
) and moving average (under LTheta
) orders of the sparse
ARMA model. The true orders are one.
n <- 50
y <- u <- rep(NA, n)
y[1] <- rnorm(1, sd=0.1)
u[1] <- rnorm(1, sd=0.1)
for(i in 2:n){
u[i] <- rnorm(1, sd=0.1)
y[i] <- 0.5*y[i-1] + 0.2*u[i-1] + u[i]
}
par(mfrow=c(1,1))
plot(y, type="l", xlab="Time", ylab="")
ARMAfit <- sparseVARMA(Y=y)
lagmatrix(fit=ARMAfit, model="VARMA")
#> $LPhi
#> [,1]
#> [1,] 1
#>
#> $LTheta
#> [,1]
#> [1,] 0
Nicholson William B., Bien Jacob and Matteson David S. (2020), “High-dimensional forecasting via interpretable vector autoregression”, Journal of Machine Learning Research, 21(166), 1-52.
Wilms Ines, Sumanta Basu, Bien Jacob and Matteson David S. (2017), “Sparse Identification and Estimation of High-Dimensional Vector AutoRegressive Moving Averages”, arXiv:1707.09208.
Wilms Ines, Sumanta Basu, Bien Jacob and Matteson David S. (2017), “Interpretable Vector AutoRegressions with Exogenous Time Series”, arXiv.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.