Fitting ARIMA models with statespacer"
In statespacer: State Space Modelling in 'R'

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

A lot of time series practitioners resort to the well-known Box-Jenkins methods, such as ARIMA and SARIMA, when modelling time series. Those methods are easily incorporated into the State Space framework. This provides to be beneficial, as missing observations are easily dealt with in the State Space framework. Moreover, no observations need to be discarded due to the differencing and introduced lagged variables. The loglikelihood is calculated in an exact manner! In this document, we show you how to estimate ARIMA and SARIMA models using statespacer. We reproduce some estimation results as in @box2015time.

ARIMA modelling of the yearly sunspot data

To showcase the estimation of ARIMA models, we make use of the sunspot.year data, which contains yearly numbers of sunspots from 1700 to 1988. See ?sunspot.year for details. We only use the data from 1770 to 1869, to stay in line with @box2015time. We estimate an $\text{ARIMA}(3, ~ 0, ~ 0)$ with deterministic level (or constant if you prefer) as follows:

# Load statespacer
library(statespacer)

# Load the dataset
library(datasets)
Data <- matrix(window(sunspot.year, start = 1770, end = 1869))

# Estimate the ARIMA model
fit <- statespacer(y = Data,
                   H_format = matrix(0),
                   local_level_ind = TRUE,
                   arima_list = list(c(3, 0, 0)),
                   format_level = matrix(0),
                   initial = c(0.5*log(var(Data)), 0, 0, 0),
                   verbose = TRUE,
                   standard_errors = TRUE)

Note that we eliminate the observation error by setting its variance to 0, although it's perfectly fine to include observation errors along with ARIMA models, as long as you watch out for identification issues of course. For details about specifying proper initial values, please see vignette("dictionary", "statespacer").

We obtain the following estimates:

# Coefficients of the ARMA component
arma_coeff <- rbind(
   fit$system_matrices$AR$ARIMA1,
   fit$standard_errors$AR$ARIMA1
)
arma_coeff <- cbind(
   arma_coeff,
   c(fit$smoothed$level[1],
     sqrt(fit$system_matrices$Z_padded$level %*%
          fit$smoothed$V[,,1] %*%
          t(fit$system_matrices$Z_padded$level))
   )
)
rownames(arma_coeff) <- c("coefficient", "std_error")
colnames(arma_coeff) <- c("ar1", "ar2", "ar3", "intercept")
arma_coeff

goodness_fit <- rbind(
   fit$system_matrices$Q$ARIMA1,
   fit$diagnostics$loglik,
   fit$diagnostics$AIC
)
rownames(goodness_fit) <- c("Variance", "Loglikelihood", "AIC")
goodness_fit

We see that the results are fairly similar to the results as obtained by @box2015time. Differences may occur due to the different estimation procedures. We don't have to eliminate observations, so we use the full information available at hand, in contrast to traditional estimation procedures. Note that not much has to be done to estimate VARIMA models. In fact, you only need to specify a dependent variable y that has more than one column! It's also straightforward to add explanatory variables, by making use of the addvar_list option, see vignette("seatbelt", "statespacer") for an example of adding explanatory variables.

SARIMA modelling of the airline data

To showcase the estimation of SARIMA models, we make use of the classic AirPassengers data, which contains monthly totals of international airline passengers from 1949 to 1960. See ?AirPassengers for details. We estimate a $\text{SARIMA}(0, ~ 1, ~ 1){1} ~ \times ~ (0, ~ 1, ~ 1){12}$. Note that in the multivariate case, there is a subtle difference between $\text{SARIMA}(0, ~ 1, ~ 1){1} ~ \times ~ (0, ~ 1, ~ 1){12}$ and $\text{SARIMA}(0, ~ 1, ~ 1){12} ~ \times ~ (0, ~ 1, ~ 1){1}$ as matrix multiplication is not commutative.

We proceed as follows:

# Load the dataset
Data <- matrix(log(AirPassengers))

# The SARIMA specification, must be a list containing lists!
sarima_list <- list(list(s = c(12, 1), ar = c(0, 0), i = c(1, 1), ma = c(1, 1)))

# Fit the SARIMA model
fit <- statespacer(y = Data,
                   H_format = matrix(0),
                   sarima_list = sarima_list,
                   initial = c(0.5*log(var(diff(Data))), 0, 0),
                   verbose = TRUE)

We obtain the following estimates:

# Coefficients of the ARMA component
arma_coeff <- rbind(
   c(fit$system_matrices$SMA$SARIMA1$S1, fit$system_matrices$SMA$SARIMA1$S12),
   c(fit$standard_errors$SMA$SARIMA1$S1, fit$standard_errors$SMA$SARIMA1$S12)
)

rownames(arma_coeff) <- c("coefficient", "std_error")
colnames(arma_coeff) <- c("ma1 s = 1", "ma1 s = 12")
arma_coeff

goodness_fit <- rbind(
   fit$system_matrices$Q$SARIMA1,
   fit$diagnostics$loglik,
   fit$diagnostics$AIC
)
rownames(goodness_fit) <- c("Variance", "Loglikelihood", "AIC")
goodness_fit