# sarima: Fit extended SARIMA models In sarima: Simulation and Prediction with Seasonal ARIMA Models

## Description

Fit extended SARIMA models, which can include lagged exogeneous variables, general unit root non-stationary factors, multiple periodicities, and multiplicative terms in the SARIMA specification. The models are specified with a flexible formula syntax and contain as special cases many models with specialised names, such as ARMAX and reg-ARIMA.

## Usage

 1 sarima(model, data = NULL, ss.method = "sarima") 

## Arguments

 model a model formula specifying the model. data a list or data frame, usually can be omitted. ss.method state space engine to use, defaults to "sarima". (Note: this argument will probably be renamed.)

## Details

sarima fits extended SARIMA models, which can include exogeneous variables, general unit root non-stationary factors and multiplicative terms in the SARIMA specification.

Let \{Y_t\} be a time series and f(t) and g(t) be functions of time and/or (possibly lagged) exogeneous variables.

An extended pure SARIMA model for Y_t can be written with the help of the backward shift operator as

U(B)Φ(B)Y_t = Θ(B)\varepsilon_t,

where {eps_t} is white noise, and U(z), Φ(z), and Θ(z) are polynomials, such that all roots of U(z) are on the unit circle, while the roots of Φ(z) and Θ(z) are outside the unit circle. If unit roots are missing, ie U(z)=1, the model is stationary with mean zero.

A reg-SARIMA or X-SARIMA model can be defined as a regression with SARIMA residuals:

Y_t = f(t) + Y^c_t

U(B)Φ(B)Y^c_t = Θ(B)\varepsilon_t,

where Yc_t = Y_t - f(t) is the centred Y_t. This can be written equivalently as a single equation:

U(B)Φ(B)(Y_t - f(t)) = Θ(B)\varepsilon_t.

The regression function f(t) can depend on time and/or (possibly lagged) exogeneous variables. We call it centering function. If Yc_t is stationary with mean zero, f(t) is the mean of Y_t. If f(t) is constant, say mu, Y_t is stationary with mean mu. Note that the two-equation form above shows that in that case mu is the intercept in the first equation, so it is perfectly reasonable to refer to it also as intercept but to avoid confusion we reserve the term intercept for g(t) below.

If the SARIMA part is stationary, then EY_t = f(t), so f(t) can be interpreted as trend. In this case the above specification is often referred to as mean corrected form of the model.

An alternative way to specify the regression part is to add the regression function, say \{g(t)\}, to the right-hand side of the SARIMA equation:

U(B)Φ(B)Y_t = g(t) + Θ(B)\varepsilon_t.

In the stationary case this is the classical ARMAX specification. This can be written in two-stage form in various ways, eg

U(B)Φ(B)Y_t = (1 - Θ(B))\varepsilon_t + u_t,

u_t = g(t) + \varepsilon_t .

So, in a sense, g(t) is a trend associated with the residuals from the SARIMA modelling. We refer to this form as intercept form of the model (as opposed to the mean-corrected form discussed previously).

In general, if there are no exogeneous variables the mean-corrected model is equivalent to some intercept model, which gives some justification of the terminology, as well. If there are exogeneous variables equivalence may be achievable but at the expense of introducing more lags in the model, whish is not desirable in general.

Some examples of equivalence. Let Y be a stationary SARIMA process (U(z)=1) with mean mu. Then the mean-corrected form of the SARIMA model is

Φ(B)(Y_t - μ) = Θ(B)\varepsilon_t,

while the intercept form is

Φ(B)Y_t = c + Θ(B)\varepsilon_t,

where c = Φ(B)μ. So, in this case the mean-corrected model X-SARIMA model with f(t) = mu is equivalent to the intercept model with g(t) = Phi(B)mu.

As another example, with f(t) = bt, the mean-corrected model is (1-B)(Y_t - bt) = \varepsilon_t. Expanding the left-hand side we obtain the intercept form (1-B)Y_t = b + \varepsilon_t, which demonstrates that Y_t is a random walk with drift g(t) = b.

Model specification

Argument model specifies the model with a syntax similar to other model fitting functions in R. A formula can be given for each of the components discussed above as y ~ f | SARIMA | g, where f, SARIMA and g are model formulas giving the specifications for the centering function f, the SARIMA specification, and the intercept function g. In normal use only one of f or g will be different from zero. f should always be given (use 0 to specify that it is identical to zero), but g can be omitted altogether. Sometimes we refer to the terms specified by f and g by xreg and regx, respectively.

Model formulas for trends and exogeneous regressions

The formulas for the centering and intercept (ie f and g) use the same syntax as in linear models with some additional functions for trigonometric trends, polynomial trends and lagged variables.

Here are the available specialised terms:

.p(d)

Orthogonal polynomials over 1:length(y) of degree d (starting from degree 1, no constant).

t

Stands for 1:length(y). Note that powers need to be protected by I(), e.g. y ~ 1 + .t + I(.t^2).

.cs(s, k)

cos/sin pair for the k-th harmonic of 2pi/s. Use vector k to specify several harmonics.

.B(x, lags)

Include lagged terms of x, B^{lags}(x[t]) = x[t - lags]. lags can be a vector. If x is a matrix, the specified lags are taken from each column.

Model formulas for SARIMA models

A flexible syntax is provided for the specification of the SARIMA part of the model. It is formed using a number of primitives for stationary and unit root components, which have non-seasonal and seasonal variants. Arbitrary number of multiplicative factors and multiple seasonalities can be specified.

The SARIMA part of the model can contain any of the following terms. They can be repeated as needed. The first argument for all seasonal operators is the number of seasons.

ar(p)

autoregression term of order p

ma(q)

moving average term of order q

sar(s,p)

seasonal autoregression term (s seasons, order p)

sma(s,q)

seasonal moving average term (s seasons, order q)

i(d)

(1-B)^d

s(seas)

summation operator, (1 + B + ... + B^(seas-1))

u(x)

quadratic unit root term, corresponding to a complex pair on the unit circle. If x is real, it specifies the argument of one of the roots as a fraction of 2pi. If z is complex, it is the root itself.

The real roots of modulus one (1 and -1) should be specified using i(1) and s(2), which correspond to 1-B and 1+B, respectively.

su(s, h)

quadratic unit root terms corresponding to seasonal differencing factors. h specifies the desired harmonic which should be one of 1,2, ..., [s/2]. Several harmonics can be specified by setting h to a vector.

ss(s, p)

seasonal summation operator, (1 + B^s + ... + B^(p(s-1)))

Terms with parameters can contain additional arguments specifying initial values, fixed parameters, and transforms. For ar, ma, sar, sma, values of the coefficients can be specified by an unnamed argument after the parameters given in the descriptions above. In estimation these values will be taken as initial values for optimisation. By default, all coefficients are taken to be non-fixed.

Argument fixed can be used to fix some of them. If it is a logical vector it should be of length one or have the same length as the coefficients. If fixed is of length and TRUE, all coefficients are fixed. If FALSE, all are non-fixed. Otherwise, the TRUE/FALSE values in fixed determine the fixedness of the corresponding coefficients.

fixed can also be a vector of positive integer numbers specifying the indices of fixed coefficients, the rest are non-fixed.

Sometimes it may be easier to declare more (e.g. all) coefficients as fixed and then ‘unfix’ selectively. Argument nonfixed can be used to mark some coefficients as non-fixed after they have been declared fixed. Its syntax is the same as for fixed.

TODO: streamline "atanh.tr"

## Value

an object from S3 class Sarima

(Note: the format of the object is still under development and may change; use accessor functions, such as coef(), where provided.)

## Note

Currently the implementation of the intercept form (ie the third part of the model formula) is incomplete.

## Author(s)

Georgi N. Boshnakov

arima
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 ## AirPassengers example ## fit the classic airline model using arima() ap.arima <- arima(log(AirPassengers), order = c(0,1,1), seasonal = c(0,1,1)) ## samemodel using twoequivalent ways to specify it ap.baseA <- sarima(log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(1) + si(12,1), ss.method = "base") ap.baseB <- sarima(log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(2) + s(12), ss.method = "base") ap.baseA summary(ap.baseA) ap.baseB summary(ap.baseB) ## as above, but drop 1-B from the model: ap2.arima <- arima(log(AirPassengers), order = c(0,0,1), seasonal = c(0,1,1)) ap2.baseA <- sarima(log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + si(12,1), ss.method = "base") ap2.baseB <- sarima(log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(1) + s(12), ss.method = "base") ## for illustration, here the non-stationary part is ## (1-B)^2(1+B+...+B^5) = (1-B)(1-B^6) ## ( compare to (1-B)(1-B^{12}) = (1-B)(1-B^6)(1+B^6) ) ap3.base <- sarima(log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(2) + s(6), ss.method = "base") ## further unit roots, equivalent specifications for the airline model tmp.su <- sarima(log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(2) + s(2) + su(12,1:5), ss.method = "base") tmp.su$interna$delta_poly prod(tmp.su$interna$delta_poly) zapsmall(coef(prod(tmp.su$interna$delta_poly))) tmp.su tmp.u <- sarima(log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(2) + s(2) + u((1:5)/12), ss.method = "base") tmp.u