License: CC BY-SA 4.0
External packages:
options(digits=5)
suppressPackageStartupMessages(library(rutilstimflutre))
## library(MASS)
suppressPackageStartupMessages(library(Matrix))
suppressPackageStartupMessages(library(sparseMVN))
suppressPackageStartupMessages(library(nlme))
suppressPackageStartupMessages(library(INLA))
suppressPackageStartupMessages(library(sp))
suppressPackageStartupMessages(library(gstat))
This R chunk is used to assess how much time it takes to execute the R code in this document until the end:
t0 <- proc.time()
Let us consider a sample of $T$ observations taken sequentially in time, at equal intervals (lag). Modelling it mathematically, we assume that this time series is a realization of a stochastic process. In particular, we will assume this process to be stationary, i.e. with constant mean and variance, and that a given observation only depends on the previous one, plus some white noise. This model is called the first-order autoregressive process, AR(1).
data: $\mathcal{D} = \{ x_1, \ldots, x_T \}$
parameters: $\Theta = \{ \delta, \phi_1, \sigma^2 \}$
likelihood: $p(\mathcal{D} | \Theta)$
It is often written as:
$\forall t \in \{2,\ldots,T\}, \; x_t = \delta + \phi_1 x_{t-1} + \epsilon_t$ where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$
To highlight the conditional independence, it can also be written as:
$\forall t \in \{2,\ldots,T\}, \; x_t \, | \, \delta, \phi_1, x_{t-1}, \ldots, x_1, \sigma^2 \; \sim \; \mathcal{N}(\delta + \phi_1 x_{t-1}, \sigma^2)$
We can also write it in a reduced form using the operators conventionally used in the field of time series analysis, notably, the backward shift operator, $B x_t = x_{t-1}$, and the autoregressive operator of order 1, $\phi(B) = 1 - \phi_1 B$. As a result:
$\phi(B) x_t = \delta + \epsilon_t$
Let us now derive the theoretical mean, $\mu$, of the process:
$E(x_t) = \delta + \phi_1 E(x_{t-1})$ and, by stationarity, $E(x_t) = E(x_{t-1}) = \mu \Rightarrow \mu = \frac{\delta}{1 - \phi_1}$
as well as its theoretical variance, $v$:
$V(x_t) = \phi_1^2 V(x_{t-1}) + \sigma^2$ and, by stationarity, $V(x_t) = V(x_{t-1}) = v \Rightarrow v = \frac{\sigma^2}{1 - \phi_1^2}$
From now on, let us assume $\delta = 0$, thus $\mu = 0$, with no loss of generality.
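As a quick sanity check of these two formulas (a minimal sketch, using arima.sim from the stats package and values chosen only for illustration), we can simulate one long AR(1) series and compare its sample mean and variance with the theoretical ones:
## illustrative check of the theoretical mean and variance of an AR(1)
set.seed(1859)
phi1.ex <- 0.7; sigma2.ex <- 5
x.check <- arima.sim(model=list(ar=phi1.ex), n=10^5, sd=sqrt(sigma2.ex))
mean(x.check)                 # should be close to 0 (delta = 0)
var(x.check)                  # should be close to sigma2 / (1 - phi1^2)
sigma2.ex / (1 - phi1.ex^2)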
To interpret $\phi_1$, it also helps to write down the covariance, $\gamma_1$, and correlation, $\rho_1$, between two consecutive observations:
$\gamma_1 = Cov(x_{t-1}, x_t) = E(x_{t-1} x_t) = \phi_1 V(x_t) \Rightarrow \rho_1 = \phi_1 = \rho$
This can be generalized for two observations $s$ time intervals apart:
$\gamma_s = \phi_1 \gamma_{s-1} \Rightarrow \rho_s = \phi_1^s = \rho^s$
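The geometric decay of the autocorrelation can be verified numerically, for instance with ARMAacf from the stats package (a small illustrative check, with $\rho = 0.7$ chosen arbitrarily):
## theoretical ACF of an AR(1): rho_s = rho^s
rho.ex <- 0.7
ARMAacf(ar=rho.ex, lag.max=5)
rho.ex^(0:5)   # same values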
From all this, we can also write down the likelihood of all the data via the joint density of $\boldsymbol{x}$:
$p(\boldsymbol{x}) = p(x_1) p(x_2|x_1) \ldots p(x_T|x_{T-1})$ where $x_1 \sim \mathcal{N}(0, \frac{\sigma^2}{1 - \rho^2})$
This decomposition via the conditional independence of univariate Normals means that the joint density is a multivariate Normal:
$\boldsymbol{x} \sim \mathcal{N}_T(\boldsymbol{0}, \Sigma)$
where the covariance matrix can be written as a scaled correlation matrix:
\begin{equation} \Sigma = \frac{\sigma^2}{1 - \rho^2} \begin{pmatrix} 1 & \rho & \rho^2 & \cdots \\ \rho & 1 & \rho & \cdots \\ \rho^2 & \rho & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \end{equation}
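As a quick numerical illustration (a sketch using only base R, with a tiny $T$ and arbitrary parameter values), we can check that the product of the conditional Normal densities above equals the joint multivariate Normal density with this covariance matrix:
## check on a tiny example that the product of conditional Normal densities
## equals the joint multivariate Normal density with the covariance above
T.ex <- 4; rho.ex <- 0.7; sigma2.ex <- 5
set.seed(1859)
x.ex <- as.numeric(arima.sim(model=list(ar=rho.ex), n=T.ex, sd=sqrt(sigma2.ex)))
ld.cond <- dnorm(x.ex[1], 0, sqrt(sigma2.ex / (1 - rho.ex^2)), log=TRUE) +
  sum(dnorm(x.ex[-1], rho.ex * x.ex[-T.ex], sqrt(sigma2.ex), log=TRUE))
Sigma.ex <- sigma2.ex / (1 - rho.ex^2) * rho.ex^abs(outer(1:T.ex, 1:T.ex, "-"))
ld.joint <- -0.5 * (T.ex * log(2*pi) + determinant(Sigma.ex)$modulus +
                    t(x.ex) %*% solve(Sigma.ex) %*% x.ex)
c(ld.cond, ld.joint)   # the two log-densities should match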
However, by looking at this covariance matrix, the conditional independence relationships aren't obvious. For this, it is much more straightforward to look at the precision matrix, $Q = \Sigma^{-1}$.
Using the formula to invert a $2 \times 2$ matrix, we have:
\begin{equation} \Sigma = \frac{\sigma^2}{1 - \rho^2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \Leftrightarrow Q = \frac{1}{\sigma^2} \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix} \end{equation}
Using the formula to invert a $3 \times 3$ matrix, we have:
\begin{equation} \Sigma = \frac{\sigma^2}{1 - \rho^2} \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{pmatrix} \Leftrightarrow Q = \frac{1}{\sigma^2} \begin{pmatrix} 1 & -\rho & 0 \\ -\rho & 1 + \rho^2 & -\rho \\ 0 & -\rho & 1 \end{pmatrix} \end{equation}
In the general $T \times T$ case:
\begin{equation} Q = \frac{1}{\sigma^2} \begin{pmatrix} 1 & -\rho & 0 & & & \\ -\rho & 1 + \rho^2 & -\rho & & & \\ 0 & -\rho & 1 + \rho^2 & \ddots & & \\ & & \ddots & \ddots & -\rho & 0 \\ & & & -\rho & 1 + \rho^2 & -\rho \\ & & & 0 & -\rho & 1 \end{pmatrix} \end{equation}
Whereas the covariance matrix only reflects the marginal dependence structure, the precision matrix is tridiagonal, hence reflecting the fact that any given observation is conditionally dependent only on the previous and next observations. Such a sparse structure of the precision matrix allows using dedicated algorithms to perform fast and reliable inferences with a low-memory footprint.
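We can verify numerically, for a small $T$, that the tridiagonal $Q$ given above is indeed the inverse of $\Sigma$ (a minimal check in base R, with arbitrary parameter values):
## numerical check that the tridiagonal Q above is the inverse of Sigma (small T)
T.check <- 5; rho.ex <- 0.7; sigma2.ex <- 5
Sigma.check <- sigma2.ex / (1 - rho.ex^2) * rho.ex^abs(outer(1:T.check, 1:T.check, "-"))
Q.check <- diag(c(1, rep(1 + rho.ex^2, T.check-2), 1)) / sigma2.ex
Q.check[cbind(1:(T.check-1), 2:T.check)] <- -rho.ex / sigma2.ex   # super-diagonal
Q.check[cbind(2:T.check, 1:(T.check-1))] <- -rho.ex / sigma2.ex   # sub-diagonal
max(abs(Q.check %*% Sigma.check - diag(T.check)))                 # should be ~ 0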
Set the inputs:
N <- 1*10^2      # number of independent time series
T <- 5*10^2      # length of each time series
rho <- 0.7       # correlation between two successive observations
sigma2 <- 5      # variance of the errors
1 / sigma2       # precision of the errors
Simulate with `arima.sim`:
set.seed(1859)
x1 <- do.call(rbind, lapply(1:N, function(i){
  arima.sim(model=list(ar=rho), n=T, sd=sqrt(sigma2))
}))
rowMeans(apply(x1, 1, summary))
summary(apply(x1, 1, function(xi){
  acf(xi, lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
set.seed(1859)
## simulate via the conditional (sequential) formulation of the AR(1)
ar1.sim.cond <- function(N, T, rho, sigma2){
  x1s <- rnorm(n=N, mean=0, sd=sqrt(sigma2 / (1 - rho^2)))
  epsilons <- matrix(data=rnorm(n=N*(T-1), mean=0, sd=sqrt(sigma2)),
                     nrow=N, ncol=T-1)
  x <- do.call(rbind, lapply(1:N, function(i){
    xi <- rep(NA, T)
    xi[1] <- x1s[i]
    for(t in 2:T)
      xi[t] <- rho * xi[t-1] + epsilons[i,t-1]
    return(xi)
  }))
  return(x)
}
x2 <- ar1.sim.cond(N, T, rho, sigma2)
rowMeans(apply(x2, 1, summary))
summary(apply(x2, 1, function(xi){
  acf(xi, lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
set.seed(1859)
## simulate via the joint multivariate Normal, using the (dense) covariance matrix
ar1.sim.joint.cov <- function(N, T, rho, sigma2){
  mu <- rep(0, T)
  Sigma <- rutilstimflutre::covMatAR1(T, rho, sigma2)
  print(object.size(Sigma))
  ## x <- MASS::mvrnorm(n=N, mu=mu, Sigma=Sigma)
  A <- chol(Sigma)
  z <- matrix(data=rnorm(n=N*T, mean=0, sd=1), nrow=N, ncol=T)
  x <- do.call(rbind, lapply(1:N, function(i){
    xi <- mu + t(z[i,]) %*% A
    return(xi)
  }))
  return(x)
}
x3 <- ar1.sim.joint.cov(N, T, rho, sigma2)
rowMeans(apply(x3, 1, summary))
summary(apply(x3, 1, function(xi){
  acf(xi, lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
set.seed(1859)
## simulate via the joint multivariate Normal, using the (sparse) precision matrix
ar1.sim.joint.prec <- function(N, T, rho, sigma2){
  mu <- rep(0, T)
  Q <- rutilstimflutre::precMatAR1(T, rho, sigma2)
  print(object.size(Q))
  x <- sparseMVN::rmvn.sparse(n=N, mu=mu, CH=Matrix::Cholesky(Q))
  ## https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q4/019612.html
  ## x <- do.call(rbind, lapply(1:N, function(i){
  ##   L <- Diagonal(T) - bandSparse(T, T, -1, list(rep(rho, T-1)))
  ##   L[1,1] <- sqrt(1 - rho^2)
  ##   xi <- solve(L, epsilons[i,])@x * sqrt(sigma2)
  ##   return(xi)
  ## }))
  return(x)
}
x4 <- ar1.sim.joint.prec(N, T, rho, sigma2)
rowMeans(apply(x4, 1, summary))
summary(apply(x4, 1, function(xi){
  acf(xi, lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
Note how much less memory is required for the sparse precision matrix compared to the dense covariance matrix.
Let us take one particular sample, and perform inference:
x <- x1[1,]
summary(x)
var(x)
1 / var(x)
plot.ts(x, las=1,
        main=paste0("AR(1) simulated with rho=", rho, " and sigma2=", sigma2))
abline(h=0, lty=2)
acf(x, main=paste0("AR(1) simulated with rho=", rho, " and sigma2=", sigma2))
Fit with `arima` and `ar`:
(fit1a <- arima(x=x, order=c(1,0,0)))
(fit1b <- ar(x=x, order.max=3))
Fit with `gls` from `nlme`:
fit2 <- nlme::gls(x ~ 1, data=data.frame(x=x),
                  correlation=nlme::corAR1(fixed=FALSE), method="REML")
summary(fit2)
(v.hat <- fit2$sigma)
Sigma.hat <- corMatrix(fit2$modelStruct$corStruct)
rho.hat <- Sigma.hat[1,2]
(sigma2.hat <- v.hat^2 * (1 - rho.hat^2))
Fit with `inla`:
The R INLA package (integrated nested Laplace approximation) allows one to fit hierarchical Gaussian Markov random fields (GMRF) of a very generic form, $y_j | \eta_j, \boldsymbol{\theta}_1 \sim \pi$ (see the Intro). Most notably, the predictors, $\boldsymbol{\eta}$, can be modelled via numerous latent processes.
However, in our case, the AR(1) isn't latent, that is, there is no additional noise besides the $\epsilon_t$'s from the AR(1) itself.
Therefore, to fit our model with the `inla` function, we need to "rephrase" it as $y_t = x_t + w_t$ with $w_t \sim \mathcal{N}(0, \sigma_w^2)$, assuming that $\sigma_w^2$ is known (fixed) and small, i.e. that its precision is large, e.g. 10 on the log scale.
Moreover, `inla` parametrizes the AR(1) via the marginal precision, $(1 - \rho^2) / \sigma^2$ (see the doc). We hence need some manipulations to retrieve the posterior mean of $\sigma^2$.
fit3 <- inla(formula=y ~ 1 + f(t, model="ar1"),
             data=data.frame(y=x, t=1:length(x)),
             family="gaussian",
             control.family=list(hyper=list(theta=list(initial=10, fixed=TRUE))),
             control.compute=list(mlik=TRUE, dic=TRUE, waic=TRUE, cpo=TRUE))
summary(fit3)
(sigma2.post.mean <- (1 - fit3$summary.hyperpar["Rho for t", "mean"]^2) /
   fit3$summary.hyperpar["Precision for t", "mean"])
(1 - fit3$summary.hyperpar["Rho for t", "0.975quant"]^2) /
  fit3$summary.hyperpar["Precision for t", "0.975quant"]
(1 - fit3$summary.hyperpar["Rho for t", "0.025quant"]^2) /
  fit3$summary.hyperpar["Precision for t", "0.025quant"]
fit3$dic$mean.deviance
We can also fit with `inla` a model assuming the latent variables are independent, and compare the fit with the one above:
fit3iid <- inla(formula=y ~ 1 + f(t, model="iid"),
                data=data.frame(y=x, t=1:length(x)),
                family="gaussian",
                control.family=list(hyper=list(theta=list(initial=10, fixed=TRUE))),
                control.compute=list(mlik=TRUE, dic=TRUE, waic=TRUE, cpo=TRUE))
summary(fit3iid)
1 / fit3iid$summary.hyperpar["Precision for t", "mean"]
fit3iid$dic$mean.deviance
Surprisingly, the DIC and WAIC are the same for both models.
We now take several simulated time series:
x <- x1[1:10,]                        # only take 10 time series
idx <- sample.int(n=nrow(x), size=5)  # plot a subset of them
plot.ts(t(x[idx,]), las=1, plot.type="single", col=1:nrow(x[idx,]),
        main=paste0("AR(1) simulated with rho=", rho, " and sigma2=", sigma2))
abline(h=0, lty=2)
The likelihood can be written as:
$\forall i \in \{1,\ldots,N\} \text{ and } t \in \{2,\ldots,T\}, \; x_{i,t} = \rho x_{i,t-1} + \epsilon_{i,t} \text{ where } \epsilon_{i,t} \sim \mathcal{N}(0, \sigma^2)$
with $\forall i, \; x_{i,1} \sim \mathcal{N}(0, \frac{\sigma^2}{1 - \rho^2})$
Fit with `gls` from `nlme`:
tmp <- data.frame(x=c(t(x)), i=factor(rep(1:nrow(x), each=ncol(x))))
fit1 <- nlme::gls(x ~ 1, data=tmp,
                  correlation=nlme::corAR1(form=~1|i, fixed=FALSE),
                  method="REML")
summary(fit1)
Fit with `inla`:
tmp <- data.frame(x=c(t(x)),
                  i=rep(1:nrow(x), each=ncol(x)),
                  t=rep(1:ncol(x), nrow(x)))
fit2 <- inla(formula=x ~ 1 + f(t, model="ar1", replicate=i),
             data=tmp,
             family="gaussian",
             control.family=list(hyper=list(theta=list(initial=10, fixed=TRUE))),
             control.compute=list(mlik=TRUE, dic=TRUE, waic=TRUE, cpo=TRUE))
summary(fit2)
(sigma2.post.mean <- (1 - fit2$summary.hyperpar["Rho for t", "mean"]^2) /
   fit2$summary.hyperpar["Precision for t", "mean"])
Let us now consider a sample of $N$ observations, all taken at the same point in time, but at different locations in space. To each observation is associated a pair of geographical coordinates, $(r,c)$. Let us assume those to be points on a rectangular grid (lattice) of $R$ rows and $C$ columns, with $r$ (resp. $c$) indexing the rows (resp. columns). Assuming that two nearby locations are more likely to be similar than two distant ones, that rows are equally spaced (lag $l_r$), and columns as well (lag $l_c$, possibly different from $l_r$), we can model our sample via a two-dimensional, second-order stationary process. A simple choice is the separable AR(1) $\times$ AR(1) process, which also happens to be reflection-symmetric.
data: $\mathcal{D} = \{ x_{1,1}, \ldots, x_{R,1}, \ldots, x_{r,c}, \ldots, x_{R,C} \}$ assuming they have been centered beforehand
parameters: $\Theta = \{ \rho_r, \rho_c, \sigma^2 \}$
likelihood: $p(\mathcal{D} | \Theta)$
The whole data set is naturally visualized as an $R \times C$ matrix, $X$, and the likelihood of any element can be written as:
$\forall r,c > 1, \; x_{r,c} = \rho_r x_{r-1,c} + \rho_c x_{r,c-1} - \rho_r \rho_c x_{r-1,c-1} + \epsilon_{r,c}$ with $\epsilon_{r,c} \sim \mathcal{N}(0, \sigma^2)$
For a full specification, see Martin (1996), which sets $x_{1,1} = \sqrt{\sigma_X^2 / \sigma_\epsilon^2} \, \epsilon_{1,1}$ and uses:
$\forall c > 1, \; x_{1,c} = \rho_c x_{1,c-1} + \sqrt{(1 - \rho_c^2) \sigma_X^2 / \sigma_\epsilon^2} \epsilon_{1,c}$
$\forall r > 1, \; x_{r,1} = \rho_r x_{r-1,1} + \sqrt{(1 - \rho_r^2) \sigma_X^2 / \sigma_\epsilon^2} \epsilon_{r,1}$
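To make these recursions concrete, here is a minimal sketch of the conditional simulation of a single lattice; the hypothetical helper below only illustrates the equations above (the package's `simulAr1Ar1`, used later, is the reference implementation), with $\sigma_X^2 = \sigma_\epsilon^2 / ((1-\rho_r^2)(1-\rho_c^2))$ the marginal variance of the separable process:
## illustrative sketch of Martin's (1996) unilateral recursions for one lattice
sim.ar1xar1.martin <- function(R, C, rho.r, rho.c, sigma2){
  sigma.X.2 <- sigma2 / ((1 - rho.r^2) * (1 - rho.c^2))  # marginal variance
  eps <- matrix(rnorm(R * C, sd=sqrt(sigma2)), nrow=R, ncol=C)
  X <- matrix(NA, nrow=R, ncol=C)
  X[1,1] <- sqrt(sigma.X.2 / sigma2) * eps[1,1]
  for(c in 2:C)   # first row
    X[1,c] <- rho.c * X[1,c-1] + sqrt((1 - rho.c^2) * sigma.X.2 / sigma2) * eps[1,c]
  for(r in 2:R)   # first column
    X[r,1] <- rho.r * X[r-1,1] + sqrt((1 - rho.r^2) * sigma.X.2 / sigma2) * eps[r,1]
  for(r in 2:R)   # interior cells
    for(c in 2:C)
      X[r,c] <- rho.r * X[r-1,c] + rho.c * X[r,c-1] -
        rho.r * rho.c * X[r-1,c-1] + eps[r,c]
  X
}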
Applying the vec operator (stacking the columns), we obtain the vector $\boldsymbol{x} = vec(X)$ of length $N$, indexed by $i$. Written in a reduced form, the model is:
$\phi_r(B) \, \phi_c(B^R) \, x_i = \epsilon_i$ where $\phi_r(B) = 1 - \rho_r B$ and $\phi_c(B^R) = 1 - \rho_c B^R$
Writing the likelihood as a multivariate Normal and taking advantage of the separability of the process:
$\boldsymbol{x} \sim \mathcal{N}_N(0, \sigma^2 \times \Sigma(\rho_c) \otimes \Sigma(\rho_r))$
Set the inputs:
N <- 1*10^2       # number of independent spatial series
R <- 3*10^1       # number of rows of each lattice
C <- 7*10^1       # number of columns of each lattice
R * C             # total number of cells
rho.r <- 0.9      # correlation between two successive rows
rho.c <- 0.9      # correlation between two successive columns
sigma2 <- 5       # variance of the errors
1 / sigma2        # precision of the errors
set.seed(1859)
ar1xar1.sim.cond <- simulAr1Ar1  # conditional simulation, implemented inside the package
Check the code by simulating with only column correlation:
X1 <- ar1xar1.sim.cond(n=N, R=2, C=500, sigma.e.2=sigma2, rho.r=0, rho.c=rho.c)
prettyPrintBetterSummary(apply(X1, 3, function(Xi){
  acf(Xi[1,], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
Idem, with only row correlation:
X1 <- ar1xar1.sim.cond(n=N, R=500, C=2, sigma.e.2=sigma2, rho.r=rho.r, rho.c=0.0)
prettyPrintBetterSummary(apply(X1, 3, function(Xi){
  acf(Xi[,1], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
Now with both row and column correlations:
X1 <- ar1xar1.sim.cond(n=N, R=R, C=C, sigma.e.2=sigma2, rho.r=rho.r, rho.c=rho.c)
prettyPrintBetterSummary(apply(X1, 3, function(Xi){
  acf(Xi[1,], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
prettyPrintBetterSummary(apply(X1, 3, function(Xi){
  acf(Xi[,1], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
set.seed(1859)
ar1xar1.sim.mv <- function(N, R, C, sigma2, rho.r, rho.c){
  stopifnot(R > 1, C > 1,
            abs(rho.r) <= 1, abs(rho.c) <= 1,
            sigma2 > 0)
  ## ... TODO ...
}
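The body of this function is left as a TODO above. A possible implementation (a sketch only, assuming the Kronecker-structured covariance $\sigma^2 \, \Sigma(\rho_c) \otimes \Sigma(\rho_r)$ given earlier and returning an $R \times C \times N$ array as expected by the checks below) could look like this:
## sketch: sample N lattices jointly from the multivariate Normal with
## Kronecker-structured covariance sigma2 * Sigma(rho.c) %x% Sigma(rho.r)
ar1xar1.sim.mv <- function(N, R, C, sigma2, rho.r, rho.c){
  stopifnot(R > 1, C > 1, abs(rho.r) < 1, abs(rho.c) < 1, sigma2 > 0)
  Sigma.r <- rho.r^abs(outer(1:R, 1:R, "-")) / (1 - rho.r^2)  # AR(1) along rows
  Sigma.c <- rho.c^abs(outer(1:C, 1:C, "-")) / (1 - rho.c^2)  # AR(1) along columns
  Sigma <- sigma2 * kronecker(Sigma.c, Sigma.r)               # covariance of vec(X)
  A <- chol(Sigma)                                            # Sigma = t(A) %*% A
  z <- matrix(rnorm(N * R * C), nrow=N, ncol=R*C)
  x <- z %*% A                                                # each row ~ N(0, Sigma)
  array(t(x), dim=c(R, C, N))  # un-vec each draw into an R x C lattice
}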
Check the code by simulating with only column correlation:
X2 <- ar1xar1.sim.mv(N, R=2, C=500, sigma2, rho.r=0, rho.c)
prettyPrintBetterSummary(apply(X2, 3, function(Xi){
  acf(Xi[1,], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
Idem, with only row correlation:
X2 <- ar1xar1.sim.mv(N, R=500, C=2, sigma2, rho.r, rho.c=0.0)
prettyPrintBetterSummary(apply(X2, 3, function(Xi){
  acf(Xi[,1], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
Now with both row and column correlations:
X2 <- ar1xar1.sim.mv(N, R, C, sigma2, rho.r, rho.c)
prettyPrintBetterSummary(apply(X2, 3, function(Xi){
  acf(Xi[1,], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
prettyPrintBetterSummary(apply(X2, 3, function(Xi){
  acf(Xi[,1], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
Let us take one particular sample, and perform inference:
X <- X1[,,1]
dim(X)
X[1:3, 1:6]
Make coordinates (note that columns are along the $x$-axis and rows along the $y$-axis):
coords <- as.matrix(expand.grid(1:R, 1:C))
colnames(coords) <- c("y", "x")
coords <- coords[,c("x","y")]
dim(coords)
head(coords)
all(max(coords[,"x"]) == C, max(coords[,"y"]) == R)
Reformat the data accordingly:
dat <- data.frame(var1=c(X))
dim(dat)
Make a spatial data structure, and plot it:
X.sp <- SpatialPointsDataFrame(coords=coords, data=dat)
summary(X.sp)
spplot(obj=X.sp, scales=list(draw=TRUE), xlab="columns", ylab="rows",
       main=paste0("AR(1)xAR(1) with rho.r=", rho.r, ", rho.c=", rho.c,
                   " and sigma2=", sigma2),
       key.space="right", aspect="fill")
Look at the variogram estimated from the data:
vg.1 <- gstat::variogram(object=var1 ~ 1, data=X.sp)
plot(x=vg.1$dist, y=vg.1$gamma, main="Empirical variogram",
     ylim=c(0, max(vg.1$gamma)))
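For reference (a small sketch, not part of the original analysis), the theoretical semivariance of a stationary process is $\gamma(h) = v \, (1 - \rho(h))$ with $v$ the marginal variance; for the separable AR(1) $\times$ AR(1), the correlation at a pure row lag $h$ (resp. column lag $h$) is $\rho_r^h$ (resp. $\rho_c^h$). These values can be compared visually with the empirical variogram above:
## theoretical semivariances gamma(h) = v * (1 - rho^h) along pure row / column lags
v.marg <- sigma2 / ((1 - rho.r^2) * (1 - rho.c^2))  # marginal variance of the process
h <- 1:15
cbind(lag=h,
      gamma.rows=v.marg * (1 - rho.r^h),
      gamma.cols=v.marg * (1 - rho.c^h))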
Fit a Matérn model for the random field with `gstat`:
(vg.1.fit <- gstat::fit.variogram(object=vg.1,
                                  model=vgm(model="Ste", nugget=NA),
                                  fit.kappa=seq(0.3, 0.6, 0.05)))
plot(vg.1, vg.1.fit, main="", col="blue",
     key=list(space="top", lines=list(col="blue"),
              text=list("fit of Matérn model")))
Fit with `inla`:
TODO
Fit polynomial trend surfaces with `lm`:
head(as.data.frame(X.sp))
fit_x_y <- lm(var1 ~ x + y, data=as.data.frame(X.sp))
fit_x_y_xy <- lm(var1 ~ x + y + x:y, data=as.data.frame(X.sp))
anova(fit_x_y, fit_x_y_xy)
fit_x_y_x2_y2 <- lm(var1 ~ x + y + I(x^2) + I(y^2), data=as.data.frame(X.sp))
anova(fit_x_y, fit_x_y_xy, fit_x_y_x2_y2)
fit_x_y_x2_y2_x2y2 <- lm(var1 ~ x + y + I(x^2) + I(y^2) + I(x^2):I(y^2),
                         data=as.data.frame(X.sp))
anova(fit_x_y, fit_x_y_xy, fit_x_y_x2_y2, fit_x_y_x2_y2_x2y2)
Marie Denis (CIRAD)
Box, Jenkins & Reinsel (2008): "Time Series Analysis: Forecasting and Control" (4th edition)
Rue & Held (2005): "Gaussian Markov Random Fields: Theory and Applications", chapter 1
Congdon (2010): "Applied Bayesian Hierarchical Methods", section 4.2.2 Low order autoregressive models
Martin (1996): "Some results on unilateral ARMA lattice processes"
Cullis & Gleeson (1991): "Spatial analysis of field experiments: an extension to two dimensions"
Scaccia & Martin (2002): "Testing for simplification in spatial models"
Stein (1999): "Interpolation of spatial data"
t1 <- proc.time()
t1 - t0
print(sessionInfo(), locale=FALSE)