Preamble

License: CC BY-SA 4.0

External packages:

options(digits=5)
suppressPackageStartupMessages(library(rutilstimflutre))
## library(MASS)
suppressPackageStartupMessages(library(Matrix))
suppressPackageStartupMessages(library(sparseMVN))
suppressPackageStartupMessages(library(nlme))
suppressPackageStartupMessages(library(INLA))
suppressPackageStartupMessages(library(sp))
suppressPackageStartupMessages(library(gstat))

This R chunk records the starting time, which is used at the end of the document (see the Appendix) to assess how long all the R code takes to execute:

t0 <- proc.time()

1D (time)

Model

Let us consider a sample of $T$ observations taken sequentially in time, at equal intervals (lag). By modelling it mathematically, we assume that this time series is a realization from a stochastic process. In particular, we will assume this process to be stationary, i.e. constant mean and variance, and that a given observation only depends on the previous one, to which is added some white noise. This model is called the first-order autoregressive process, AR(1).

It is often written as:

$\forall t \in \{2,\ldots,T\}, \; x_t = \delta + \phi_1 x_{t-1} + \epsilon_t$ where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$

To highlight the conditional independence, it can also be written as:

$\forall t \in \{2,\ldots,T\}, \; x_t | \delta, \phi_1, x_{t-1}, \ldots, x_1, \sigma^2 \; \sim \; \mathcal{N}(\delta + \phi_1 x_{t-1}, \sigma^2)$

We can also write it in a reduced form using the operators conventionally used in the field of time series analysis, notably, the backward shift operator, $B x_t = x_{t-1}$, and the autoregressive operator of order 1, $\phi(B) = 1 - \phi_1 B$. As a result:

$\phi(B) x_t = \delta + \epsilon_t$

Let us now derive the theoretical mean, $\mu$, of the process:

$\; E(x_t) = \delta + \phi_1 E(x_{t-1}) \Rightarrow \mu = \frac{\delta}{1 - \phi_1}$

as well as its theoretical variance, $v$:

$V(x_t) = \phi_1^2 V(x_{t-1}) + \sigma^2 \Rightarrow v = \frac{\sigma^2}{1 - \phi_1^2}$

From now on, let us assume $\delta = 0$, thus $\mu = 0$, with no loss of generality.

To interpret $\phi_1$, it also helps to write down the covariance, $\gamma_1$, and correlation, $\rho_1$, between two consecutive observations:

$\gamma_1 = Cov(x_{t-1}, x_t) = E(x_{t-1} x_t) = \phi_1 V(x_{t-1}) = \phi_1 v \Rightarrow \rho_1 = \gamma_1 / v = \phi_1$, which we denote $\rho$ hereafter.

This can be generalized for two observations $s$ time intervals apart:

$\gamma_s = \phi_1 \gamma_{s-1} \Rightarrow \rho_s = \phi_1^s = \rho^s$
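
As a quick check of this geometric decay, we can compare the theoretical autocorrelation function of an AR(1), as returned by the stats function ARMAacf, with $\rho^s$ (a minimal example using an arbitrary $\rho = 0.7$):

rho.chk <- 0.7
rbind(theoretical=ARMAacf(ar=rho.chk, lag.max=5), # ACF of an AR(1) at lags 0 to 5
      powers=rho.chk^(0:5))                       # rho^s for s = 0,...,5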

From all this, we can also write down the likelihood of all the data via the joint density of $\boldsymbol{x}$:

$p(\boldsymbol{x}) = p(x_1) p(x_2|x_1) \ldots p(x_T|x_{T-1})$ where $x_1 \sim \mathcal{N}(0, \frac{\sigma^2}{1 - \rho^2})$

This decomposition via the conditional independence of univariate Normals means that the joint density is a multivariate Normal:

$\boldsymbol{x} \sim \mathcal{N}_T(\boldsymbol{0}, \Sigma)$

where the covariance matrix can be written as a scaled correlation matrix:

\begin{equation} \Sigma = \frac{\sigma^2}{1 - \rho^2} \begin{pmatrix} 1 & \rho & \rho^2 & \cdots \\ \rho & 1 & \rho & \cdots \\ \rho^2 & \rho & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \end{equation}

However, by looking at this covariance matrix, the conditional independence relationships aren't obvious. For this, it is much more straightforward to look at the precision matrix, $Q = \Sigma^{-1}$.

Using the formula to invert a $2 \times 2$ matrix, we have:

\begin{equation} \Sigma = \frac{\sigma^2}{1 - \rho^2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \Leftrightarrow Q = \frac{1}{\sigma^2} \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix} \end{equation}

Using the formula to invert a $3 \times 3$ matrix, we have:

\begin{equation} \Sigma = \frac{\sigma^2}{1 - \rho^2} \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{pmatrix} \Leftrightarrow Q = \frac{1}{\sigma^2} \begin{pmatrix} 1 & -\rho & 0 \\ -\rho & 1 + \rho^2 & -\rho \\ 0 & -\rho & 1 \end{pmatrix} \end{equation}

In the general $T \times T$ case:

\begin{equation} Q = \frac{1}{\sigma^2} \begin{pmatrix} 1 & -\rho & 0 & & & & \\ -\rho & 1 + \rho^2 & -\rho & & & & \\ 0 & -\rho & 1 + \rho^2 & \ddots & & & \\ & & \ddots & \ddots & \ddots & & \\ & & & & -\rho & 1 + \rho^2 & -\rho \\ & & & & 0 & -\rho & 1 \end{pmatrix} \end{equation}

Whereas the covariance matrix only reflects the marginal dependence structure, the precision matrix is tridiagonal, hence reflecting the fact that any given observation is conditionally dependent only on the previous and next observations. Such a sparse structure of the precision matrix allows using dedicated algorithms to perform fast and reliable inferences with a low-memory footprint.
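
As a quick numerical check of this tridiagonal structure, we can build $\Sigma$ explicitly for a small $T$ and invert it (a minimal sketch using base R only, with arbitrary values of $\rho$ and $\sigma^2$):

T.chk <- 5; rho.chk <- 0.7; sigma2.chk <- 5
Sigma.chk <- sigma2.chk / (1 - rho.chk^2) * rho.chk^abs(outer(1:T.chk, 1:T.chk, "-"))
round(sigma2.chk * solve(Sigma.chk), 3) # equals sigma^2 * Q: tridiagonal, as derived above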

Simulations

Set the inputs:

N <- 1*10^2 # number of independent time series
T <- 5*10^2 # length of each time series
rho <- 0.7  # correlation between two successive observations
sigma2 <- 5 # variance of the errors
1 / sigma2  # precision of the errors

Via arima.sim

set.seed(1859)
x1 <- do.call(rbind, lapply(1:N, function(i){
  arima.sim(model=list(ar=rho), n=T, sd=sqrt(sigma2))
}))
rowMeans(apply(x1, 1, summary))
summary(apply(x1, 1, function(xi){
  acf(xi, lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Via conditional univariate Normals

set.seed(1859)
ar1.sim.cond <- function(N, T, rho, sigma2){
  x1s <- rnorm(n=N, mean=0, sd=sqrt(sigma2 / (1 - rho^2)))
  epsilons <- matrix(data=rnorm(n=N*(T-1), mean=0, sd=sqrt(sigma2)),
                     nrow=N, ncol=T-1)
  x <- do.call(rbind, lapply(1:N, function(i){
    xi <- rep(NA, T)
    xi[1] <- x1s[i]
    for(t in 2:T)
      xi[t] <- rho * xi[t-1] + epsilons[i,t-1]
    return(xi)
  }))
  return(x)
}
x2 <- ar1.sim.cond(N, T, rho, sigma2)
rowMeans(apply(x2, 1, summary))
summary(apply(x2, 1, function(xi){
  acf(xi, lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Via multivariate Normal with covariance

set.seed(1859)
ar1.sim.joint.cov <- function(N, T, rho, sigma2){
  mu <- rep(0, T)
  Sigma <- rutilstimflutre::covMatAR1(T, rho, sigma2)
  print(object.size(Sigma))
  ## x <- MASS::mvrnorm(n=N, mu=mu, Sigma=Sigma)
  A <- chol(Sigma)
  z <- matrix(data=rnorm(n=N*T, mean=0, sd=1), nrow=N, ncol=T)
  x <- do.call(rbind, lapply(1:N, function(i){
    xi <- mu + t(z[i,]) %*% A
    return(xi)
  }))
  return(x)
}
x3 <- ar1.sim.joint.cov(N, T, rho, sigma2)
rowMeans(apply(x3, 1, summary))
summary(apply(x3, 1, function(xi){
  acf(xi, lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Via multivariate Normal with precision

set.seed(1859)
ar1.sim.joint.prec <- function(N, T, rho, sigma2){
  mu <- rep(0, T)
  Q <- rutilstimflutre::precMatAR1(T, rho, sigma2)
  print(object.size(Q))
  x <- sparseMVN::rmvn.sparse(n=N, mu=mu, CH=Matrix::Cholesky(Q))
  ## https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q4/019612.html
  ## x <- do.call(rbind, lapply(1:N, function(i){
  ##   L <- Diagonal(T) - bandSparse(T, T, -1, list(rep(rho, T-1)))
  ##   L[1,1] <- sqrt(1 - rho^2)
  ##   xi <- solve(L, epsilons[i,])@x * sqrt(sigma2)
  ##   return(xi)
  ## }))
  return(x)
}
x4 <- ar1.sim.joint.prec(N, T, rho, sigma2)
rowMeans(apply(x4, 1, summary))
summary(apply(x4, 1, function(xi){
  acf(xi, lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Note how much less memory is required for the sparse precision matrix compared with the dense covariance matrix.
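
To see the two memory footprints side by side, we can re-use the same functions and inputs as above:

object.size(rutilstimflutre::covMatAR1(T, rho, sigma2))  # dense covariance matrix
object.size(rutilstimflutre::precMatAR1(T, rho, sigma2)) # sparse, tridiagonal precision matrix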

Inference

Fit a single time series

Let us take one particular sample, and perform inference:

x <- x1[1,]
summary(x)
var(x)
1 / var(x)
plot.ts(x, las=1,
        main=paste0("AR(1) simulated with rho=", rho, " and sigma2=", sigma2))
abline(h=0, lty=2)
acf(x, main=paste0("AR(1) simulated with rho=", rho, " and sigma2=", sigma2))

Via arima and ar

(fit1a <- arima(x=x, order=c(1,0,0)))
(fit1b <- ar(x=x, order.max=3))

Via gls from nlme

fit2 <- nlme::gls(x ~ 1, data=data.frame(x=x),
                   correlation=nlme::corAR1(fixed=FALSE),
                   method="REML")
summary(fit2)
(v.hat <- fit2$sigma) # estimate of the marginal standard deviation, i.e. sqrt(v)
Sigma.hat <- corMatrix(fit2$modelStruct$corStruct)
rho.hat <- Sigma.hat[1,2]
(sigma2.hat <- v.hat^2 * (1 - rho.hat^2)) # innovation variance: sigma^2 = v (1 - rho^2)

Via inla

The R INLA package (Integrated Nested Laplace Approximation) makes it possible to fit hierarchical models with Gaussian Markov random field (GMRF) latent structures of a very generic form, $y_j | \eta_j, \boldsymbol{\theta}_1 \sim \pi$ (see the Intro). Most notably, the predictors, $\boldsymbol{\eta}$, can be modelled via numerous latent processes.

However, in our case, the AR(1) isn't latent, that is, there is no additional noise beside the $\epsilon_t$'s from the AR(1) itself. Therefore, to fit our model with the inla function, we need to "rephrase" it as $y_t = x_t + w_t$ with $w_t \sim \mathcal{N}(0, \sigma_w^2)$ assuming that $\sigma_w^2$ is known (fixed) and small, i.e. the precision is large, e.g. 10 on a log scale.

Moreover, inla parametrizes the AR(1) via the marginal precision, $\tau = (1 - \rho^2) / \sigma^2$ (see the doc). We hence need to transform its output, using $\sigma^2 = (1 - \rho^2) / \tau$, to retrieve the posterior mean of $\sigma^2$.

fit3 <- inla(formula=y ~ 1 + f(t, model="ar1"),
             data=data.frame(y=x, t=1:length(x)),
             family="gaussian",
             control.family=list(hyper=list(theta=list(initial=10,
                                                       fixed=TRUE))),
             control.compute=list(mlik=TRUE, dic=TRUE, waic=TRUE, cpo=TRUE))
summary(fit3)
(sigma2.post.mean <- (1 - fit3$summary.hyperpar["Rho for t", "mean"]^2) /
   fit3$summary.hyperpar["Precision for t", "mean"])
(1 - fit3$summary.hyperpar["Rho for t", "0.975quant"]^2) /
  fit3$summary.hyperpar["Precision for t", "0.975quant"]
(1 - fit3$summary.hyperpar["Rho for t", "0.025quant"]^2) /
  fit3$summary.hyperpar["Precision for t", "0.025quant"]
fit3$dic$mean.deviance

We can also use inla to fit a model assuming the latent variables are independent, and compare this fit with the one above:

fit3iid <- inla(formula=y ~ 1 + f(t, model="iid"),
                data=data.frame(y=x, t=1:length(x)),
                family="gaussian",
                control.family=list(hyper=list(theta=list(initial=10,
                                                          fixed=TRUE))),
                control.compute=list(mlik=TRUE, dic=TRUE, waic=TRUE, cpo=TRUE))
summary(fit3iid)
1 / fit3iid$summary.hyperpar["Precision for t", "mean"]
fit3iid$dic$mean.deviance

Surprisingly, the DIC and WAIC are the same for both models.
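
To make this explicit, we can extract both criteria from the two fits (assuming the standard fields of an inla result object):

c(ar1=fit3$dic$dic, iid=fit3iid$dic$dic)     # DIC of both models
c(ar1=fit3$waic$waic, iid=fit3iid$waic$waic) # WAIC of both models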

Fit multiple time series jointly

We now take several simulated time series:

x <- x1[1:10,] # only take 10 time series
idx <- sample.int(n=nrow(x), size=5) # plot a subset of them
plot.ts(t(x[idx,]), las=1, plot.type="single", col=1:nrow(x[idx,]),
        main=paste0("AR(1) simulated with rho=", rho, " and sigma2=", sigma2))
abline(h=0, lty=2)

The likelihood can be written as:

$\forall i \in \{1,\ldots,N\} \text{ and } t \in \{2,\ldots,T\}, \; x_{i,t} = \rho x_{i,t-1} + \epsilon_{i,t} \text{ where } \epsilon_{i,t} \sim \mathcal{N}(0, \sigma^2)$

with $\forall i, \; x_{i,1} \sim \mathcal{N}(0, \frac{\sigma^2}{1 - \rho^2})$

Via gls from nlme

tmp <- data.frame(x=c(t(x)),
                  i=factor(rep(1:nrow(x), each=ncol(x))))
fit1 <- nlme::gls(x ~ 1, data=tmp,
                  correlation=nlme::corAR1(form=~1|i, fixed=FALSE),
                  method="REML")
summary(fit1)

Via inla

tmp <- data.frame(x=c(t(x)),
                  i=rep(1:nrow(x), each=ncol(x)),
                  t=rep(1:ncol(x), nrow(x)))
fit2 <- inla(formula=x ~ 1 + f(t, model="ar1", replicate=i),
             data=tmp,
             family="gaussian",
             control.family=list(hyper=list(theta=list(initial=10,
                                                        fixed=TRUE))),
             control.compute=list(mlik=TRUE, dic=TRUE, waic=TRUE, cpo=TRUE))
summary(fit2)
(sigma2.post.mean <- (1 - fit2$summary.hyperpar["Rho for t", "mean"]^2) /
   fit2$summary.hyperpar["Precision for t", "mean"])

2D (space)

Model

Let us now consider a sample of $N$ observations, all taken at the same point in time, but at different locations in space. To each observation is associated a pair of geographical coordinates, $(r,c)$. Let us assume these to be points on a rectangular grid (lattice) of $R$ rows and $C$ columns, with $r$ (resp. $c$) indexing the rows (resp. columns). Assuming that two nearby locations are more likely to be similar than two distant ones, that rows are equally spaced (lag $l_r$), and columns as well (lag $l_c$, possibly different from $l_r$), we can model our sample via a two-dimensional, second-order stationary process. A simple choice is the separable AR(1) $\times$ AR(1) process, which also happens to be reflection-symmetric.

The whole data set is naturally visualized as an $R \times C$ matrix, $X$, and the likelihood of any element can be written as:

$\forall r,c > 1, \; x_{r,c} = \rho_r x_{r-1,c} + \rho_c x_{r,c-1} - \rho_r \rho_c x_{r-1,c-1} + \epsilon_{r,c}$ with $\epsilon_{r,c} \sim \mathcal{N}(0, \sigma^2)$

For a full specification of the boundaries (first row and first column), see Martin (1996), which for instance sets $x_{1,1} = \sqrt{\sigma_X^2 / \sigma_\epsilon^2} \, \epsilon_{1,1}$.

Applying the vec operator (stacking the columns), we obtain the vector $\boldsymbol{x} = vec(X)$ of length $N$, indexed by $i$. Written in a reduced form, the model is:

$\phi_r(B) \, \phi_c(B^R) \, x_i = \epsilon_i$ where $\phi_r(B) = 1 - \rho_r B$ and $\phi_c(B^R) = 1 - \rho_c B^R$

Using the separability of the process, the likelihood can also be written as a multivariate Normal:

$\boldsymbol{x} \sim \mathcal{N}_N(0, \sigma^2 \times \Sigma(\rho_c) \otimes \Sigma(\rho_r))$
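
To make the Kronecker structure concrete, here is a small sketch for a $2 \times 3$ grid, assuming (as above) that covMatAR1 returns the AR(1) covariance matrix:

Sigma.r <- rutilstimflutre::covMatAR1(2, 0.9, 1) # along rows, unit innovation variance
Sigma.c <- rutilstimflutre::covMatAR1(3, 0.9, 1) # along columns
Sigma.2d <- 5 * kronecker(Sigma.c, Sigma.r)      # covariance of vec(X)
dim(Sigma.2d) # N x N with N = R*C = 6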

Simulations

Set the inputs:

N <- 1*10^2   # number of independent spatial series
R <- 3*10^1   # number of rows of each grid
C <- 7*10^1   # number of columns of each grid
R * C         # total number of cells
rho.r <- 0.9  # correlation between two successive rows
rho.c <- 0.9  # correlation between two successive columns
sigma2 <- 5   # variance of the errors
1 / sigma2    # precision of the errors

Via conditional univariate Normals

set.seed(1859)
ar1xar1.sim.cond <- simulAr1Ar1 # inside package

Check the code by simulating with only column correlation:

X1 <- ar1xar1.sim.cond(n=N, R=2, C=500, sigma.e.2=sigma2, rho.r=0, rho.c=rho.c)
prettyPrintBetterSummary(apply(X1, 3, function(Xi){
  acf(Xi[1,], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Idem, with only row correlation:

X1 <- ar1xar1.sim.cond(n=N, R=500, C=2, sigma.e.2=sigma2, rho.r=rho.r, rho.c=0.0)
prettyPrintBetterSummary(apply(X1, 3, function(Xi){
  acf(Xi[,1], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Now with both row and column correlations:

X1 <- ar1xar1.sim.cond(n=N, R=R, C=C, sigma.e.2=sigma2, rho.r=rho.r, rho.c=rho.c)
prettyPrintBetterSummary(apply(X1, 3, function(Xi){
  acf(Xi[1,], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
prettyPrintBetterSummary(apply(X1, 3, function(Xi){
  acf(Xi[,1], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Via multivariate Normals

set.seed(1859)
## a minimal sketch (assumption): draw from the joint multivariate Normal
## x ~ N(0, sigma2 * Sigma(rho.c) %x% Sigma(rho.r)), with Sigma() from covMatAR1
ar1xar1.sim.mv <- function(N, R, C, sigma2, rho.r, rho.c){
  stopifnot(R > 1, C > 1, abs(rho.r) <= 1, abs(rho.c) <= 1, sigma2 > 0)
  Sigma <- sigma2 * kronecker(covMatAR1(C, rho.c, 1), covMatAR1(R, rho.r, 1))
  Z <- matrix(rnorm(N * R * C), nrow=N)  # i.i.d. standard Normals
  X <- Z %*% chol(Sigma)                 # each row is one draw of vec(X_i)
  array(data=t(X), dim=c(R, C, N))       # un-vec each draw into an R x C matrix
}

Check the code by simulating with only column correlation:

X2 <- ar1xar1.sim.mv(N, R=2, C=500, sigma2, rho.r=0, rho.c)
prettyPrintBetterSummary(apply(X2, 3, function(Xi){
  acf(Xi[1,], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Idem, with only row correlation:

X2 <- ar1xar1.sim.mv(N, R=500, C=2, sigma2, rho.r, rho.c=0.0)
prettyPrintBetterSummary(apply(X2, 3, function(Xi){
  acf(Xi[,1], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Now with both row and column correlations:

X2 <- ar1xar1.sim.mv(N, R, C, sigma2, rho.r, rho.c)
prettyPrintBetterSummary(apply(X2, 3, function(Xi){
  acf(Xi[1,], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))
prettyPrintBetterSummary(apply(X2, 3, function(Xi){
  acf(Xi[,1], lag.max=1, type="correlation", plot=FALSE)$acf[2]
}))

Inference

Let us take one particular sample, and perform inference:

X <- X1[,,1]
dim(X)
X[1:3, 1:6]

Make coordinates (note that columns are along the $x$-axis and rows along the $y$-axis):

coords <- as.matrix(expand.grid(1:R, 1:C))
colnames(coords) <- c("y", "x")
coords <- coords[,c("x","y")]
dim(coords)
head(coords)
all(max(coords[,"x"]) == C, max(coords[,"y"]) == R)

Reformat the data accordingly:

dat <- data.frame(var1=c(X))
dim(dat)

Make a spatial data structure, and plot it:

X.sp <- SpatialPointsDataFrame(coords=coords, data=dat)
summary(X.sp)
spplot(obj=X.sp, scales=list(draw=TRUE),
       xlab="columns", ylab="rows",
       main=paste0("AR(1)xAR(1) with rho.r=", rho.r, ", rho.c=", rho.c,
                   " and sigma2=", sigma2),
       key.space="right", aspect="fill")

Look at the variogram estimated from the data:

vg.1 <- gstat::variogram(object=var1 ~ 1, data=X.sp)
plot(x=vg.1$dist, y=vg.1$gamma, main="Empirical variogram",
     ylim=c(0, max(vg.1$gamma)))

Via gstat

Fit a Matérn model for the random field:

(vg.1.fit <- gstat::fit.variogram(object=vg.1,
                                  model=vgm(model="Ste", nugget=NA),
                                  fit.kappa=seq(0.3, 0.6, 0.05)))
plot(vg.1, vg.1.fit, main="", col="blue",
     key=list(space="top", lines=list(col="blue"),
              text=list("fit of Matérn model")))

Via inla

TODO

Via lm

head(as.data.frame(X.sp))
fit_x_y <- lm(var1 ~ x + y, data=as.data.frame(X.sp))
fit_x_y_xy <- lm(var1 ~ x + y + x:y, data=as.data.frame(X.sp))
anova(fit_x_y, fit_x_y_xy)
fit_x_y_x2_y2 <- lm(var1 ~ x + y + I(x^2) + I(y^2), data=as.data.frame(X.sp))
anova(fit_x_y, fit_x_y_xy, fit_x_y_x2_y2)
fit_x_y_x2_y2_x2y2 <- lm(var1 ~ x + y + I(x^2) + I(y^2) + I(x^2):I(y^2), data=as.data.frame(X.sp))
anova(fit_x_y, fit_x_y_xy, fit_x_y_x2_y2, fit_x_y_x2_y2_x2y2)

Acknowledgments

Marie Denis (CIRAD)

References

Appendix

t1 <- proc.time()
t1 - t0
print(sessionInfo(), locale=FALSE)

