PCA_TS: Principal component analysis for vector time series
In HDTSA: High Dimensional Time Series Analysis Tools

PCA_TS

R Documentation

Principal component analysis for vector time series

Description

PCA_TS() seeks for a contemporaneous linear transformation for a multivariate time series such that the transformed series is segmented into several lower-dimensional subseries:

{\bf y}_t={\bf Ax}_t,

where {\bf x}_t is an unobservable p \times 1 weakly stationary time series consisting of q\ (\geq 1) both contemporaneously and serially uncorrelated subseries. See Chang, Guo and Yao (2018).

Usage

PCA_TS(
  Y,
  lag.k = 5,
  opt = 1,
  permutation = c("max", "fdr"),
  thresh = FALSE,
  delta = 2 * sqrt(log(ncol(Y))/nrow(Y)),
  prewhiten = TRUE,
  m = NULL,
  beta,
  control = list()
)

Arguments

`Y`	An `n \times p` data matrix `{\bf Y} = ({\bf y}_1, \dots , {\bf y}_n )'`, where `n` is the number of the observations of the `p \times 1` time series `\{{\bf y}_t\}_{t=1}^n`. The procedure will first normalize `{\bf y}_t` as `\hat{{\bf V}}^{-1/2}{\bf y}_t`, where `\hat{{\bf V}}` is an estimator for covariance of `{\bf y}_t`. See details below for the selection of `\hat{{\bf V}}^{-1}`.
`lag.k`	The time lag `K` used to calculate the nonnegative definte matrix `\hat{{\bf W}}_y`: `\hat{\mathbf{W}}_y\ =\ \mathbf{I}_p+ \sum_{k=1}^{K} T_\delta \{\hat{\mathbf{\Sigma}}_y(k)\} T_\delta \{\hat{\mathbf{\Sigma}}_y(k)\}',` where `\hat{\bf \Sigma}_y(k)` is the sample autocovariance of `\hat{{\bf V}}^{-1/2}{\bf y}_t` at lag `k` and `T_\delta(\cdot)` is a threshold operator with the threshold level `\delta \geq 0`. See 'Details'. The default is 5.
`opt`	An option used to choose which method will be implemented to get a consistent estimate `\hat{\bf V}` (or `\hat{\bf V}^{-1}`) for the covariance (precision) matrix of `{\bf y}_t`. If `opt = 1`, `\hat{\bf V}` will be defined as the sample covariance matrix. If `opt = 2`, the precision matrix `\hat{\bf V}^{-1}` will be calculated by using the function `clime()` of clime (Cai, Liu and Luo, 2011) with the arguments passed by `control`.
`permutation`	The method of permutation procedure to assign the components of `\hat{\bf z}_t` to different groups [See Section 2.2.1 in Chang, Guo and Yao (2018)]. Available options include: `"max"` (the default) for the maximum cross correlation method and `"fdr"` for the false discovery rate procedure based on multiple tests. See Sections 2.2.2 and 2.2.3 in Chang, Guo and Yao (2018) for more information.
`thresh`	Logical. If `thresh = FALSE` (the default), no thresholding will be applied to estimate `\hat{\bf W}_y`. If `thresh = TRUE`, the argument `delta` is used to specify the threshold level `\delta`.
`delta`	The value of the threshold level `\delta`. The default is `\delta = 2 \sqrt{n^{-1}\log p}`.
`prewhiten`	Logical. If `TRUE` (the default), we prewhiten each transformed component series of `\hat{\bf z}_t` [See Section 2.2.1 in Chang, Guo and Yao (2018)] by fitting a univariate AR model with the order between 0 and 5 determined by AIC. If `FALSE`, then the prewhiten procedure will not be performed.
`m`	A positive integer used in the permutation procedure [See (2.10) in Chang, Guo and Yao (2018)]. The default is 10.
`beta`	The error rate used in the permutation procedure[See (2.16) in Chang, Guo and Yao (2018)] when `permutation = "fdr"`.
`control`	A list of control arguments. See ‘Details’.

Details

The threshold operator T_\delta(\cdot) is defined as T_\delta({\bf W}) = \{w_{i,j}1(|w_{i,j}|\geq \delta)\} for any matrix {\bf W}=(w_{i,j}), with the threshold level \delta \geq 0 and 1(\cdot) representing the indicator function. We recommend to choose \delta=0 when p is fixed and \delta>0 when p \gg n.

For large p, since the sample covariance matrix may not be consistent, we recommend to use the method proposed in Cai, Liu and Luo (2011) to estimate the precision matrix \hat{{\bf V}}^{-1} (opt = 2).

control is a list of arguments passed to the function clime(), which contains the following components:

nlambda: Number of values for program generated lambda. The default is 100.
lambda.max: Maximum value of program generated lambda. The default is 0.8.
lambda.min: Minimum value of program generated lambda. The default is 10^{-4} (n>p) or 10^{-2} (n<p).
standardize: Logical. If standardize = TRUE, the variables will be standardized to have mean zero and unit standard deviation. The default is FALSE.
linsolver: An option used to choose which method should be employed. Available options include "primaldual" (the default) and "simplex". Rule of thumb: "primaldual" for large p, "simplex" for small p.

Value

An object of class "tspca", which contains the following components:

`B`	The `p\times p` transformation matrix `\hat{\bf B}=\hat{\bf \Gamma}_y'\hat{{\bf V}}^{-1/2}`, where `\hat{\bf \Gamma}_y` is a `p \times p` orthogonal matrix with the columns being the eigenvectors of `\hat{\bf W}_y`.
`X`	The `n \times p` matrix `\hat{\bf X}=(\hat{\bf x}_1,\dots,\hat{\bf x}_n)'` with `\hat{\bf x}_t = \hat{\bf B}{\bf y}_t`.
`NoGroups`	The number of groups.
`No_of_Members`	The number of members in each group.
`Groups`	The indices of the components of `\hat{\bf x}_t` that belong to each group.
`method`	A string indicating which permutation procedure is performed.

References

Cai, T., Liu, W., & Luo, X. (2011). A constrained L1 minimization approach for sparse precision matrix estimation. Journal of the American Statistical Association, 106, 594–607. \Sexpr[results=rd]{tools:::Rd_expr_doi("doi:10.1198/jasa.2011.tm10155")}.

Chang, J., Guo, B., & Yao, Q. (2018). Principal component analysis for second-order stationary vector time series. The Annals of Statistics, 46, 2094–2124. \Sexpr[results=rd]{tools:::Rd_expr_doi("doi:10.1214/17-AOS1613")}.

Examples

# Example 1 (Example 1 in the supplementary material of Chang, Guo and Yao (2018)).
# p=6, x_t consists of 3 independent subseries with 3, 2 and 1 components.

## Generate x_t
p <- 6;n <- 1500
X <- mat.or.vec(p,n)
x <- arima.sim(model = list(ar = c(0.5, 0.3), ma = c(-0.9, 0.3, 1.2,1.3)),
n = n+2, sd = 1)
for(i in 1:3) X[i,] <- x[i:(n+i-1)]
x <- arima.sim(model = list(ar = c(0.8,-0.5),ma = c(1,0.8,1.8) ), n = n+1, sd = 1)
for(i in 4:5) X[i,] <- x[(i-3):(n+i-4)]
x <- arima.sim(model = list(ar = c(-0.7, -0.5), ma = c(-1, -0.8)), n = n, sd = 1)
X[6,] <- x

## Generate y_t
A <- matrix(runif(p*p, -3, 3), ncol = p)
Y <- A%*%X
Y <- t(Y)

## permutation = "max" or permutation = "fdr"
res <- PCA_TS(Y, lag.k = 5,permutation = "max")
res1 <- PCA_TS(Y, lag.k = 5,permutation = "fdr", beta = 10^(-10))
Z <- res$X


# Example 2 (Example 2 in the supplementary material of Chang, Guo and Yao (2018)).
# p=20, x_t consists of 5 independent subseries with 6, 5, 4, 3 and 2 components.

## Generate x_t
p <- 20;n <- 3000
X <- mat.or.vec(p,n)
x <- arima.sim(model = list(ar = c(0.5, 0.3), ma = c(-0.9, 0.3, 1.2, 1.3)), 
n.start = 500, n = n+5, sd = 1)
for(i in 1:6) X[i,] <- x[i:(n+i-1)]
x <- arima.sim(model = list(ar = c(-0.4, 0.5), ma = c(1, 0.8, 1.5, 1.8)),
n.start = 500, n = n+4, sd = 1)
for(i in 7:11) X[i,] <- x[(i-6):(n+i-7)]
x <- arima.sim(model = list(ar = c(0.85,-0.3), ma=c(1, 0.5, 1.2)),
n.start = 500, n = n+3,sd = 1)
for(i in 12:15) X[i,] <- x[(i-11):(n+i-12)]
x <- arima.sim(model = list(ar = c(0.8, -0.5),ma = c(1, 0.8, 1.8)),
n.start = 500, n = n+2,sd = 1)
for(i in 16:18) X[i,] <- x[(i-15):(n+i-16)]
x <- arima.sim(model = list(ar = c(-0.7, -0.5), ma = c(-1, -0.8)),
n.start = 500,n = n+1,sd = 1)
for(i in 19:20) X[i,] <- x[(i-18):(n+i-19)]

## Generate y_t
A <- matrix(runif(p*p, -3, 3), ncol =p)
Y <- A%*%X
Y <- t(Y)

## permutation = "max" or permutation = "fdr"
res <- PCA_TS(Y, lag.k = 5,permutation = "max")
res1 <- PCA_TS(Y, lag.k = 5,permutation = "fdr",beta = 10^(-200))
Z <- res$X

HDTSA documentation built on April 3, 2025, 11:07 p.m.