knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.align = "center", out.width = "600px") a4width<- 8.3 a4height<- 11.7

Nondegenerate covariance, correlation and spectral density matrices are necessarily sym-
metric or Hermitian and positive definite. In [@COvS17], we develop statistical data depths for collections of Hermitian positive definite matrices by exploiting the geometric structure of the space as a Riemannian manifold. Data depth is an important tool in statistical data analysis measuring the *depth* of a point with respect to a data cloud or probability distribution. In this way, data depth provides a center-to-outward ordering of multivariate data observations, generalizing the notion of a rank for univariate observations.

The proposed data depth measures can be used to characterize most central regions or detect outlying observations in samples of HPD matrices, such as collections of covariance or spectral density matrices. The depth functions also provide a practical framework to perform rank-based hypothesis testing for samples of HPD matrices by replacing the usual ranks by their depth-induced counterparts. Other applications of data depth include the construction of confidence regions, clustering, or classification for samples of HPD matrices.

In this vignette we demonstrate the use of the functions `pdDepth()`

and `pdRankTests()`

to compute data depth values of HPD matrix-valued observations and perform rank-based hypothesis testing for samples of HPD matrices, where the space of HPD matrices can be equipped with several different metrics, such as the affine-invariant Riemannian metric as discussed in [@COvS17] or Chapter 4 of [@C18].

`pdDepth()`

First, we generate a pointwise random sample of `(2,2)`

-dimensional HPD matrix-valued observations using the exponential map `Expm()`

, with underlying intrinsic (i.e., Karcher or Fréchet) mean equal to the identity matrix `diag(2)`

. Second, we generate a random sample of sequences (discretized curves) of `(2,2)`

-dimensional HPD matrix-valued observations, with underlying intrinsic mean curve equal to an array of rescaled identity matrices. For instance, we can think of the first sample as a random collection of HPD covariance matrices, and the second sample as a random collection of HPD spectral matrix curves in the frequency domain.

## Pointwise random sample library(pdSpecEst); set.seed(100) X1 <- replicate(50, Expm(diag(2), H.coeff(0.5 * rnorm(4), inverse = T))); str(X1) ## Curve random sample X2 <- replicate(50, sapply(1:5, function(i) Expm(i * diag(2), H.coeff(0.5 * rnorm(4), inverse = T) / i), simplify = "array")); str(X2)

The function `H.coeff()`

expands an Hermitian matrix with respect to an orthonormal basis of the space of Hermitian matrices equipped with the Frobenius (Euclidean) inner product $\langle .,. \rangle_F$. By specifying the argument `inverse = T`

, the function computes an inverse basis expansion, transforming a real-valued basis component vector back to an Hermitian matrix.

The function `pdDepth()`

computes the intrinsic data depth of a single HPD matrix (resp. curve of HPD matrices) `y`

with respect to a sample of HPD matrices (resp. sample of curves of HPD matrices) `X`

. The intrinsic data depth is calculated in the space of HPD matrices equipped with one of the following metrics: (i) affine-invariant Riemannian metric (`metric = "Riemannian"`

), (ii) log-Euclidean metric, the Euclidean inner product between matrix logarithms (`metric = "logEuclidean"`

), (iii) Cholesky metric, the Euclidean inner product between Cholesky decompositions (`metric = "Cholesky"`

), (iv) Euclidean metric (`metric = "Euclidean"`

) and (v) root-Euclidean metric, the Euclidean inner product between Hermitian square root matrices (`metric = "rootEuclidean"`

). Note that the default choice (affine-invariant Riemannian) has several useful properties not shared by the other metrics. See [@COvS17] and Chapter 4 of [@C18] for more details and additional properties of the available intrinsic depth functions.

**Remark**: an advantage of substituting, for instance, the Log-Euclidean metric is that the depth
computation times may be significantly faster, in particular for large sample sizes. However, the data depths with respect to the Log-Euclidean metric are not *general linear congruence invariant* according to Chapter 4 of [@C18], and should therefore be used with caution.

## Pointwise geodesic distance, zonoid and spatial depth point.depth <- function(method) pdDepth(y = diag(2), X = X1, method = method) point.depth("gdd"); point.depth("zonoid"); point.depth("spatial") ## Integrated geodesic distance, zonoid and spatial depth int.depth <- function(method){ pdDepth(y = sapply(1:5, function(i) i * diag(2), simplify = "array"), X = X2, method = method) } int.depth("gdd"); int.depth("zonoid"); int.depth("spatial")

By leaving the argument `y`

in the function `pdDepth()`

unspecified, the function computes
the data depth of each individual object in the sample array `X`

with respect to the sample `X`

itself.

(dd1 <- pdDepth(X = X1, method = "gdd")) ## pointwise geodesic distance depth (dd2 <- pdDepth(X = X2, method = "gdd")) ## integrated geodesic distance depth

A center-to-outward ordering of the individual objects in the sample of (curves of) HPD matrices is then obtained by computing the data depth induced ranks, with the most central observation (i.e., highest depth value) having smallest rank and the most outlying observation (i.e., lowest depth value) having largest rank.

(dd1.ranks <- rank(1 - dd1)) ## pointwise depth ranks (dd2.ranks <- rank(1 - dd2)) ## integrated depth ranks ## Explore sample X1 head(order(dd1.ranks)) ## most central observations rev(tail(order(dd1.ranks))) ## most outlying observations X1[ , , which(dd1.ranks == 1)] ## most central HPD matrix X1[ , , which(dd1.ranks == 50)] ## most outlying HPD matrix

We can compare the most central HPD matrix above with the intrinsic median of the observations under the affine-invariant metric obtained with `pdMedian()`

. Note that the intrinsic sample median maximizes the geodesic distance depth in a given sample of HPD matrices. The point of maximum depth (i.e., deepest point) with respect to the intrinsic zonoid depth is the intrinsic sample mean, which can be obtained with `pdMean()`

. For more details, see [@COvS17] or Chapter 4 of [@C18].

(med.X1 <- pdMedian(X1)) ## intrinsic sample median pdDepth(y = med.X1, X = X1, method = "gdd") ## maximum out-of-sample depth max(dd1) ## maximum in-sample depth

`pdRankTests()`

The null hypotheses of the available rank-based hypothesis tests in the function `pdRankTests()`

are specified by the argument `test`

and can be one of the following:

`"rank.sum"`

: homogeneity of distributions of two independent samples of HPD matrices (resp. sequences of HPD matrices).`"krusk.wall"`

: homogeneity of distributions of more than two independent samples of HPD matrices (resp. sequences of HPD matrices).`"signed-rank"`

: homogeneity of distributions of independent paired or matched samples of HPD matrices.`"bartels"`

: exchangeability (i.e. randomness) within a single independent sample of HPD matrices (resp. sequences of HPD matrices).

Below, we construct several simulated examples for which: (i) the null hypotheses listed above are satisfied; and (ii) the null hypotheses listed above are not satisfied. Analogous to the previous section, we generate pointwise random samples (resp. random samples of sequences) of `(2,2)`

-dimensional HPD matrix-valued observations, with underlying geometric mean equal to the identity matrix (resp. sequence of scaled identity matrices).

Let us first consider simulated examples of the intrinsic Wilcoxon rank-sum test (`test = "rank.sum"`

) and intrinsic Kruskal-Wallis test (`test = "krusk.wall"`

).

## Generate data (null true) data1 <- array(c(X1, replicate(50, Expm(diag(2), H.coeff(0.5 * rnorm(4), inverse = T)))), dim = c(2, 2, 100)) ## pointwise HPD sample data2 <- array(c(X2, replicate(50, sapply(1:5, function(i) Expm(i * diag(2), H.coeff(0.5 * rnorm(4), inverse = T) / i), simplify = "array"))), dim = c(2, 2, 5, 100)) ## HPD curve sample ## Generate data (null false) data1a <- array(c(X1, replicate(50, Expm(diag(2), H.coeff(rnorm(4), inverse = T)))), dim = c(2, 2, 100)) ## pointwise HPD scale change data2a <- array(c(X2, replicate(50, sapply(1:5, function(i) Expm(i * diag(2), H.coeff(rnorm(4), inverse = T) / i), simplify = "arra"))), dim = c(2, 2, 5, 100)) ## HPD curve scale change ## Rank-sum test pdRankTests(data1, sample_sizes = c(50, 50), "rank.sum")[1:4] ## null true (pointwise) pdRankTests(data2, sample_sizes = c(50, 50), "rank.sum")[2] ## null true (curve) pdRankTests(data1a, sample_sizes = c(50, 50), "rank.sum")[2] ## null false (pointwise) pdRankTests(data2a, sample_sizes = c(50, 50), "rank.sum")[2] ## null false (curve) ## Kruskal-Wallis test pdRankTests(data1, sample_sizes = c(50, 25, 25), "krusk.wall")[1:4] ## null true (pointwise) pdRankTests(data2, sample_sizes = c(50, 25, 25), "krusk.wall")[2] ## null true (curve) pdRankTests(data1a, sample_sizes = c(50, 25, 25), "krusk.wall")[2] ## null false (pointwise) pdRankTests(data2a, sample_sizes = c(50, 25, 25), "krusk.wall")[2] ## null false (curve)

To apply the manifold Wilcoxon signed-rank test (`test = "signed-rank"`

), we generate paired observations for independent trials (or subjects) by introducing trial-specific random effects, such that the paired observations in each trial share a trial-specific geometric mean. Note that for such data the intrinsic Wilcoxon rank-sum test is no longer valid due to the introduced sample dependence.

## Trial-specific means mu <- replicate(50, Expm(diag(2), H.coeff(0.1 * rnorm(4), inverse = T))) ## Generate paired samples X,Y make_sample <- function(null) sapply(1:50, function(i) Expm(mu[, , i], pdSpecEst:::T_coeff_inv(ifelse(null, 1, 0.5) * rexp(4) - 1, mu[, , i])), simplify = "array") X3 <- make_sample(null = T) ## refernce sample Y3 <- make_sample(null = T) ## null true Y3a <- make_sample(null = F) ## null false (scale change) ## Signed-rank test pdRankTests(array(c(X3, Y3), dim = c(2, 2, 100)), test = "signed.rank")[1:4] ## null true pdRankTests(array(c(X3, Y3a), dim = c(2, 2, 100)), test = "signed.rank")[2] ## null false

The intrinsic signed-rank test also provides a valid procedure to test for equivalence of spectral matrices of two (independent) multivariate stationary time series based on the HPD periodogram matrices obtained via `pdPgram()`

. In contrast to other available tests in the literature, this asymptotic test does not require consistent spectral estimators or resampling of test statistics, and therefore remains computationally efficient for higher-dimensional spectral matrices or a large number of sampled Fourier frequencies.

## Signed-rank test for equivalence of spectra ## vARMA(1,1) process: Example 11.4.1 in (Brockwell and Davis, 1991) Phi <- array(c(0.7, 0, 0, 0.6, rep(0, 4)), dim = c(2, 2, 2)) Theta <- array(c(0.5, -0.7, 0.6, 0.8, rep(0, 4)), dim = c(2, 2, 2)) Sigma <- matrix(c(1, 0.71, 0.71, 2), nrow = 2) pgram <- function(Sigma) pdPgram(rARMA(2^10, 2, Phi, Theta, Sigma)$X)$P ## HPD periodogram ## Null is true pdRankTests(array(c(pgram(Sigma), pgram(Sigma)), dim = c(2, 2, 2^10)), test = "signed.rank")[2] ## Null is false pdRankTests(array(c(pgram(Sigma), pgram(0.9 * Sigma)), dim = c(2, 2, 2^10)), test = "signed.rank")[2]

To conclude, we demonstrate the intrinsic Bartels-von Neumman test (`test = "bartels"`

) by generating an independent non-identically distributed sample of HPD matrices with a trend in the scale of the distribution across observations. In this case, the null hypothesis of randomness breaks down.

## Null is true data3 <- replicate(200, Expm(diag(2), H.coeff(rnorm(4), inverse = T))) ## iid HPd sample data4 <- replicate(100, sapply(1:5, function(i) Expm(i * diag(2), H.coeff(rnorm(4), inverse = T) / i), simplify = "array")) ## iid HPD curve ## Null is false data3a <- sapply(1:200, function(j) Expm(diag(2), H.coeff(((200 - j) / 200 + j * 2 / 200) * rnorm(4), inverse = T)), simplify = "array") ## pointwise trend in scale data4a <- sapply(1:100, function(j) sapply(1:5, function(i) Expm(i * diag(2), H.coeff(((100 - j) / 100 + j * 2 / 100) * rnorm(4), inverse = T) / i), simplify = "array"), simplify = "array") ## curve trend in scale ## Bartels-von Neumann test pdRankTests(data3, test = "bartels")[1:4] ## null true (pointwise) pdRankTests(data4, test = "bartels")[2] ## null true (curve) pdRankTests(data3a, test = "bartels")[2] ## null false (pointwise) pdRankTests(data4a, test = "bartels")[2] ## null false (curve)

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.