# pdSpecClust2D: Intrinsic wavelet HPD time-varying spectral clustering In pdSpecEst: An Analysis Toolbox for Hermitian Positive Definite Matrices

## Description

pdSpecClust2D performs clustering of HPD time-varying spectral matrices corrupted by noise (e.g. HPD time-varying periodograms) by combining wavelet thresholding and fuzzy clustering in the intrinsic wavelet coefficient domain according to the following steps:

1. Transform a collection of noisy HPD time-varying spectral matrices to the intrinsic wavelet domain and denoise the HPD matrix surfaces by (tree-structured) thresholding of wavelet coefficients with pdSpecEst2D.

2. Apply an intrinsic fuzzy c-means algorithm to the coarsest midpoints at scale j = 0 across subjects.

3. Taking into account the fuzzy cluster assignments in the previous step, apply a weighted fuzzy c-means algorithm to the nonzero thresholded wavelet coefficients across subjects from scale j = 1 up to j = jmax.

More details can be found in Chapter 3 of \insertCiteC18pdSpecEst and the accompanying vignettes.

## Usage

 1 2 3 pdSpecClust2D(P, K, jmax, metric = "Riemannian", m = 2, d.jmax = 0.1, eps = c(1e-04, 1e-04), tau = 0.5, max_iter = 50, return.centers = FALSE, ...) 

## Arguments

 P a (d,d,n[1],n[2],S)-dimensional array of HPD matrices, corresponding to a collection of surfaces of (d,d)-dimensional HPD matrices of size n_1 \times n_2, with n_1 = 2^{J_1} and n_2 = 2^{J_2} for some J_1,J_2 > 0, for S different subjects. K the number of clusters, a positive integer larger than 1. jmax an upper bound on the maximum wavelet scale to be considered in the clustering procedure. If jmax is not specified, it is set equal to the maximum (i.e., finest) wavelet scale minus 2. metric the metric that the space of HPD matrices is equipped with. The default choice is "Riemannian", but this can also be one of: "logEuclidean", "Cholesky", "rootEuclidean" or "Euclidean". Additional details are given below. m the fuzziness parameter for both fuzzy c-means algorithms. m should be larger or equal to 1. If m = 1 the cluster assignments are no longer fuzzy, i.e., the procedure performs hard clustering. d.jmax a proportion that is used to determine the maximum wavelet scale to be considered in the clustering procedure. A larger value d.jmax leads to less wavelet coefficients being taken into account, and therefore lower computational effort in the procedure. If d.jmax is not specified, by default d.jmax = 0.1. eps an optional vector with two components determining the stopping criterion. The first step in the cluster procedure terminates if the (integrated) intrinsic distance between cluster centers is smaller than eps[1]. The second step in the cluster procedure terminates if the (integrated) Euclidean distance between cluster centers is smaller than eps[2]. By default eps = c(1e-04, 1e-04). tau an optional argument tuning the weight given to the cluster assignments obtained in the first step of the clustering algorithm. If tau is not specified, by default tau = 0.5. max_iter an optional argument tuning the maximum number of iterations in both the first and second step of the clustering algorithm, defaults to max_iter = 50. return.centers should the cluster centers transformed back the space of HPD matrices also be returned? Defaults to return.centers = FALSE. ... additional arguments passed on to pdSpecEst2D.

## Details

The input array P corresponds to a collection of initial noisy HPD time-varying spectral estimates of the (d,d)-dimensional time-varying spectral matrix at n_1 \times n_2 time-frequency points, with n_1, n_2 dyadic numbers, for S different subjects. These can be e.g., multitaper HPD time-varying periodograms given as output by the function pdPgram2D.
First, for each subject s = 1,…,S, thresholded wavelet coefficients in the intrinsic wavelet domain are calculated by pdSpecEst2D, see the function documentation for additional details on the wavelet thresholding procedure.
The maximum wavelet scale taken into account in the clustering procedure is determined by the arguments jmax and d.jmax. The maximum scale is set to the minimum of jmax and the wavelet scale j for which the proportion of nonzero thresholded wavelet coefficients (averaged across subjects) is smaller than d.jmax.
The S subjects are assigned to K different clusters in a probabilistic fashion according to a two-step procedure:

1. In the first step, an intrinsic fuzzy c-means algorithm, with fuzziness parameter m is applied to the S coarsest midpoints at scale j = 0 in the subject-specific 2D midpoint pyramids. Note that the distance function in the intrinsic c-means algorithm relies on the chosen metric on the space of HPD matrices.

2. In the second step, a weighted fuzzy c-means algorithm based on the Euclidean distance function, also with fuzziness parameter m, is applied to the nonzero thresholded wavelet coefficients of the S different subjects. The tuning parameter tau controls the weight given to the cluster assignments obtained in the first step of the clustering algorithm.

The function computes the forward and inverse intrinsic 2D AI wavelet transform in the space of HPD matrices equipped with one of the following metrics: (i) the affine-invariant Riemannian metric (default) as detailed in e.g., \insertCiteB09pdSpecEst[Chapter 6] or \insertCitePFA05pdSpecEst; (ii) the log-Euclidean metric, the Euclidean inner product between matrix logarithms; (iii) the Cholesky metric, the Euclidean inner product between Cholesky decompositions; (iv) the Euclidean metric; or (v) the root-Euclidean metric. The default choice of metric (affine-invariant Riemannian) satisfies several useful properties not shared by the other metrics, see \insertCiteC18pdSpecEst for more details. Note that this comes at the cost of increased computation time in comparison to one of the other metrics.
If return.centers = TRUE, the function also returns the K HPD time-varying spectral matrices corresponding to the cluster centers based on the given metric by applying the intrinsic inverse 2D AI wavelet transform ( InvWavTransf2D) to the cluster centers in the wavelet domain.

## Value

Depending on the input the function returns a list with five or six components:

cl.prob

an (S,K)-dimensional matrix, where the value at position (s,k) in the matrix corresponds to the probabilistic cluster membership assignment of subject s with respect to cluster k.

cl.centers.D

a list of K wavelet coefficient pyramids, where each 2D pyramid of wavelet coefficients is associated to a cluster center.

cl.centers.M0

a list of K arrays of coarse-scale midpoints at scale j = 0, where each array is associated to a cluster center.

cl.centers.f

only available if return.centers = TRUE, returning a list of K (d,d,n[1],n[2])-dimensional arrays, where each array corresponds to ann_1 \times n_2-sized surface of (d,d)-dimensional HPD matrices associated to a cluster center.

cl.jmax

the maximum wavelet scale taken into account in the clustering procedure determined by the input arguments jmax and d.jmax.

## References

\insertAllCited

pdSpecEst2D, WavTransf2D, pdDist, pdPgram2D
 1 2 3 4 5 6 7 8 ## Not run: ## Generate noisy HPD surfaces for 6 subjects in 2 groups n <- c(2^5, 2^5) P <- array(c(rExamples2D(n, example = "tvar", replicates = 3)$P, rExamples2D(n, example = "tvar", replicates = 3)$P), dim = c(2, 2, n, 6)) cl <- pdSpecClust2D(P, K = 2, metric = "logEuclidean") ## End(Not run)