knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "curv-meas_figures/"
)
library(symconivol)
library(conivol)
library(tidyverse)
library(knitr)
library(png)
library(Rmisc)
library(rstan)
library(latex2exp)
plot_Phi <- function( A, plot_Pat=TRUE, plot_leigh=FALSE, plot_Philim=FALSE ) {
    d <- dim(A)[1]-1
    n <- dim(A)[2]-1
    beta <- (d-n)/choose(n,2)
    pat <- pat_bnd(beta,n)
    n_grid <- 0:n
    d_grid <- 0:d
    tmp <- A %>% as_tibble() %>% add_column(k=0:d) %>% gather(...=-(n+2), factor_key=FALSE)
    pts <- add_column(tmp, r=as.numeric(substring(tmp[[2]],2))-1)[-2]
    P <- ggplot() +
        geom_point(data=pts, aes(x=r, y=k, color=value), alpha=0.5) +
        theme_bw() +
        xlab(TeX(sprintf("$n=%d,\\; \\beta = %d\\; (d=%d)$", n, beta, d))) +
        theme(legend.position="none",
              axis.title.y=element_blank(),
              axis.text.x=element_blank(), axis.text.y=element_blank(),
              axis.ticks.x=element_blank(), axis.ticks.y=element_blank(),
              panel.border = element_blank(), panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              axis.line = element_line(colour = "black")
        ) + scale_colour_gradient2()
    if (plot_Pat) {
        y_pat <- seq(0,1,length.out=1e2)
        pat_bd_low <- tibble(patX=d*y_pat^2, patY=n*y_pat)
        pat_bd_upp <- tibble(patX=d*(1-y_pat^2), patY=n*(1-y_pat))
        P <- P + geom_line(data=pat_bd_low, aes(patY, patX), linetype=2, alpha=0.5) +
             geom_line(data=pat_bd_upp, aes(patY, patX), linetype=2, alpha=0.5)
    }
    if (plot_leigh) {
        leighscurve <- leigh(1e3)$table
        leighscurve$c <- leighscurve$rho*n
        leighscurve$y <- leighscurve$kappa*d
        P <- P + geom_line(data=leighscurve,aes(x=c,y=y), linetype=2, alpha=0.5)
    }
    if (plot_Philim) {
        P <- P + geom_line(data=mu()$table,aes(x=rho*n,y=kappa*d), linetype=2, alpha=0.5)
    }
    return(P)
}

In this note we study the curvature measures of symmetric cones through the distribution of the Gaussian orthogonal/unitary/symplectic ensemble conditioned on the index function, that is, on the number of positive eigenvalues.

Our approach closely follows the reconstruction of the conic intrinsic volumes from the corresponding bivariate chi-bar-squared distribution. We will assume familiarity with this approach, which is explained in the vignette Estimating conic intrinsic volumes from bivariate chi-bar-squared data from the conivol package.

We present the connection between the curvature measures and the index constrained Gaussian orthogonal/unitary/symplectic ensemble without proof; for more information and references, see [@AB15s]. One motivation for this study is its connection with the problem of predicting the rank of the solution of a random semidefinite program. This connection is explained in the final section of this note.

In an appendix we present an observation about a possible connection between the curvature measures of semidefinite cones and the algebraic degree of semidefinite programming.

Some plots in this study require longer computations and are included as image files; the code for getting the data and for constructing these images is provided in this technical note.


Symmetric cones {#symm_cones}

Symmetric cones are self-dual convex cones with a transitive group of symmetries. Every symmetric cone decomposes into an orthogonal sum of a finite number of simple symmetric cones, which consist of

  1. the cones of positive semidefinite real symmetric matrices,
  2. the cones of positive semidefinite complex Hermitian matrices,
  3. the cones of positive semidefinite quaternion Hermitian matrices,
  4. the Lorentz cones,
  5. the exceptional cone of positive semidefinite $3\times 3$ matrices over the octonions.

The first four points each describe families of cones, while the last one is just a single exceptional cone. The Lorentz cones have a very simple structure (arguably the simplest kind of convex cones that are not linear subspaces) and do not have the kind of structure that we focus on in this note. We also do not discuss the exceptional symmetric cone. So "symmetric cone" in this note is synonymous with a cone of positive semidefinite real symmetric/complex Hermitian/quaternion Hermitian matrices of a certain format.

We use the Dyson index $\beta\in\{1,2,4\}$ to indicate whether the ground field is real ($\beta=1$), complex ($\beta=2$), or quaternion ($\beta=4$); we denote the size of the matrices by $n$. The real symmetric/complex Hermitian/quaternion Hermitian matrices form a Euclidean space $\mathcal{E}$ of dimension
$$ d = n + \beta\binom{n}{2} = \frac{\beta}{2}\, n^2 \cdot \begin{cases} 1+\frac{1}{n} & \text{if } \beta=1 , \\ 1 & \text{if } \beta=2 , \\ 1-\frac{1}{2n} & \text{if } \beta=4 . \end{cases} $$
The Gaussian orthogonal/unitary/symplectic ensemble, which we denote by $\text{G}\beta\text{E}$, is the standard normal distribution on $\mathcal{E}$ (after choosing some orthonormal basis in $\mathcal{E}$, the distribution can be found by taking iid standard normal random variables for the resulting coordinates). This distribution on $\mathcal{E}$ induces a distribution on the Weyl chamber
$$ \{x\in\mathbb{R}^n\mid x_1\leq x_2\leq\dots\leq x_n \} $$
through the function that maps a matrix $A\in\mathcal{E}$ to its vector of ordered eigenvalues $\text{eig}(A)$. We slightly abuse notation and also denote this induced distribution on the Weyl chamber by $\text{G}\beta\text{E}$ (from the context it is clear whether we are talking about matrices or eigenvalues; we will mostly be talking about the eigenvalues). The Euclidean norm on $\mathcal{E}$ can be expressed through the eigenvalues; in fact, it is given by the Euclidean norm of the eigenvalues, $\|A\|_F=\|\text{eig}(A)\|=\|\text{eig}(A)\|_2$. (We use the notation $\|A\|_F$ to emphasize that we are not taking the usual operator norm but the Frobenius norm, which is the Euclidean norm in matrix space.)

We denote the cone of positive semidefinite matrices by $\mathcal{C}\subset\mathcal{E}$. The rank decomposition of $\mathcal{E}$ induces a rank decomposition of $\mathcal{C}$, which we denote by
$$ \mathcal{C} = \bigcup_{r=0}^n M_r . $$
While the rank function is important, we will focus in this note on the index function, which counts the number of positive entries of a vector,
$$ \text{ind}(x)=(\text{number of positive entries of }x) . $$
Instead of the rank we can also use the index function to describe the strata of $\mathcal{C}$,
$$ M_r = \{A\in \mathcal{C}\mid \text{ind}(\text{eig}(A))=r \} . $$

Curvature measures of symmetric cones {#curv_meas}

The intrinsic volumes of a convex cone, and also the curvature measures that we will analyze here, can be described through corresponding (bivariate) chi-bar-squared distributions. For this we denote the orthogonal projection onto the cone $\mathcal{C}$ by $\Pi_{\mathcal{C}}\colon\mathcal{E}\to \mathcal{C}$,
$$ \Pi_{\mathcal{C}}(A) = \text{argmin}\{ \|A-B\|_F \mid B\in \mathcal{C} \} . $$
It is easily verified that the eigenvalues of the projections onto the positive semidefinite cone $\mathcal{C}$ and onto its polar cone $\mathcal{C}^\circ=-\mathcal{C}$ are given by
$$ \text{eig}\big(\Pi_{\mathcal{C}}(A)\big) = \big(\text{eig}(A)\big)_+ ,\qquad \text{eig}\big(\Pi_{\mathcal{C}^\circ}(A)\big) = \big(\text{eig}(A)\big)_- , $$
where in $(x)_+=:x_+$ all negative entries of $x$ are replaced by zero, and similarly in $(x)_-=:x_-$ all positive entries of $x$ are replaced by zero. To simplify the notation we will now focus on the eigenvalues of the matrices.

The bivariate chi-bar-squared distribution of $\mathcal{C}$ is the distribution of the pair
$$ \big(\|x_+\|^2,\|x_-\|^2\big) ,\qquad x\sim \text{G}\beta\text{E} . $$
The connection to the intrinsic volumes $v_k:=v_k(\mathcal{C})$, $0\leq k\leq d$, is given by an alternative way to describe this distribution: let the discrete random variable $Z\in\{0,\ldots,d\}$ be described by the probabilities
$$ \text{Prob}(Z=k) = v_k ; $$
then $(X_+,X_-)$, defined by the conditional distributions
$$ X_+\mid Z\sim \chi_Z^2,\qquad X_-\mid Z\sim \chi_{d-Z}^2 ,\qquad X_+\mid Z,\; X_-\mid Z \text{ independent} , $$
has the same distribution as $\big(\|x_+\|^2,\|x_-\|^2\big)$. This property, known as the generalized Steiner formula, provides a characterization of the intrinsic volumes.

The curvature measures $\Phi_{kr} := \Phi_k(\mathcal{C},M_r)$ allow for a similar characterization if the $\text{G}\beta\text{E}$ is conditioned on the index. More precisely, if $\emptyset\neq R\subseteq\{0,\ldots,n\}$, then the pair
$$ \big(\|x_+\|^2,\|x_-\|^2\big) ,\qquad x\sim \big( \text{G}\beta\text{E}\mid\text{ind}(x)\in R \big) , $$
has the same distribution as $(X_+,X_-)$, hierarchically defined via
$$ X_+\mid Z\sim \chi_Z^2,\qquad X_-\mid Z\sim \chi_{d-Z}^2 ,\qquad X_+\mid Z,\; X_-\mid Z \text{ independent} , $$
where the latent variable $Z$ has the probabilities
$$ \text{Prob}(Z=k) = \frac{\sum_{r\in R} \Phi_{kr}}{\sum_{\ell=0}^d \sum_{r\in R} \Phi_{\ell r}} . $$
Again, this property, a consequence of a localized form of the generalized Steiner formula, characterizes the curvature measures $\Phi_{kr}$.

Note that for $R=\{0,\ldots,n\}$ we obtain the intrinsic volumes,
$$ \sum_{r=0}^n \Phi_{kr}=v_k . $$
On the other hand, summing over the "dimension" parameter $k$ yields the distribution of the index function,
$$ \sum_{k=0}^d \Phi_{kr} = \text{Prob}\{\text{ind}(x)=r\} ,\qquad x\sim \text{G}\beta\text{E} . $$
Yet another characterization of the curvature measures is obtained through the rank distribution of the solution of a (Gaussian) random semidefinite program; see the final section of this note for more details.
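
Before moving on, here is a minimal sketch (not part of the package) of the hierarchical description above: given a hypothetical weight vector Phi_R with Phi_R[1+k] $=\text{Prob}(Z=k)$ for $k=0,\ldots,d$, it draws from the corresponding bivariate chi-bar-squared distribution.

# sample N points from the bivariate chi-bar-squared distribution that is
# determined by a (hypothetical) weight vector Phi_R, Phi_R[1+k] = Prob(Z=k)
sample_bcbsq <- function(N, Phi_R) {
    d <- length(Phi_R) - 1
    Z <- sample(0:d, N, replace=TRUE, prob=Phi_R)
    tibble( Xp = rchisq(N, df=Z), Xm = rchisq(N, df=d-Z) )
}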

Pataki bounds {#pat_bnds}

Not all curvature measures $\Phi_{kr}$, $0\leq k\leq d$, $0\leq r\leq n$, are nonzero. In fact, the support of $\Phi_{kr}$ is accurately described by two inequalities known as Pataki's inequalities: the curvature measure $\Phi_{kr}$ is nonzero iff the indices $k,r$ satisfy the inequalities
$$ r + \beta \binom{r}{2} \leq k \quad \text{and}\quad k \leq r + \beta \binom{r}{2} + \beta r (n-r) = d - \Big( n-r + \beta \binom{n-r}{2} \Big) . $$
Concretely, we obtain in the three cases $\beta=1,2,4$,
\begin{align} \beta=1: & & \tfrac{r^2}{2}\big(1+\tfrac{1}{r}\big) & \leq k \leq d - \tfrac{(n-r)^2}{2} \big(1+\tfrac{1}{n-r}\big) , \\ \beta=2: & & r^2 & \leq k \leq d-(n-r)^2 , \\ \beta=4: & & 2r^2 \big(1-\tfrac{1}{2r}\big) & \leq k \leq d - 2(n-r)^2 \big(1-\tfrac{1}{2(n-r)}\big) . \end{align}
Rewriting these inequalities in terms of $\frac{k}{d}$ yields
$$ \frac{r + \beta \binom{r}{2}}{d} \leq \frac{k}{d} \leq 1 - \frac{n-r + \beta \binom{n-r}{2}}{d} , $$
or, equivalently,
$$ \begin{cases} \frac{1+\frac{1}{r}}{1+\frac{1}{n}}\cdot\big(\frac{r}{n}\big)^2 \;\leq\; \frac{k}{d} \;\leq\; 1 - \big(1-\frac{r}{n}\big)^2\cdot \frac{1+\frac{1}{n-r}}{1+\frac{1}{n}} & \text{if } \beta=1 , \\[0.7em] \big(\frac{r}{n}\big)^2 \;\leq\; \frac{k}{d} \;\leq\; 1-\big(1-\frac{r}{n}\big)^2 & \text{if } \beta=2 , \\[0.7em] \frac{1-\frac{1}{2r}}{1-\frac{1}{2n}}\cdot\big(\frac{r}{n}\big)^2 \;\leq\; \frac{k}{d} \;\leq\; 1 - \big(1-\frac{r}{n}\big)^2\cdot\frac{1-\frac{1}{2(n-r)}}{1-\frac{1}{2n}} & \text{if } \beta=4 . \end{cases} $$
Asymptotically, we obtain for $n\to\infty$ and $r_n,k_n$ such that $\frac{r_n}{n}\to\rho\in[0,1]$ and $\frac{k_n}{d}\to\kappa\in[0,1]$ the inequalities
$$ \rho^2 \leq \kappa\leq 1-(1-\rho)^2 ,\qquad \text{for } \beta=1,2,4 . $$
We illustrate these inequalities for $n=3,6,10$ and $\beta=1,2,4$; the asymptotic bounds $(\frac{r}{n})^2 \leq \frac{k}{d} \leq 1-(1-\frac{r}{n})^2$ are indicated by the dashed curves:

plotsPB <- list()
i <- 0
for (beta in c(1,2,4)) {
    for (n in c(3,6,10)) {
        i <- i+1
        pat <- pat_bnd(beta, n)
        d <- pat$d
        A <- matrix(0.2, d+1, n+1)
        for (r in 0:n) A[1+pat$k_low(r):pat$k_upp(r),1+r] <- 1
        plotsPB[[i]] <- plot_Phi(A)
    }
}
multiplot(plotlist = plotsPB, cols = 3)

The goal of this study is to analyze the distribution of the curvature measures within this Pataki range.

It is known that the index concentrates sharply around $n/2$, and it is also known that the intrinsic volumes concentrate sharply around $d/2$. So the unnormalized curvature measures concentrate in both parameters, $r$ and $k$, around $n/2$ and $d/2$, respectively. We will thus focus on the conditional distributions
\begin{align} \Phi^{\text{ind}}_r & := \frac{(\Phi_{0r},\ldots,\Phi_{dr})}{\sum_{\ell=0}^d \Phi_{\ell r}} = \frac{(\Phi_{0r},\ldots,\Phi_{dr})}{\text{Prob}\{\text{ind}(x)=r\}} , \\ \Phi^{\text{dim}}_k & := \frac{(\Phi_{k0},\ldots,\Phi_{kn})}{\sum_{s=0}^n \Phi_{ks}} = \frac{(\Phi_{k0},\ldots,\Phi_{kn})}{v_k} , \end{align}
which we call the "index normalized" and the "dimension normalized" distributions. The dimension normalized distributions are particularly important for the application in semidefinite programming, as explained in the final section of this note.
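
In code, both normalizations are simple rescalings of a matrix of unnormalized curvature measures; a minimal sketch, assuming a hypothetical $(d+1)\times(n+1)$ matrix Phi with entries Phi[1+k,1+r] $=\Phi_{kr}$:

# index normalized: columns sum to one; dimension normalized: rows sum to one
Phi_ind <- sweep(Phi, 2, colSums(Phi), "/")
Phi_dim <- sweep(Phi, 1, rowSums(Phi), "/")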

Curvature measures normalized through index constraints {#ind_constr_cm}

As explained above, by restricting the index to be equal to $r$, that is, $R=\{r\}$, we obtain the connection between the conditioned eigenvalue distribution $\text{G}\beta\text{E}\mid \text{ind}=r$ and the index normalized curvature measures $\Phi^{\text{ind}}_{kr}$ via
$$ \big(\|x_+\|^2,\|x_-\|^2\big) \stackrel{\text{dist}}{=} \sum_{k=0}^d 1_{Z=k}\big(X_k,Y_{d-k}\big) , $$
where $x\sim \text{G}\beta\text{E}\mid \text{ind}=r$, $\text{Prob}(Z=k)=\Phi^{\text{ind}}_{kr}$, and $X_k,Y_k\sim\chi_k^2$, with $Z,X_k,Y_k$ independent.

We can reconstruct these numbers as weights from a bivariate chi-bar-squared distribution, as described in the vignette Estimating conic intrinsic volumes from bivariate chi-bar-squared data from the conivol package, assuming that we have samples from the conditioned eigenvalue distribution $\text{G}\beta\text{E}\mid \text{ind}=r$. Below we will discuss the details for these sampling and reconstruction procedures.

What remains to be addressed is the question of how to obtain the other normalization, the dimension normalized curvature measures. This question is subtle, and if one tries to do this step in a "brute force" way, for example by reconstructing the unnormalized curvature measures first and then renormalizing these, then one will inevitably fail due to the double concentration behavior of the curvature measures mentioned above.

Note that all we actually need is the ratio of neighboring curvature measures, $\Phi^{\text{dim}}_{k,r+1}/\Phi^{\text{dim}}_{kr}$, because of the additional normalization $\sum_r \Phi^{\text{dim}}_{kr}=1$. Now, we can write this in terms of the corresponding ratios of the index normalized curvature measures as follows:
$$ \frac{\Phi^{\text{dim}}_{k,r+1}}{\Phi^{\text{dim}}_{kr}} = \frac{\Phi_{k,r+1}}{\Phi_{kr}} = \frac{\Phi^{\text{ind}}_{k,r+1}}{\Phi^{\text{ind}}_{kr}}\, \frac{\text{Prob}\{\text{ind}(x)=r+1\}}{\text{Prob}\{\text{ind}(x)=r\}} . $$
All that we need to know is thus the ratio of the probabilities $\text{Prob}\{\text{ind}(x)=r+1\}$ and $\text{Prob}\{\text{ind}(x)=r\}$. We can find this ratio by sampling from the conditioned eigenvalue distribution $\text{G}\beta\text{E}\mid \text{ind}\in\{r,r+1\}$. More precisely, if we have $N^{(r)}$ samples from this distribution, of which $N^{(r)}_0$ have index $r$ and $N^{(r)}_1$ have index $r+1$, then the ratio $N^{(r)}_1/N^{(r)}_0$ is a consistent estimator for the above probability ratio.

This gives us the following procedure for the renormalization step (we work with the logarithm to avoid lengthy products): for fixed $k$,

  1. choose $r_0$ such that $(k,r_0)$ is in the Pataki range, and set $\phi_{r_0}=0$,
  2. define $\phi_{r_0\pm s}$ ($s>0$ such that $(k,r_0\pm s)$ is in the Pataki range) by \begin{align} \phi_{r_0+s} & = \log\Phi^{\text{ind}}_{k,r_0+s} - \log \Phi^{\text{ind}}_{kr_0} + \sum_{i=0}^{s-1} \big( \log N^{(r_0+i)}_1 - \log N^{(r_0+i)}_0 \big) , \\ \phi_{r_0-s} & = \log\Phi^{\text{ind}}_{k,r_0-s} - \log \Phi^{\text{ind}}_{kr_0} + \sum_{i=1}^s \big( \log N^{(r_0-i)}_0 - \log N^{(r_0-i)}_1 \big) , \end{align}
  3. obtain $\Phi^{\text{dim}}_k \approx \frac{\exp(\phi)}{\sum_r \exp(\phi_r)}$.
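
A minimal sketch of this renormalization for one fixed $k$ (hypothetical inputs: log_phi_ind, the values $\log\Phi^{\text{ind}}_{kr}$ for the admissible $r$, and log_ratio, the estimates $\log N^{(r)}_1 - \log N^{(r)}_0$ for consecutive admissible indices):

# renormalize the index normalized curvature measures (for one fixed k) into
# the dimension normalized ones; log_ratio has one entry less than log_phi_ind
dim_normalize <- function(log_phi_ind, log_ratio) {
    m   <- length(log_phi_ind)
    r0  <- ceiling(m/2)                       # anchor position, phi[r0] = 0
    # accumulated log-ratios of the index probabilities relative to the anchor
    cs  <- c(0, cumsum(log_ratio))
    phi <- log_phi_ind - log_phi_ind[r0] + cs - cs[r0]
    exp(phi) / sum(exp(phi))
}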

For higher dimensions we can approximate the renormalization step in the following way: in [@MNSV11] it was shown that
$$ \text{Prob}\{\text{ind}(x)=r\} \approx \exp\big(-\beta n^2 f(\tfrac{r}{n})\big) , $$
for some rate function $f$. Using this, we can approximate the probability ratio as follows:
\begin{align} \log\Big(\frac{\text{Prob}\{\text{ind}(x)=r+1\}}{\text{Prob}\{\text{ind}(x)=r\}}\Big) & \approx -\beta n^2 \big( f(\tfrac{r+1}{n})-f(\tfrac{r}{n})\big) \\ & \approx -\beta n\, f'(\tfrac{r}{n}) . \end{align}
Moreover, we can use this to approximate the values of $\phi_r$,
$$ \phi_r + \log\Phi^{\text{ind}}_{kr_0} - \beta n^2 f(\tfrac{r_0}{n}) \approx \log\Phi^{\text{ind}}_{kr} - \beta n^2 f(\tfrac{r}{n}) . $$
We will use this approach below when looking ahead with artificial data.

Implementations {#implem}

Having explained the general approach, it remains to implement the computations, which consist of three parts:

  1. sampling from the index constrained eigenvalue distribution,
  2. reconstructing the index normalized curvature measures and the ratio of probabilities for consecutive indices from sampling data,
  3. calculating the dimension normalized curvature measures.

The difficulty of the first step lies again in the concentration effects. A random vector from the $\text{G}\beta\text{E}$ will have its index close to $n/2$ with high probability, making a simple rejection sampler (sample from $\text{G}\beta\text{E}$, then reject if the index is not in the required range) a practically infeasible approach. Instead we will use the Hamiltonian Monte Carlo sampler Stan for this task.
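
To see this concentration, here is a small sketch (a quick simulation, not part of the package) estimating the index distribution for $\beta=1$ and a moderate $n$; the overall scaling of the matrix differs from the normalization used in this note, but the index is scale invariant:

# index distribution of GOE-type matrices: the mass concentrates around n/2
set.seed(1234)
n <- 20
idx <- replicate(2000, {
    G <- matrix(rnorm(n^2), n, n)
    A <- (G + t(G)) / sqrt(2)
    sum(eigen(A, symmetric=TRUE, only.values=TRUE)$values > 0)
})
table(idx) / length(idx)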

The second step is solved by adapting the expectation maximization (EM) approach for reconstructing the intrinsic volumes to the situation at hand. The required changes are minimal and are described below.

The third step is just an implementation of the computation as explained above.

Sampling index constrained eigenvalues {#sampl_constr_eigval}

The sampler Stan works solely through the log-likelihood function, which only has to be given up to an additive constant (so the normalizing constant of the density does not have to be specified). Concretely, the density of $x\sim\text{G}\beta\text{E}$ is given by
$$ p(x) \propto e^{-\|x\|^2/2} \prod_{i<j} \big|x_i-x_j\big|^\beta . $$
The corresponding log-likelihood is easily computed in Stan; the restrictions on the index are realized by grouping the ordered eigenvalues into "positive", "free", and "negative" eigenvalues, by requiring that the positive and negative ones do not change sign, and by requiring that the free eigenvalues lie between these two groups.
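
For reference, a minimal R sketch of this log-density, up to the additive constant (this is only an illustration of the formula, not the Stan code generated by the package):

# log-density of the eigenvalue distribution of the G beta E (up to a constant)
log_dens_GbE <- function(x, beta) {
    -sum(x^2)/2 + beta * sum(log(abs(dist(x))))   # dist(x) gives all |x_i - x_j|, i < j
}
log_dens_GbE(rnorm(5), beta=1)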

Concretely, if $n=40$ and the index shall lie between $25$ and $35$, then we assume that the eigenvalues $x_1\leq x_2 \leq \dots \leq x_{40}$ are grouped into $5$ eigenvalues with negative sign, $x_1,\ldots,x_5$, $10$ eigenvalues with no prescribed sign, $x_6,\ldots,x_{15}$, and the remaining $25$ eigenvalues with positive sign, $x_{16},\ldots,x_{40}$.
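
The group sizes follow directly from the index bounds; a one-line sketch of this bookkeeping for the example above:

# group sizes for n eigenvalues with index constrained to [ind_low, ind_upp]
n <- 40; ind_low <- 25; ind_upp <- 35
c( n_neg = n - ind_upp, n_free = ind_upp - ind_low, n_pos = ind_low )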

The function constr_eigval from the symconivol package generates the Stan model for the index constrained eigenvalue distribution. The model differs slightly depending on whether "positive", "free", or "negative" eigenvalues are allowed. Passing this model to the Stan sampler and running it with the data (the Dyson index $\beta$ and the numbers of positive/free/negative eigenvalues) yields samples from the resulting index constrained eigenvalue distribution.

Example computations:

The following lines will construct a model for negative, free, and positive eigenvalues, then run it for $5$ negative, $10$ free, and $25$ positive eigenvalues, and then extract the sampled eigenvalues:

filename <- "tmp.stan"
M <- constr_eigval( beta=1, n=40, ind_low=25, ind_upp=35, filename=filename )
stan_samp <- stan( file = filename, data = M$data, chains = 1, warmup = 1e3,
                   iter = 1e5, cores = 2, refresh = 1e4 )
file.remove(filename)
samp <- list( ep = rstan::extract(stan_samp)$ep,
              ef = rstan::extract(stan_samp)$ef,
              en = rstan::extract(stan_samp)$en )

The resulting empirical eigenvalue distribution(s) look as follows:

img_dpi <- 300
include_graphics("nn_nf_np=5_10_25.png", dpi=img_dpi)

To illustrate the different situations, we show these empirical eigenvalue distributions for some further parameter values:

include_graphics("nn_nf_np=0_15_25.png", dpi=img_dpi)
include_graphics("nn_nf_np=16_0_24.png", dpi=img_dpi)
include_graphics("nn_nf_np=35_5_0.png", dpi=img_dpi)
include_graphics("nn_nf_np=20_0_0.png", dpi=img_dpi)
include_graphics("nn_nf_np=0_40_0.png", dpi=img_dpi)
include_graphics("nn_nf_np=0_0_40.png", dpi=img_dpi)

Reconstructing index normalized curvature measures {#reconstr_ind}

The weights of a bivariate chi-bar-squared distribution can be reconstructed as described in the vignette Estimating conic intrinsic volumes from bivariate chi-bar-squared data from the conivol package. We explain again the main idea behind this algorithm in the specific context of curvature measures.

In the following we will assume that the index constraints are of the form $r\leq \text{ind}(x)\leq r+s$, that is, $R=\{r,\ldots,r+s\}$, and we assume $0< r< n$ and $0\leq s< n-r$. The strict inequalities ensure that the positive and negative components of the eigenvalue vector are always nonzero (the cases of all positive or all negative eigenvalues are not interesting for the index/dimension normalized curvature measures). As explained above, we can turn a sample from the index constrained Gaussian orthogonal/unitary/symplectic ensemble into a sample from the bivariate chi-bar-squared distribution by taking the squared norms of the positive and negative components,
$$ (X,Y) = \big( \|x_+\|^2, \|x_-\|^2 \big) . $$
The distribution of $(X,Y)$ can then be described in terms of the latent variable $Z$,
$$ \text{Prob}(Z=k) = \frac{\sum_{j=0}^s \Phi_{k,r+j}}{\sum_{\ell=1}^{d-1} \sum_{j=0}^s \Phi_{\ell,r+j}} , $$
through the conditional distributions
$$ X\mid Z\sim \chi_Z^2,\qquad Y\mid Z\sim \chi_{d-Z}^2 ,\qquad X\mid Z,\; Y\mid Z \text{ independent} . $$

The Pataki bounds show that $\sum_{j=0}^s \Phi_{k,r+j}$ is nonzero iff
$$ r+\beta\binom{r}{2} \leq k \leq d - \Big( n-r-s + \beta \binom{n-r-s}{2} \Big) . $$
In particular, since we assume that $0<r<n$ and $s<n-r$, the latent variable $Z$ will not take the values $0$ or $d$ with positive probability. So the latent variable is indeed entirely hidden, which is different from the intrinsic volumes case.

Assuming that we have $N$ samples $(\mathbf{X},\mathbf{Y})=\big((X_1,Y_1),\ldots,(X_N,Y_N)\big)$, the likelihood function, up to a normalizing constant, is given by
$$ L(\Phi\mid \mathbf{X},\mathbf{Y}) \propto \prod_{i=1}^N \sum_{k=1}^{d-1} f_{ik}\sum_{j=0}^s\Phi_{k,r+j} ,\qquad f_{ik} = f_k(X_i)\, f_{d-k}(Y_i) , $$
where $f_k(x)$ denotes the density of the chi-squared distribution, $f_k(x)\propto x^{k/2-1}e^{-x/2}$. Taking the latent variable into account, we obtain
\begin{align} L(\Phi\mid \mathbf{X},\mathbf{Y},\mathbf{Z}) & \propto \prod_{i=1}^N \Big( f_{iZ_i}\sum_{j=0}^s\Phi_{Z_i,r+j} \Big) , \\ \log L(\Phi\mid \mathbf{X},\mathbf{Y},\mathbf{Z}) & = \underbrace{\sum_{i=1}^N \log f_{iZ_i}}_{\text{(indep. of $\Phi$)}} + \sum_{i=1}^N \log\Big(\sum_{j=0}^s\Phi_{Z_i,r+j}\Big) + \text{const} . \end{align}

The EM algorithm approximates the maximum likelihood estimate of the parameters of the latent variable by iteratively maximizing the expected complete-data log-likelihood, with the expectation taken over the latent variables conditioned on the data and on the current iterate. Concretely, we express the posterior density of the $i$th latent variable in the form
\begin{align} p\big(Z_i=k\mid X_i=x_i,Y_i=y_i\big) & = \frac{p\big(X_i=x_i,Y_i=y_i\mid Z_i=k\big)\,p\big(Z_i=k\big)}{p(X_i=x_i,Y_i=y_i)} \\ & = \frac{f_{ik} \sum_{j=0}^s\Phi_{k,r+j}\big/\sum_{\ell=1}^{d-1}\sum_{j=0}^s\Phi_{\ell,r+j}} {\sum_{\ell=1}^{d-1}f_{i\ell}\sum_{j=0}^s\Phi_{\ell,r+j} \big/\sum_{\ell=1}^{d-1}\sum_{j=0}^s\Phi_{\ell,r+j}} \\ & = \frac{f_{ik} \sum_{j=0}^s\Phi_{k,r+j}} {\sum_{\ell=1}^{d-1}f_{i\ell}\sum_{j=0}^s\Phi_{\ell,r+j}} . \end{align}
The EM algorithm then finds the $(t+1)$th iterate $\Phi^{(t+1)}$ from the $t$th iterate $\Phi^{(t)}$ by maximizing the following function in $\Phi$:
\begin{align} \underset{\mathbf{Z}\mid \mathbf{X},\mathbf{Y},\Phi^{(t)}}{\text{E}} \big[\log L\big(\Phi \,\big|\, \mathbf{X},\mathbf{Y}, \mathbf{Z} \big)\big] & = \sum_{i=1}^N \sum_{k=1}^{d-1} \frac{f_{ik} \sum_{j=0}^s\Phi_{k,r+j}^{(t)}} {\sum_{\ell=1}^{d-1}f_{i\ell}\sum_{j=0}^s\Phi_{\ell,r+j}^{(t)}} \log\Big(\sum_{j=0}^s\Phi_{k,r+j}\Big) + \text{const} . \end{align}
This function, which we divide by the number of samples to keep the constants bounded,
$$ F(a_1,\ldots,a_{d-1}) = \sum_{k=1}^{d-1} \bigg( \frac{1}{N} \sum_{i=1}^N \frac{f_{ik} \sum_{j=0}^s\Phi_{k,r+j}^{(t)}} {\sum_{\ell=1}^{d-1}f_{i\ell}\sum_{j=0}^s\Phi_{\ell,r+j}^{(t)}}\bigg) \log a_k , $$
can be maximized over the probability simplex as a separable convex program. In the symconivol package this is achieved via MOSEK.

This constitutes one step of the EM algorithm.
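
To make the structure of one step concrete, here is a minimal sketch for the special case $s=0$ (a single index value), where the M-step has a closed form: the maximizer of $\sum_k w_k\log a_k$ over the probability simplex is $a_k=w_k/\sum_\ell w_\ell$. This only illustrates the idea; the package implementation handles general $s$, enforces log-concavity, and uses MOSEK for the M-step.

# one EM step for s=0; hypothetical inputs: sample vectors X, Y, the current
# iterate phi over the admissible range k = k_low,...,k_upp (summing to one)
em_step_s0 <- function(X, Y, phi, d, k_low, k_upp) {
    ks <- k_low:k_upp
    # E-step: posterior weights of the latent variable for each sample
    logf <- sapply(ks, function(k) dchisq(X, df=k, log=TRUE) +
                                   dchisq(Y, df=d-k, log=TRUE))
    logw <- sweep(logf, 2, log(phi), "+")
    W <- exp(logw - apply(logw, 1, max))
    W <- W / rowSums(W)
    # M-step: closed-form maximizer over the probability simplex
    colMeans(W)
}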

Log-concavity: In the case of intrinsic volumes it has turned out that assuming log-concavity is essential for obtaining a stable convergence of the EM algorithm. Recall that a sequence $a_0,a_1,\ldots,a_d$ of positive numbers is log-concave if
$$ \log a_k \geq \frac{\log a_{k-1} + \log a_{k+1}}{2} . $$
These inequalities can easily be enforced in the EM algorithm by projecting the vector of logarithms onto the convex cone formed by such sequences.
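
A minimal sketch of such a projection (onto the cone of sequences with nonpositive second differences), here via the quadprog package rather than the routine used internally by symconivol:

# project a vector z (e.g. a vector of log-weights) onto the cone of concave
# sequences, i.e. 2*y[k] - y[k-1] - y[k+1] >= 0 for all interior k
library(quadprog)
project_logconcave <- function(z) {
    m <- length(z)
    # one constraint per interior index; the columns of Amat are the constraints
    Amat <- sapply(2:(m-1), function(k) {
        a <- rep(0, m); a[k-1] <- -1; a[k] <- 2; a[k+1] <- -1; a
    })
    solve.QP(Dmat=diag(m), dvec=z, Amat=Amat, bvec=rep(0, m-2))$solution
}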

In the case of intrinsic volumes the validity of these inequalities is a well-established conjecture. In the case of the curvature measures this is less well tested. Nevertheless, we will enforce log-concavity when reconstructing the curvature measures, as this stabilizes the EM algorithm and because it seems to be a reasonable assumption that improves the whole procedure.

Example computations:

We evaluate the index constrained eigenvalue data from above with
$$ \beta=1 ,\quad n=40 ,\quad r=25 ,\quad s=10 . $$
Let's have a look at the Pataki bounds for these values:

beta <- 1
n <- 40
r <- 25
s <- 10
pat <- pat_bnd(beta,n)
print(str_c("dimension: d=", pat$d))
print(str_c("Pataki bounds: ",pat$k_low(r),", ",pat$k_upp(r+s)))

The following lines transform the eigenvalue data first into bivariate chi-bar-squared data, then evaluate the densities for the EM algorithm, and then run the EM algorithm for $100$ iterations. The list samp was obtained in the previous section.

# convert sample data to bivariate chi-bar-squared data:
m_samp <- constr_eigval_to_bcbsq(samp)

# prepare data for EM algorithm:
data <- prepare_em_cm(d=pat$d, low=pat$k_low(r), upp=pat$k_upp(r+s), m_samp=m_samp)

# run EM algorithm for 100 steps:
em <- estim_em_cm(d=pat$d, low=pat$k_low(r), upp=pat$k_upp(r+s), m_samp=m_samp, N=100,
                  data=data, no_of_lcc_projections = 1)

# display some iterates of the EM algorithm
tib_plot <- as_tibble( t(em[1+10*(0:10), ]) ) %>%
    add_column(k=pat$k_low(r):pat$k_upp(r+s),.before=1) %>% gather(step,value,2:12)
G <- ggplot(tib_plot,aes(x=k,y=value,color=step)) +
    geom_line() + theme_bw() +
    theme(legend.position="none", axis.title.x=element_blank(), axis.title.y=element_blank())

The resulting plot looks as follows:

# # image obtained through the following command:
# ggsave("ind_norm_from_constr_eigval.png", plot=G, device="png",
#        width=178, height=80, units="mm", dpi=300)
include_graphics("ind_norm_from_constr_eigval.png", dpi=img_dpi)

We can use this method to shed some light on the Pataki range, as displayed above for $n=3,6,10$. For the index normalized curvature measures we obtain the following picture:

plotsPB <- list()
plotPhiInd <- list()
j <- 0
for (beta in c(1,2,4)) {
    for (n in c(3,6,10)) {
        j <- j+1
        pat <- pat_bnd(beta,n)
        d <- pat$d
        A <- matrix(0,d+1,n+1)
        PhiInd <- symconivol::phi_ind[[str_c("beta=",beta,",n=",n)]]
        for (r in 0:n) {
            for (k in 0:d) {
                if (r<ceiling(n/2)) {
                    r_lkup <- n-r
                    k_lkup <- d-k
                } else {
                    r_lkup <- r
                    k_lkup <- k
                }
                if (r_lkup==n & k_lkup==d)
                    A[1+k,1+r] <- 1
                else if (k_lkup >= pat$k_low(r_lkup) & k_lkup <= pat$k_upp(r_lkup)) {
                    colPhi <- str_c("r=",r_lkup,",s=0")
                    A[1+k,1+r] <- PhiInd[[colPhi]][k_lkup-pat$k_low(r_lkup)+1]/max(PhiInd[[colPhi]])
                }
            }
        }
        plotPhiInd[[j]] <- A
        plotsPB[[j]] <- plot_Phi(A, TRUE, TRUE, FALSE)
    }
}
multiplot(plotlist = plotsPB, cols = 3)

The dashed curves indicate the asymptotic Pataki range and Leigh's curve, which we describe below in more detail. Notice the apparent convergence of the index constrained curvature measures to this curve.

Reconstructing dimension normalized curvature measures {#reconstr_dim}

For the reconstruction of the dimension normalized curvature measures we proceed as described above.

Example computations:

We reconstruct the same range again as above, $n=3,6,10$:

plotsPB <- list()
plotPhiDim <- list()
j <- 0
for (beta in c(1,2,4)) {
    for (n in c(3,6,10)) {
        j <- j+1
        pat <- pat_bnd(beta,n)
        d <- pat$d
        IndP <- matrix(0,2,n-2)
        for (r in 1:(n-2)) {
            if (r==(n-1)/2) {
                IndP[,r] <- c(1,1)
            } else if (r<(n-1)/2) {
                index <- str_c("beta=",beta,",n=",n,",r=",n-r-1)
                IndP[,r] <- symconivol::ind_prob[[index]][c(2,1)]
            } else {
                index <- str_c("beta=",beta,",n=",n,",r=",r)
                IndP[,r] <- symconivol::ind_prob[[index]]
            }
        }
        B <- plotPhiInd[[j]]
        B <- t(t(B)/colSums(B))
        A <- matrix(0,d+1,n+1)
        A[1,1] <- 1
        A[d+1,n+1] <- 1
        for (k in 1:(d-1)) {
            rmin <- pat$r_low(k)
            rmax <- pat$r_upp(k)
            logR <- rep(0,rmax-rmin+1)
            r0 <- floor((rmin+rmax)/2)
            if (r0<rmax) {
                for (r in (r0+1):rmax) {
                    logR[1+r-rmin] <- log(B[1+k,1+r]) - log(B[1+k,1+r0]) +
                        sum( log(IndP[2,r0:(r-1)]) - log(IndP[1,r0:(r-1)]) )
                }
            }
            if (r0>rmin) {
                for (r in (r0-1):rmin) {
                    logR[1+r-rmin] <- log(B[1+k,1+r]) - log(B[1+k,1+r0]) +
                        sum( log(IndP[1,r:(r0-1)]) - log(IndP[2,r:(r0-1)]) )
                }
            }
            A[1+k,1+rmin:rmax] <- exp(logR)/max(exp(logR))
        }
        plotPhiDim[[j]] <- A
        plotsPB[[j]] <- plot_Phi(A, TRUE, FALSE, TRUE)
    }
}
multiplot(plotlist = plotsPB, cols = 3)

We observe again a concentration behavior, but with a limit curve which is apparently different from the limit curve of the index constrained curvature measures. The indicated curve is a guess for this limit curve, based on Leigh's curve and the rate function of the index probabilities. See below for details on how we derived this curve (it should be noted that this curve is based on a rather broad assumption, and an "eye-balled" value for an essential parameter defining the curve). This curve is interesting because it can be used to predict the rank of the solution of semidefinite programs (see below).

Looking ahead {#look_ahead}

In this section we try to get a better insight into the higher dimensional behavior of the dimension normalized curvature measures. We will do this by using known and conjectured assumptions about the general behavior of the curvature measures. More precisely, while it is known (see [@GNP17]) that the intrinsic volumes of the semidefinite cones follow a central limit theorem, that is, they approximate the normal distribution in high dimensions, this is so far not known to hold for the index normalized curvature measures. It is nevertheless a reasonable assumption that they, too, obey a central limit law. We will make this assumption in this section to get an idea of what happens in higher dimensions for the dimension normalized curvature measures.

The rate function of the index probabilities {#rate_fct}

In [@MNSV11] it was shown that
$$ \log\text{Prob}\{\text{ind}(x)=r\} \approx -\beta n^2 R(\tfrac{r}{n}) + \text{const}(d) , $$
for some rate function $R$, for which they give an explicit formula (see below for details on the calculation of $R$). The graphs of the rate function $R$ and its derivative look as follows:

R <- rate()
pL <- list(
    ggplot(R$table,  aes(x=rho,y=R))  + geom_line() + theme_bw() +
        theme(panel.border = element_blank(), panel.grid.major = element_blank(),
              axis.line = element_line(colour = "black")) +
        xlab(TeX(sprintf("$\\rho$"))) + ylab(TeX(sprintf("$R(\\rho)$"))) ,
    ggplot(R$dtable, aes(x=rho,y=dR)) + geom_line() + theme_bw() +
        theme(panel.border = element_blank(), panel.grid.major = element_blank(),
              axis.line = element_line(colour = "black")) +
        xlab(TeX(sprintf("$\\rho$"))) + ylab(TeX(sprintf("$R'(\\rho)$")))
)
multiplot(plotlist=pL, cols=2)

Using this, we can approximate the probability ratio as follows:
\begin{align} \log\Big(\frac{\text{Prob}\{\text{ind}(x)=r+1\}}{\text{Prob}\{\text{ind}(x)=r\}}\Big) & \approx -2d \big( R(\tfrac{r+1}{n})-R(\tfrac{r}{n})\big) \approx \frac{-2d}{n}\, R'(\tfrac{r}{n}) . \end{align}
(We use $2d$ instead of $\beta n^2$, as this visibly improves the approximations below.) The following plots compare the empirically estimated log of the ratio of probabilities for neighboring indices with the approximate asymptotic expressions (the estimate through the derivative of the rate function is depicted as the red dashed curve):

plotIndP <- list()
IndP <- list()
j <- 0
R <- rate()
for (beta in c(1,2,4)) {
    for (n in c(6,10)) {
        j <- j+1
        pat <- pat_bnd(beta,n)
        d <- pat$d
        P <- matrix(0,2,n-2)
        for (r in 1:(n-2)) {
            if (r==(n-1)/2) {
                P[,r] <- c(1,1)
            } else if (r<(n-1)/2) {
                index <- str_c("beta=",beta,",n=",n,",r=",n-r-1)
                P[,r] <- symconivol::ind_prob[[index]][c(2,1)]
            } else {
                index <- str_c("beta=",beta,",n=",n,",r=",r)
                P[,r] <- symconivol::ind_prob[[index]]
            }
        }
        IndP[[j]] <- P
        tib <- tibble(r=0.5+1:(n-2))
        tib$logR_est <- log(P[2,])-log(P[1,])
        tib$logR_app <- -2*d*(R$lkup_R((2:(n-1))/n)-R$lkup_R((1:(n-2))/n))
        tib$logR_app2 <- -2*d/n*R$lkup_dR(tib$r/n)
        # tib$logR_app <- -beta*n^2*(R$lkup_R((2:(n-1))/n)-R$lkup_R((1:(n-2))/n))
        # tib$logR_app2 <- -beta*n*R$lkup_dR(tib$r/n)
        plotIndP[[j]] <- ggplot(tib) +
            geom_point(aes(x=r,y=logR_est)) +
            geom_line(aes(x=r,y=logR_app)) +
            geom_line(aes(x=r,y=logR_app2), color="red", linetype=2) +
            theme_bw() +
            xlab(TeX(sprintf("$n=%d,\\; \\beta = %d$", n, beta))) +
            theme(legend.position="none",
                  axis.title.y=element_blank(),
                  panel.border = element_blank(), panel.grid.major = element_blank(),
                  panel.grid.minor = element_blank(),
                  axis.line = element_line(colour = "black")
            )
    }
}
multiplot(plotlist=plotIndP, cols=3)

The variance of the intrinsic volumes {#var_ivol}

Before considering the curvature measures, we mention that the distribution defined by the intrinsic volumes $v_k=\sum_r \Phi_{kr}$ has a variance of approximately $d\big(\frac{8}{\pi^2}-\frac{1}{2}\big)$. More precisely, this was shown in [@McCT14] for $\beta=1$, but as the following estimates using the bivariate chi-bar-squared distribution suggest, the formula should also hold for $\beta=2,4$.

include_graphics("var_ivol.png", dpi=img_dpi)

The variance of index normalized curvature measures

So far, no asymptotic estimates for the variance of the index normalized curvature measures are known. We obtain the following plots for the estimated values:

nmin <- 3
nmax <- 10
plotsPB <- list()
j <- 0
C <- 0.2
for (beta in c(1,2,4)) {
    j <- j+1
    var_plot <- list()
    for (n in nmin:nmax) {
        pat <- pat_bnd(beta,n)
        ind1 <- str_c("beta=",beta,",n=",n)
        i <- n-nmin+1
        var_plot[[i]] <- rep(0,n+1)
        for (r in ceiling(n/2):(n-1)) {
            k_low <- pat$k_low(r)
            k_upp <- pat$k_upp(r)
            ind2 <- str_c("r=",r,",s=0")
            Phi <- symconivol::phi_ind[[ind1]][[ind2]]
            var_new <- sum((k_low:k_upp)^2*Phi)-sum((k_low:k_upp)*Phi)^2
            var_plot[[i]][1+r] <- var_new
            var_plot[[i]][1+n-r] <- var_new
        }
        names(var_plot)[i] <- str_c("n=",n)
    }

    # tidying data
    tib_plot <- tibble()
    for (n in nmin:nmax) {
        tib_new <- tibble( r=0:n, var=var_plot[[str_c("n=",n)]], n=n, beta=beta, d=pat_bnd(beta,n)$d )
        tib_plot <- bind_rows(tib_plot,tib_new)
    }
    x <- seq(0,1,length.out=1e3)

    plotsPB[[j]] <- ggplot(tib_plot, aes(x=r/n, y=var/d, group=n, color=as.factor(n) )) +
        geom_line() + theme_bw() + scale_color_discrete(name="n") +
        theme(legend.position="bottom", axis.title.y=element_blank(),
              panel.border = element_blank(), panel.grid.major = element_blank(),
              axis.line = element_line(colour = "black")) +
        xlab(TeX(sprintf("$\\beta = %d$", beta))) +
        geom_line(data=tibble(x=x,y=(1-2*abs(x-0.5))*C),aes(x=x,y=y),
                  color="black", linetype=1, size=1.5, alpha=0.2)
}
multiplot(plotlist = plotsPB, cols = 3)

The above plots show the normalized empirical variance of $\Phi^{\text{ind}}_{kr}$, that is, the curves are given by
$$ \bigg( \frac{r}{n} ,\, \frac{\sum_k k^2 \Phi^{\text{ind}}_{kr} - \big(\sum_k k \Phi^{\text{ind}}_{kr}\big)^2}{d} \bigg) \qquad \Big[ \text{in gray:}\quad \big(\rho,\,0.2\big(1-\big|2\rho-1\big|\big)\big) \Big] . $$
The constant $0.2$ is eye-balled from the plots, and probably incorrect. But it seems justified to assume that the variance of the index constrained curvature measures can be approximated in the form
$$ C\,d\,\big(1-\big|\tfrac{2r}{n}-1\big|\big) , $$
for some constant $C$ ($\approx 0.2$).

Reconstructing dimension normalized curvature measures from artificial data {#rec_art}

Combining the assumption of a central limit law for the index normalized curvature measures with the above guess for their variance, we obtain a guesstimate of
$$ \log \Phi^{\text{ind}}_{kr} \approx - d\, \frac{\big(\frac{k}{d}-\ell(\frac{r}{n})\big)^2}{2C\big(1-\big|\tfrac{2r}{n}-1\big|\big)} , $$
where $\ell(\rho)$ denotes Leigh's curve (see below for details), and where we use the estimate $C=0.2$. We can renormalize these using the rate function of the index probabilities,
$$ \log\text{Prob}\{\text{ind}(x)=r\} \approx -2d\, R(\tfrac{r}{n}) + \text{const}(d) , $$
which results in the following pictures for the estimated dimension normalized curvature measures:

plotsPB <- list()
plotPhiDim <- list()
j <- 0
L <- leigh()
R <- rate()
C <- 0.2
for (beta in c(1,2,4)) {
    for (n in c(6,10,20)) {
        j <- j+1
        pat <- pat_bnd(beta,n)
        d <- pat$d
        A <- matrix(0,d+1,n+1)
        A[1,1] <- 1
        A[d+1,n+1] <- 1
        for (k in 1:(d-1)) {
            rmin <- pat$r_low(k)
            rmax <- pat$r_upp(k)
            logP <- -d*(k/d-L$lkup_kappa((rmin:rmax)/n))^2/2/C/(1-abs(2*(rmin:rmax)/n-1)) - 
                        2*d*R$lkup_R((rmin:rmax)/n)
            A[1+k,1+rmin:rmax] <- exp(logP)/max(exp(logP))
        }
        plotPhiDim[[j]] <- A
        plotsPB[[j]] <- plot_Phi(A, TRUE, FALSE, TRUE)
    }
}
multiplot(plotlist = plotsPB, cols = 3)

Computing the limit curve (with lots of assumptions) {#Phidimlim}

We can compute the limit curve for the dimension normalized curvature measures in terms of the estimated constant $C$, and of course relying on the two main assumptions that

  1. the index normalized curvature measures obey a central limit law,
  2. the variance of the index normalized curvature measures has the form $C\big(1-\big|\frac{2r}{n}-1\big|\big)$.

Combining these assumptions and estimates yields
\begin{align} \log\Phi_{kr} & = \log\Phi^{\text{ind}}_{kr} + \log\text{Prob}\{\text{ind}(x)=r\} \\ & \approx - d\, \frac{\big(\frac{k}{d}-\ell(\frac{r}{n})\big)^2} {2C\big(1-\big|\tfrac{2r}{n}-1\big|\big)} - 2d\, R(\tfrac{r}{n}) + \text{const}(d) . \end{align}
The limit curve of the dimension normalized curvature measures (found through their modes) is given by the maximum of this function in $r$. Substituting $\rho:=\frac{r}{n}$ and $\kappa:=\frac{k}{d}$, we see that the curve is described through the minimum (in $\rho$) of
$$ \frac{(\kappa-\ell(\rho))^2}{1-|2\rho-1|} + 4C\, R(\rho) ,\qquad 1-\sqrt{1-\kappa} \leq \rho \leq \sqrt{\kappa} . $$
We can find these minima via a simple table lookup procedure. The resulting curves (for different values of $C$) look as follows:

include_graphics("lim_PhiDim.png", dpi=img_dpi)

Deriving the asymptotics for index constrained curvature measures {#asym_indnorm}

In this section we provide more details about the rate function and Leigh's curve. These are derived from [@MNSV11], whose formulas we reproduce and illustrate here.

Semi-circle {#semicirc}

It is well known that the empirical distribution function of the eigenvalues,
$$ \frac{1}{n} \sum_{j=1}^n\delta_{x_j}(t) ,\qquad (x_1,\ldots,x_n) \sim \text{G}\beta\text{E} , $$
approximates a semicircle distribution in high dimensions. More precisely, in order to obtain this limit one has to rescale the eigenvalues by $1/\sqrt{n}$, and in order to obtain a limit distribution independent of the Dyson index, one additionally rescales by $1/\sqrt{\beta}$:
$$ g(t) = \frac{1}{n} \sum_{j=1}^n\delta_{y_j}(t) ,\qquad y=\frac{x}{\sqrt{\beta n}} ,\qquad x \sim \text{G}\beta\text{E} , $$
converges almost surely to the semicircle distribution with support $[-\sqrt{2},\sqrt{2}]$. Before modifying this model to the case of a fixed index ratio, let us have a look at some example cases to illustrate this wonderful classical result.

Example computations:

We find the empirical distribution through iterated sampling (sample size $10^5$). For $n=10$ we obtain the following picture:

include_graphics("semic_n=10.png", dpi=img_dpi)

For $n=40$ the convergence becomes evident:

include_graphics("semic_n=40.png", dpi=img_dpi)

Limit distribution for fixed index ratio {#limdist_indfct}

The case of a fixed index ratio was considered in [@MNSV11]; they considered the empirical distribution function
$$ g_r(t) = \frac{1}{n} \sum_{j=1}^n\delta_{y_j}(t) ,\qquad y=\frac{x}{\sqrt{\beta n}} ,\qquad x \sim \text{G}\beta\text{E}\mid\text{ind}=r , $$
and found the limit distribution for the case $\frac{r}{n}\to \rho$. (The semicircle is found in the case $\rho=1/2$.)

Denoting the limit distribution by $g_\rho^*$, we have by symmetry $g_\rho^*(t)=g_{1-\rho}^*(-t)$, and for $\frac{1}{2}\leq \rho\leq 1$ the distribution $g_\rho^*$ can be found as follows:

rho_a <- function(a) {
    integrand <- function(x) return(sqrt((1-x)/x*(x^2+x+(a-1)/a^2)))
    return(2/pi/(1-(a-1)/a^2)*integrate(integrand, lower=0, upper=1)$value)
}
L_a <- function(a) return(a*sqrt(2/(a^2-a+1)))
a <- seq(1,2,length.out=1e3)
rho <- sapply(a, rho_a)
L2 <- sapply(a, L_a)
params <- tibble(a=a, rho=rho, L2=L2)
plotsParams <- list()
plotsParams[[1]] <- ggplot(params, aes(x=rho,y=a-1)) +
    geom_line() + theme_bw() +
    xlab(TeX("$\\rho$")) + ylab(TeX("$b$"))
plotsParams[[2]] <- ggplot(params, aes(x=rho,y=L2/sqrt(2))) + ylab("L/sqrt(2)") +
    geom_line() + theme_bw() +
    xlab(TeX("$\\rho$")) + ylab(TeX("$L/\\sqrt{2}$"))
multiplot(plotlist = plotsParams, cols = 2)
bnd_a <- function(a) {
    L <- L_a(a)
    return(list(neg_low=-L/a, neg_upp=-L*(1-1/a), pos_low=0, pos_upp=L))
}
rho_a <- function(a,moment=0) {
    L <- L_a(a)
    return( function(x) x^moment/pi*sqrt( (L-x)/x * (x+L/a) * (x+(1-1/a)*L) ) )
}
rho0_L <- c(0.6,0.7,0.8,0.9,1)
hues = seq(15, 375, length = length(rho0_L) + 1)
mycolors <- hcl(h = hues, l = 65, c = 100)[length(rho0_L):1]
data_neg <- tibble()
data_pos <- tibble()
for (rho0 in rho0_L) {
    ind <- which.min((rho-rho0)^2)
    a0 <- a[ind]
    bnd <- bnd_a(a0)
    x_neg <- seq(bnd$neg_low,bnd$neg_upp,length.out=1e3)
    x_pos <- seq(bnd$pos_low,bnd$pos_upp,length.out=1e3)
    y_neg <- sapply(x_neg, rho_a(a0))
    y_pos <- sapply(x_pos, rho_a(a0))
    data_neg <- bind_rows(data_neg, tibble(x=x_neg, y=y_neg, rho=as.factor(rho0)))
    data_pos <- bind_rows(data_pos, tibble(x=x_pos, y=y_pos, rho=as.factor(rho0)))
}
ggplot() + coord_cartesian(ylim = c(0, 1.5)) + theme_bw() +
    geom_line(data=data_neg, aes(x=x,y=y,group=rho,color=rho)) +
    geom_line(data=data_pos, aes(x=x,y=y,group=rho,color=rho)) +
    scale_colour_manual(values=mycolors) +
    xlab(TeX("$s=Lt$")) + ylab(TeX("$g_{\\rho}^*(s)$")) +
    guides(color=guide_legend(title=TeX("$\\rho$")))

The integral expression of $\rho\in[\frac{1}{2},1]$ in terms of $b\in[0,1]$ can be solved in terms of special functions:
\begin{align} \rho & = \frac{2b}{\pi\sqrt{1+2b}} \bigg( \Pi\Big(\frac{1+b}{1+2b},\sqrt{\frac{1-b^2}{1+2b}}\Big) + \frac{(1+2b)(1-b)}{2(1+b+b^2)} K\Big(\sqrt{\frac{1-b^2}{1+2b}}\Big) \bigg) , \end{align}
where $K$ and $\Pi$ denote the Legendre form of the complete elliptic integrals of the first and third kind,
\begin{align} K(k) & = \int_0^1 \frac{dt}{\sqrt{1-t^2}\sqrt{1-k^2t^2}} , \\ \Pi(\alpha^2,k) & = \int_0^1 \frac{dt}{\sqrt{1-t^2}\sqrt{1-k^2t^2}(1-\alpha^2t^2)} . \end{align}
For later use we also mention the complete elliptic integral of the second kind,
\begin{align} E(k) & = \int_0^1 \sqrt{\frac{1-k^2t^2}{1-t^2}}\,dt . \end{align}
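
The complete elliptic integrals can be evaluated by direct numerical integration of the definitions above; a minimal sketch (not used elsewhere in this note), together with the resulting closed form for $\rho$ in terms of $b=a-1$:

# complete elliptic integrals (Legendre form), by direct numerical integration
ell_K  <- function(k) integrate(function(t) 1/sqrt((1-t^2)*(1-k^2*t^2)),
                                lower=0, upper=1)$value
ell_Pi <- function(alpha2, k) integrate(function(t)
    1/(sqrt((1-t^2)*(1-k^2*t^2))*(1-alpha2*t^2)), lower=0, upper=1)$value
ell_E  <- function(k) integrate(function(t) sqrt((1-k^2*t^2)/(1-t^2)),
                                lower=0, upper=1)$value
# closed form for rho in terms of b (compare with the integral expression above)
rho_of_b <- function(b) {
    k <- sqrt((1-b^2)/(1+2*b))
    2*b/(pi*sqrt(1+2*b)) * ( ell_Pi((1+b)/(1+2*b), k) +
        (1+2*b)*(1-b)/(2*(1+b+b^2)) * ell_K(k) )
}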

Example computations:

We illustrate the asymptotics for the fixed ratio case as in the case of the semicircle. For $\rho=0.6$ we obtain for $n=10$:

include_graphics("indlim_n_cPerc=10_60.png", dpi=img_dpi)

For $n=40$ the convergence becomes evident:

include_graphics("indlim_n_cPerc=40_60.png", dpi=img_dpi)

Similarly, for $\rho=0.8$ we obtain for $n=10$:

include_graphics("indlim_n_cPerc=10_80.png", dpi=img_dpi)

For $n=40$ we obtain the following picture:

include_graphics("indlim_n_cPerc=40_80.png", dpi=img_dpi)

Calculating the rate function {#ratefct}

In [@MNSV11] (equation (92)) the following formula for the rate function $R(\rho)$ (for $\rho>\frac{1}{2}$; $R(\rho)=R(1-\rho)$) has been given:
\begin{align} R(\rho) = {} & \frac{L^2-1-\log(2L^2)}{4} + \frac{\rho}{2} \int_1^{\infty} \bigg( tL^2-\frac{1}{t}-\frac{L^2}{1+b}\sqrt{\frac{(t-1)\,(1+t(1+b))\,(b+t(1+b))}{t}}\bigg)\,dt \\ & + \frac{1-\rho}{2} \bigg( \log(1+b) - \frac{b(2+b)}{1+b+b^2} + \int_{1/(1+b)}^{\infty} \bigg( tL^2-\frac{1}{t}-\frac{L^2}{1+b}\sqrt{\frac{(1+t)\,(t(1+b)-1)\,(t(1+b)-b)}{t}}\bigg)\,dt \bigg) , \end{align}
with $b$ and $L$ as defined above. The rate function $R(\rho)$ is accessible via rate() from the symconivol package. See above for plots of $R(\rho)$ and its derivative.

Limit curve for index normalized curvature measures {#leighscurve}

Recall the notation for the empirical distribution function in the index constrained case:
$$ g_r(t) = \frac{1}{n} \sum_{j=1}^n\delta_{y_j}(t) ,\qquad y=\frac{x}{\sqrt{\beta n}} ,\qquad x \sim \text{G}\beta\text{E}\mid\text{ind}=r . $$
Similarly, we define the empirical distribution function for fixed $x$,
$$ g(x;t) = \frac{1}{n} \sum_{j=1}^n\delta_{y_j}(t) ,\qquad y=\frac{x}{\sqrt{\beta n}} , $$
so that
$$ g_r(t) = g(x;t) \quad \text{if } x \sim \text{G}\beta\text{E}\mid\text{ind}=r . $$
We can use this function to express the squared norm of the positive part of $x$:
$$ \|x_+\|^2 = n\int_0^\infty \big(\sqrt{\beta n}\,t\big)^2 g(x;t)\, dt = \beta n^2\int_0^\infty t^2 g(x;t)\, dt , $$
and similarly the squared norm of the negative part of $x$, $\|x_-\|^2=\beta n^2\int_{-\infty}^0 t^2 g(x;t)\, dt$. The convergence of the distribution function implies^[This statement depends on the type of convergence of the empirical distribution function. In fact, the paper [@MNSV11] is a bit nonrigorous concerning these technical details. But the close connection between the empirical distribution function and the squared norms of the positive and negative parts allows for derivations of rigorous convergence results for these from similar results for the empirical distribution function. Also note that one can write the squared norm of the eigenvalues as the trace of the squared matrix, which yields another slightly different perspective, which in the unconstrained case is well understood. Again, the difference to the classical case lies in the constraints on the index and the replacement of the negative eigenvalues with zero.] the concentration of $\frac{\|x_+\|^2}{d}$ (recall that $d\approx\beta n^2/2$) around its mean,
$$ \text{E}\Big[\frac{\|x_+\|^2}{d}\Big] = \frac{\beta n^2}{d} \int_0^\infty t^2 g_r(t)\, dt , $$
which itself converges, for $n\to\infty$, to
\begin{align} \ell(\rho) & = 2 \int_0^\infty s^2 g_\rho^*(s)\,ds \\ & = \frac{2L}{\pi(1+b)} \int_0^L s^2 \sqrt{\frac{(1-s/L)\,(1+s/L(1+b))\,(b+s/L(1+b))}{s/L}}\,ds \\ & = \frac{2L^4}{\pi(1+b)} \int_0^1 \sqrt{t^3\,(1-t)\,(1+t(1+b))\,(b+t(1+b))}\,dt , \end{align}
with $b$ and $L$ as defined above. We call the curve $\ell\colon[0,1]\to[0,1]$ Leigh's curve. It is accessible via leigh() from the symconivol package. It can be expressed in terms of elliptic integrals in the following way:
\begin{align} \ell(\rho) & = \rho + \frac{3b(1+b)\sqrt{1+2b}}{\pi(1+b+b^2)^2} \bigg( E\Big(\sqrt{\frac{1-b^2}{1+2b}}\Big) - \frac{2+b}{3} K\Big(\sqrt{\frac{1-b^2}{1+2b}}\Big) \bigg) . \end{align}
Here is a plot of this curve:

int_neg <- function(a) {
    bnd <- bnd_a(a)
    return(integrate(rho_a(a,2),lower=bnd$neg_low,upper=bnd$neg_upp)$value)
}
leigh <- 2*sapply(a, int_neg)
y_pat <- seq(0,1,length.out=1e2)
pat_bd_low <- tibble(patX=y_pat^2, patY=y_pat)
pat_bd_upp <- tibble(patX=1-y_pat^2, patY=1-y_pat)
ggplot() +
    geom_line(data=tibble(rho=1-rho,y=leigh),  aes(rho,y)) +
    geom_line(data=tibble(rho=rho,  y=1-leigh),aes(rho,y)) +
    theme_bw() +
    geom_line(data=pat_bd_low, aes(patY, patX), linetype=2, alpha=0.5) +
    geom_line(data=pat_bd_upp, aes(patY, patX), linetype=2, alpha=0.5) +
    xlab(TeX("$\\rho$")) + ylab(TeX("$\\ell(\\rho)$"))

The dashed curves indicate as usual the (asymptotic) Pataki range. We conjecture, and it seems apparent from the example computations above, that the index normalized curvature measures converge to Leigh's curve,
$$ \lim_{n\to\infty} \sum_{k\leq \kappa d}\Phi^{\text{ind}}_{k,\text{round}(\rho n)} \stackrel{(?)}{=} \begin{cases} 0 & \text{if } \kappa<\ell(\rho) , \\ 1 & \text{if } \kappa>\ell(\rho) . \end{cases} $$
Similarly, denoting the curve that we derived above for the dimension normalized curvature measures (relying on some assumptions about the higher dimensional behavior of the curvature measures) by $\mu(\kappa)$, that is,
$$ \mu(\kappa) = \text{argmin}\Big\{ \frac{(\kappa-\ell(\rho))^2}{1-|2\rho-1|} + 4C\, R(\rho) \,\Big|\; 1-\sqrt{1-\kappa} \leq \rho \leq \sqrt{\kappa} \Big\} , $$
we conjecture that the dimension normalized curvature measures converge to this curve,
$$ \lim_{n\to\infty} \sum_{r\leq \rho n}\Phi^{\text{dim}}_{\text{round}(\kappa d),r} \stackrel{(?)}{=} \begin{cases} 0 & \text{if } \rho<\mu(\kappa) , \\ 1 & \text{if } \rho>\mu(\kappa) . \end{cases} $$

Application in semidefinite programming {#appl_SDP}

Generally speaking, a conic program with reference cone $\mathcal{C}$ maximizes (or minimizes) a linear functional over the intersection of the (closed convex) cone $\mathcal{C}$ with an affine linear subspace of codimension $k$ (generically, the subspace is given by $k$ linear equations). A natural random model for this setup is to assume that

  1. the linear functional is generic in that it is nonzero, and random in that its direction is uniformly distributed on the unit sphere,
  2. the affine linear subspace is generic in that it does not contain the origin, and random in that its associated linear subspace (which goes through the origin) is uniformly random in the set of all linear subspaces with the same dimension (Grassmann manifold).

Of course, as is the case for every random model used for analyzing an algorithm, there are complaints about the applicability of the model to "real-world" situations. But the described model is a natural model to start with, and once this is understood one might think of extending it (maybe there is a "universality law"; or if there isn't, one might investigate what factors are key for influencing the model to show a different behavior).

In [@AB15s] it was shown that the probabilities of the random conic program being infeasible or unbounded are given in terms of the intrinsic volumes of the cone:
\begin{align} \text{Prob}\{\text{program is infeasible}\} & = \sum_{j=0}^{k-1} v_j(\mathcal{C}) , \\ \text{Prob}\{\text{program is unbounded}\} & = \sum_{j=k+1}^d v_j(\mathcal{C}) . \end{align}
Now, this result may add to the feeling that the random model is useless for real-world applications, because it implies that with high probability a random program is either infeasible or unbounded. But this conclusion is premature, as one way out of this dilemma is simply to condition the random model on the conic program being neither infeasible nor unbounded. The same arguments as above can be used for defending this adjusted random model: it is arguably the most natural model to start with, and one may think of extending it, once its behavior is fully understood.
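
In code these are two tail sums; a minimal sketch, assuming a hypothetical vector v of intrinsic volumes with v[1+j] $= v_j(\mathcal{C})$, $j=0,\ldots,d$, and a codimension k:

# probabilities of infeasibility/unboundedness of the random conic program
prob_infeasible <- sum(v[seq_len(k)])       # j = 0,...,k-1
prob_unbounded  <- sum(v[(k+2):(d+1)])      # j = k+1,...,d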

The random conic program conditioned on being neither infeasible nor unbounded almost surely has a unique solution, which we denote by $S$. If we now assume that the reference cone $\mathcal{C}$ is the cone of positive semidefinite matrices, then one may ask about the rank of the solution $S$. These rank probabilities are given by the dimension normalized curvature measures:
$$ \text{Prob}\{\text{rank}(S)=r\} = \Phi^{\text{dim}}_{kr} . $$
With the above conjecture about the high-dimensional behavior of the dimension normalized curvature measures in mind, we conclude that one might use the limit curve $\mu(\kappa)$ to predict the rank of the solution of a random semidefinite program:
$$ \text{rank}(S) \approx \mu\big(\tfrac{k}{d}\big)\,n . $$

Appendix: Comparing curvature measures and algebraic degree {#alg_deg}

Another parameter associated with the semidefinite cone ($\beta=1$) that satisfies the Pataki inequalities is the algebraic degree of semidefinite programming $\delta(m,n,r)$, defined in [@NRS10]. (It would be interesting to have some theory about the cases $\beta=2,4$, but so far this seems to have not been addressed.) More precisely, $\delta(m,n,r)$ with $1\leq m\leq n-1$ is nonzero only if
$$ r+\binom{r}{2} \leq d-m \leq d-\Big(n-r+\binom{n-r}{2}\Big) . $$
Some values of the algebraic degree (for $n\leq 14$) are accessible via the function alg_deg() from the symconivol package. The values were computed with the formula for $\delta(m,n,r)$ given in [@H16].

So far no connection between the curvature measures and the algebraic degree is known that goes beyond these inequalities, but empirically one may observe the same asymptotic behavior as for the curvature measures. We illustrate this by looking at the normalized marginals
$$ \frac{\sum_m\delta(m,n,r)}{\sum_{r,m}\delta(m,n,r)} ,\qquad \frac{\sum_r\delta(m,n,r)}{\sum_{r,m}\delta(m,n,r)} , $$
which we compare with the asymptotic limits of the index probabilities $\text{Prob}\{\text{ind}(x)=r\}$ and of the intrinsic volumes, respectively. That is, we compare the normalized marginals with the (normalized) rate $\exp(-n^2R(r/n))$ and with the normal distribution with mean $d/2$ and variance $d\big(\frac{8}{\pi^2}-\frac{1}{2}\big)$, respectively:

plotsAD <- list()
j <- 0
for (n in c(4,8,14)) {
    pat <- pat_bnd(1,n)
    d <- pat$d
    AD <- matrix(as.numeric(alg_deg(n)),d-1,n-1)
    e <- colSums(AD)
    e <- e/max(e)
    v <- rowSums(AD)
    v <- v/max(v)
    tib_R <- rate()$table
    tib_R$r <- n*tib_R$rho
    tib_R$est <- -2*d*(tib_R$R-min(tib_R$R))
    tib_N <- tibble(x=seq(0,d,length=1e3))
    tib_N$y <- -(tib_N$x-d/2)^2/(2*d*(8/pi^2-1/2))
    tib_N$y <- tib_N$y-max(tib_N$y)
    j <- j+1
    plotsAD[[j]] <- ggplot(tibble(r=1:(n-1), e=e), aes(x=r,y=log(e))) + geom_line() + theme_bw() +
        coord_cartesian(xlim = c(0, n)) + theme(axis.title.y=element_blank(),
              axis.text.y=element_blank(), axis.ticks.y=element_blank(),
              panel.border = element_blank(), panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              axis.line = element_line(colour = "black")) +
              xlab(TeX(sprintf("$\\;\\;\\; r \\;\\;\\; (n=%d)$", n))) +
        geom_line(data=tib_R, aes(x=r, y=est),color="red", alpha=0.5)
    j <- j+1
    plotsAD[[j]] <- ggplot(tibble(k=1:(d-1), v=v), aes(x=k,y=log(v))) + geom_line() + theme_bw() +
        coord_cartesian(xlim = c(0, d)) + theme(axis.title.y=element_blank(),
              axis.text.y=element_blank(), axis.ticks.y=element_blank(),
              panel.border = element_blank(), panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              axis.line = element_line(colour = "black")) +
              xlab(TeX(sprintf("$\\;\\;\\; m \\;\\;\\; (n=%d)$", n))) +
        geom_line(data=tib_N, aes(x=x, y=y),color="red", alpha=0.5)
}
multiplot(plotlist = plotsAD, cols = 3)

In the same way we compare the two normalized measures
$$ \frac{\delta(d-k,n,r)}{\sum_j\delta(d-j,n,r)} ,\qquad \frac{\delta(d-k,n,r)}{\sum_s\delta(d-k,n,s)} , $$
where we use the parameter $k=d-m$ for a better comparison with the above plots for the normalized curvature measures.

plotsAD <- list()
j <- 0
for (n in c(5,8,14)) {
    beta <- 1
    pat <- pat_bnd(beta,n)
    d <- pat$d
    AD <- matrix(0,d+1,n+1)
    A  <- matrix(0,d+1,n+1)
    B  <- matrix(0,d+1,n+1)
    AD[2:d,2:n] <- matrix(as.numeric(alg_deg(n)),d-1,n-1)
    for (r in 2:n)
        AD[,r] <- rev(AD[,r])
    A[,2:n] <- apply(AD[,2:n],2,function(x) return(x/max(x)))
    B[2:d,] <- t(apply(AD[2:d,],1,function(x) return(x/max(x))))
    j <- j+1
    plotsAD[[j]] <- plot_Phi(A, TRUE, TRUE, FALSE)
    j <- j+1
    plotsAD[[j]] <- plot_Phi(B, TRUE, FALSE, TRUE)
}
multiplot(plotlist = plotsAD, cols = 3)

References {#refs}


