High Dimensional Data Visualization

knitr::opts_chunk$set(echo = TRUE, 
                      warning = FALSE,
                      message = FALSE,
                      fig.align = "center", 
                      fig.width = 7, 
                      fig.height = 5,
                      out.width = "60%", 
                      collapse = TRUE,
                      comment = "#>",
                      tidy.opts = list(width.cutoff = 65),
                      tidy = FALSE)
library(knitr)
set.seed(12314159)
imageDirectory <- "./images/highDim"
dataDirectory <- "./data/highDim"
path_concat <- function(path1, ..., sep="/") {
  # The "/" is standard unix directory separator and so will
  # work on Macs and Linux.
  # In windows the separator might have to be sep = "\" or 
  # even sep = "\\" or possibly something else. 
  paste(path1, ..., sep = sep)
}

library(ggplot2, quietly = TRUE)
library(dplyr, quietly = TRUE)

Serialaxes coordinate

Serial axes coordinates are a method for visualizing $p$-dimensional geometry and multivariate data. As the name suggests, all axes are shown in series. The axes can span a finite $p$-dimensional space or be transformed to an infinite space (e.g. by a Fourier transformation).

In the finite $p$-dimensional space, all axes can be displayed in parallel, which is known as parallel coordinates; alternatively, all axes can be displayed under a polar coordinate system, which is often known as radial coordinates or a radar plot. In the infinite space, a mathematical transformation is applied; more details are given in the sub-section Infinite axes.

A point in Euclidean $p$-space $R^p$ is represented as a polyline in serial axes coordinates. This induces a point $\leftrightarrow$ line duality in the Euclidean plane $R^2$ [@146402].
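To make the polyline representation concrete, the sketch below uses base R only: each variate is rescaled to $[0, 1]$ and placed on its own axis at horizontal positions $1, \ldots, p$, so one observation becomes $p$ vertices joined in order. The helper name `to_polyline` and the min-max scaling are assumptions made for illustration, not necessarily what `ggmulti` does internally.

```r
# Hypothetical helper (not part of ggmulti): turn each row of a numeric
# data frame into the vertices of a polyline in parallel coordinates.
to_polyline <- function(data) {
  x <- as.matrix(data)
  # Min-max scale every column to [0, 1] (an assumed scaling choice)
  rng <- apply(x, 2, range)
  scaled <- sweep(sweep(x, 2, rng[1, ]), 2, rng[2, ] - rng[1, ], "/")
  data.frame(
    id    = rep(seq_len(nrow(x)), times = ncol(x)),  # observation index
    axis  = rep(seq_len(ncol(x)), each = nrow(x)),   # horizontal position
    value = as.vector(scaled)                        # height on that axis
  )
}

vertices <- to_polyline(iris[, 1:4])
# The four vertices of the first observation
vertices[vertices$id == 1, ]
```

Joining the vertices that share an `id` in order of `axis` gives the polyline for that observation.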

Before we start, a couple of things should be noticed:

Finite axes

Suppose we are interested in the data set iris. A parallel coordinate chart can be created as follows:

library(ggmulti)
# parallel axes plot
ggplot(iris, 
       mapping = aes(
         Sepal.Length = Sepal.Length,
         Sepal.Width = Sepal.Width,
         Petal.Length = Petal.Length,
         Petal.Width = Petal.Width,
         colour = factor(Species))) +
  geom_path(alpha = 0.2)  + 
  coord_serialaxes() -> p
p

A histogram layer can be displayed by adding the layer geom_histogram:

p + 
  geom_histogram(alpha = 0.3, 
                 mapping = aes(fill = factor(Species))) + 
  theme(axis.text.x = element_text(angle = 30, hjust = 0.7))

A density layer can be drawn by adding the layer geom_density:

p + 
  geom_density(alpha = 0.3, 
               mapping = aes(fill = factor(Species)))

A parallel coordinate plot can be converted to a radial coordinate plot by setting axes.layout = "radial" in the function coord_serialaxes.

p$coordinates$axes.layout <- "radial"
p

Note that layers such as geom_histogram and geom_density are not yet implemented in the radial coordinate.
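The relation between the two layouts can be sketched in base R. In this illustration the $j$th axis is assumed to sit at angle $2\pi(j - 1)/p$ and the scaled value is used as the radius. The function name `radial_xy` and the angle convention are assumptions for illustration, not the `ggmulti` internals.

```r
# Hypothetical helper: map scaled values (each in [0, 1]) on p axes
# to Cartesian coordinates of a radial (radar) layout.
radial_xy <- function(values) {
  p <- length(values)
  theta <- 2 * pi * (seq_len(p) - 1) / p  # assumed: axes equally spaced
  data.frame(axis = seq_len(p),
             x = values * cos(theta),
             y = values * sin(theta))
}

# One observation with four scaled values
xy <- radial_xy(c(0.2, 0.6, 0.1, 0.05))
xy
```

Connecting the rows of `xy` in order, and closing the path back to the first row, gives the radar shape for that observation.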

Infinite axes

The Andrews plot [@andrews1972plots] projects multi-response observations into a function $f(t)$, defined as the inner product of the observed response values and a set of orthonormal functions in $t$

$$f_{y_i}(t) = \langle \ve{y}_i, \ve{a}_t \rangle$$

where $\ve{y}_i$ is the $i$th response vector and $\ve{a}_t$ is a set of functions that are orthonormal on a certain interval. Andrews suggested using the Fourier transformation

$$\ve{a}_t = \left\{\frac{1}{\sqrt{2}}, \sin(t), \cos(t), \sin(2t), \cos(2t), \ldots\right\}$$

which are orthonormal on the interval $(-\pi, \pi)$. In other words, we can project a $p$-dimensional space onto an infinite $(-\pi, \pi)$ space. The following figure illustrates how to construct an Andrews plot.

p <- ggplot(iris, 
            mapping = aes(Sepal.Length = Sepal.Length,
                          Sepal.Width = Sepal.Width,
                          Petal.Length = Petal.Length,
                          Petal.Width = Petal.Width,
                          colour = Species)) +
  geom_path(alpha = 0.2, 
            stat = "dotProduct")  + 
  coord_serialaxes()
p
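The inner product behind this plot can be reproduced in a few lines of base R. The sketch below is an illustration under stated assumptions: the helper names `andrews_basis` and `andrews_curve` are hypothetical, and `ggmulti` may scale the data or choose the grid for $t$ differently. The last part checks numerically that the first four terms are orthonormal on $(-\pi, \pi)$ with respect to the inner product $\frac{1}{\pi}\int_{-\pi}^{\pi} g(t) h(t)\, dt$.

```r
# Hypothetical helper: evaluate the first p Fourier terms
# {1/sqrt(2), sin(t), cos(t), sin(2t), cos(2t), ...} on a grid.
# Returns a length(t_vals) x p matrix; row k is a_t at t_vals[k].
andrews_basis <- function(t_vals, p) {
  a <- matrix(0, nrow = length(t_vals), ncol = p)
  for (j in seq_len(p)) {
    freq <- j %/% 2                 # 0, 1, 1, 2, 2, ...
    a[, j] <- if (j == 1) {
      1 / sqrt(2)
    } else if (j %% 2 == 0) {
      sin(freq * t_vals)
    } else {
      cos(freq * t_vals)
    }
  }
  a
}

# f_y(t) = <y, a_t> for every row y of a numeric matrix.
# Returns an n x length(t_vals) matrix: one curve per observation.
andrews_curve <- function(y, t_vals) {
  y <- as.matrix(y)
  y %*% t(andrews_basis(t_vals, ncol(y)))
}

t_grid <- seq(-pi, pi, length.out = 200)
curves <- andrews_curve(iris[, 1:4], t_grid)
dim(curves)

# Orthonormality check on a periodic grid (one endpoint dropped)
n <- 1000
tt <- seq(-pi, pi, length.out = n + 1)[-1]
a <- andrews_basis(tt, 4)
gram <- crossprod(a) * (2 * pi / n) / pi
round(gram, 6)
```

Each row of `curves` is one observation's curve; drawing every row against `t_grid` gives the kind of display shown in the figure.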

A quantile layer can be displayed on top:

p + 
 geom_quantiles(stat = "dotProduct",
                quantiles = c(0.25, 0.5, 0.75),
                linewidth = 2,
                linetype = 2) 
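A quantile layer summarises the lines position by position. As a rough base R sketch, assuming the quantiles are taken separately at each position along the serial axes (the actual computation in `geom_quantiles` may differ), the idea looks like this. The helper name `pointwise_quantiles` is hypothetical, and the raw iris measurements stand in for the curves.

```r
# Hypothetical helper: pointwise quantiles of a set of curves.
# `curves` is an n x m matrix: one row per observation, one column per
# position along the serial axes (an axis, or a value of t).
pointwise_quantiles <- function(curves, probs = c(0.25, 0.5, 0.75)) {
  q <- apply(curves, 2, stats::quantile, probs = probs, names = FALSE)
  matrix(q, nrow = length(probs),
         dimnames = list(paste0("q", probs), colnames(curves)))
}

qs <- pointwise_quantiles(as.matrix(iris[, 1:4]))
qs
```

Each row of `qs` would be drawn as one line across the axes.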

A couple of things should be noticed:

$$\ve{a}_t = \left\{\cos(t), \cos(\sqrt{2}t), \cos(\sqrt{3}t), \cos(\sqrt{5}t), \ldots\right\}$$

where $t \in [0, k\pi]$ [@gnanadesikan2011methods].

```r
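tukey <- function(p = 4, k = 50 * (p - 1), ...) {
  # Grid of k values of t on [0, p * pi]
  t <- seq(0, p * base::pi, length.out = k)
  seq_k <- seq(p)
  # One column per term: cos(t), cos(sqrt(2) t), cos(sqrt(3) t), ...
  values <- sapply(seq_k,
                   function(i) {
                     if(i == 1) return(cos(t))
                     if(i == 2) return(cos(sqrt(2) * t))
                     # For i >= 3 this equals 2i - 3 (3, 5, 7, ...), which
                     # matches the sequence above for the default p = 4
                     Fibonacci <- seq_k[i - 1] + seq_k[i - 2]
                     cos(sqrt(Fibonacci) * t)
                   })
  list(
    vector = t,
    # p x k matrix: row i holds the i-th term evaluated on the grid
    matrix = matrix(values, nrow = p, byrow = TRUE)
  )
}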
ggplot(iris, 
       mapping = aes(Sepal.Length = Sepal.Length,
                     Sepal.Width = Sepal.Width,
                     Petal.Length = Petal.Length,
                     Petal.Width = Petal.Width,
                     colour = Species)) +
  geom_path(alpha = 0.2, stat = "dotProduct", transform = tukey)  + 
  coord_serialaxes()
```

Note that with Tukey's suggestion, the elements of $\ve{a}_t$ can "cover" more spheres in $p$-dimensional space, but they are not orthonormal.
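The lack of orthogonality can be checked numerically. The following base R sketch is an illustration, with the first four terms and the interval $[0, 4\pi]$ chosen to match the default `p = 4` of the `tukey` function. It approximates the integrals of all pairwise products; clearly non-zero off-diagonal entries show that the functions are not orthogonal on this interval.

```r
# Approximate the Gram matrix of cos(t), cos(sqrt(2) t), cos(sqrt(3) t),
# cos(sqrt(5) t) on [0, 4 * pi] with the trapezoidal rule.
multipliers <- sqrt(c(1, 2, 3, 5))
t_vals <- seq(0, 4 * pi, length.out = 20001)
a <- sapply(multipliers, function(m) cos(m * t_vals))

h <- t_vals[2] - t_vals[1]
w <- rep(h, length(t_vals))       # trapezoidal weights
w[c(1, length(w))] <- h / 2
gram <- crossprod(a * w, a)       # entry (i, j) ~ integral of a_i * a_j

round(gram, 3)
```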

An alternative way to create a serial axes plot

Rather than calling the function coord_serialaxes, an alternative way to create a serial axes object is to add a geom_serialaxes_... layer to the plot.

For example, Figures 1 to 4 can be created by calling:

g <- ggplot(iris, 
            mapping = aes(Sepal.Length = Sepal.Length,
                          Sepal.Width = Sepal.Width,
                          Petal.Length = Petal.Length,
                          Petal.Width = Petal.Width,
                          colour = Species))
g + geom_serialaxes(alpha = 0.2)
g + 
  geom_serialaxes(alpha = 0.2) + 
  geom_serialaxes_hist(mapping = aes(fill = Species), alpha = 0.2)
g + 
  geom_serialaxes(alpha = 0.2) + 
  geom_serialaxes_density(mapping = aes(fill = Species), alpha = 0.2)
# radial axes can be created by 
# calling `coord_radial()` 
# this is slightly different, check it out! 
g + 
  geom_serialaxes(alpha = 0.2) + 
  coord_radial()

Figures 5 and 7 can be created by setting "stat" and "transform" in geom_serialaxes; for Figure 6, geom_serialaxes_quantile can be added to create a serial axes quantile layer.

Some slight differences should be noted here:

# The serial axes is `Sepal.Length`, `Sepal.Width`, `Sepal.Length`
# With meaningful labels
ggplot(iris, 
       mapping = aes(Sepal.Length = Sepal.Length,
                     Sepal.Width = Sepal.Width,
                     Sepal.Length = Sepal.Length)) + 
  geom_path() + 
  coord_serialaxes()

# The serial axes is `Sepal.Length`, `Sepal.Length`
# No meaningful labels
ggplot(iris, 
       mapping = aes(Sepal.Length = Sepal.Length,
                     Sepal.Width = Sepal.Width,
                     Sepal.Length = Sepal.Length)) + 
  geom_serialaxes()

Also, if the dimension of the data is large, typing each variate in the mapping aesthetics is tedious. The parameter axes.sequence is provided to specify the axes. For example, a serial axes object can be created as

ggplot(iris) + 
  geom_path() + 
  coord_serialaxes(axes.sequence = colnames(iris)[-5])
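When the data set has many columns, the character vector passed to `axes.sequence` can itself be built programmatically. A minimal base R sketch, assuming all numeric columns should become axes (the variable names `numeric_axes` and `reordered_axes` are only illustrative):

```r
# Select every numeric column of the data as an axis
numeric_axes <- names(iris)[vapply(iris, is.numeric, logical(1))]
numeric_axes

# The order of the vector determines the order of the axes,
# e.g. to place the two length variates next to each other
reordered_axes <- numeric_axes[c(1, 3, 2, 4)]
reordered_axes
```

Either vector can then be supplied as `coord_serialaxes(axes.sequence = ...)` in place of `colnames(iris)[-5]`.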

Finally, please report bugs here. Enjoy high dimensional visualization! "Don't panic... Just do it in 'serial'" [@inselberg1999don].

Reference




ggmulti documentation built on Nov. 10, 2022, 5:12 p.m.