Using liminal to understand high dimensional parameter space


This example is adapted from the tour examples described in @Cook2018-jm. Here we use a tour to explore the principal component space, and t-SNE to look for non-linear structure and clusters.

Setting up the data

Data were obtained from CT14HERA2 parton distribution function fits as used in @Cook2018-jm. There are 28 directions in the parameter space of the parton distribution function fit; each of the variables labelled X1-X56 corresponds to moving ±1 standard deviation away from the 'best' (maximum likelihood estimate) fit of the function. Each observation contains all predictions of the corresponding measurement from an experiment (see Table 3 in that paper for more explicit details).

The remaining columns are:

First, we load the data as a data.frame:


Linear embeddings and the tour

First, we estimate all 56 principal components of the parton distribution fits:

pcs  <- prcomp(pdfsense[, 7:ncol(pdfsense)])

Using this data structure, we can produce a screeplot:

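library(ggplot2)

res <- data.frame(
  component = 1:56,
  # cumulative share of the component standard deviations;
  # squaring pcs$sdev would give the share of variance instead
  variance_explained = cumsum(pcs$sdev / sum(pcs$sdev))
)

ggplot(res, aes(x = component, y = variance_explained)) +
  geom_point() +
  scale_x_continuous(
    breaks = seq(0, 60, by = 5)
  ) +
  scale_y_continuous(
    labels = function(x) paste0(100 * x, "%")
  )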

Approximately 70% of the variance in the PDF fits is explained by the first 15 principal components.
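The proportion of variance attributed to each component is conventionally computed from the squared standard deviations returned by prcomp(). The following is a minimal sketch of that calculation, assuming the built-in USArrests data as a stand-in for pdfsense; the object names (pcs_demo, var_share and so on) are illustrative and not part of liminal.

```r
# Stand-in data: the built-in USArrests data set, not the pdfsense data.
pcs_demo <- prcomp(USArrests, scale. = TRUE)

# The variance of each component is the square of its standard deviation.
var_share <- pcs_demo$sdev^2 / sum(pcs_demo$sdev^2)
cum_var_share <- cumsum(var_share)

# The same calculation on the raw standard deviations gives different values.
cum_sd_share <- cumsum(pcs_demo$sdev / sum(pcs_demo$sdev))

data.frame(
  component = seq_along(var_share),
  cum_var_share = cum_var_share,
  cum_sd_share = cum_sd_share
)
```

Because the components are sorted by decreasing standard deviation, the variance-based curve rises faster in the leading components than the one based on standard deviations.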

Next we augment our original data with the principal components:

pdfsense <- dplyr::bind_cols(
  pdfsense,
  as.data.frame(pcs$x)
)
pdfsense$Type <- factor(pdfsense$Type)
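To illustrate how principal component scores line up with the original rows, here is a small sketch in base R. It assumes the built-in iris data as a stand-in and uses cbind() in place of dplyr::bind_cols(); the names measurements, scores and iris_aug are hypothetical.

```r
# Stand-in data: the numeric columns of the built-in iris data set.
measurements <- iris[, 1:4]
pcs_demo <- prcomp(measurements)

# pcs_demo$x holds one row of scores per observation,
# with columns named PC1, PC2, ...
scores <- as.data.frame(pcs_demo$x)

# Column-bind the scores onto the original data.
iris_aug <- cbind(iris, scores)
names(iris_aug)
```

The score columns keep the PC1, PC2, ... names, which is what makes range selections such as PC1:PC6 possible later on.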

We can view a simple tour via limn_tour() and color points by their experimental group:

limn_tour(pdfsense, PC1:PC6, Type)

Non-Linear embeddings

Now we can set up a non-linear embedding via t-SNE. Here we embed all 56 principal components.

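# Initialise from the first two principal components, rescaled to a small spread
start <- clamp_sd(as.matrix(dplyr::select(pdfsense, PC1, PC2)), sd = 1e-4)
tsne <- Rtsne::Rtsne(
  dplyr::select(pdfsense, PC1:PC56),
  pca = FALSE,  # the input is already principal components
  normalize = TRUE,
  perplexity = 50,
  exaggeration_factor = nrow(pdfsense) / 100,
  Y_init = start
)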

Once t-SNE has run, we tidy the embedding into a data.frame so that it can be used in a linked tour.

tsne_embedding <- as.data.frame(tsne$Y)
tsne_embedding <- dplyr::rename(tsne_embedding, tsneX = V1, tsneY = V2)
tsne_embedding$Type <- pdfsense$Type
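The tidying step relies on as.data.frame() naming the columns of an unnamed matrix V1, V2, and so on. A minimal base R sketch of the same pattern follows, assuming a made-up two-column matrix in place of the t-SNE output and a hypothetical grouping factor.

```r
# Hypothetical 2-column matrix standing in for an embedding such as tsne$Y.
set.seed(1)
embedding <- matrix(rnorm(10), ncol = 2)

# as.data.frame() names the columns of an unnamed matrix V1, V2, ...
embedding_df <- as.data.frame(embedding)
names(embedding_df)

# Give the columns descriptive names and attach a (made-up) grouping factor.
names(embedding_df) <- c("tsneX", "tsneY")
embedding_df$Type <- factor(c("a", "a", "b", "b", "b"))
str(embedding_df)
```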

We can view the clusters using a static scatter plot:

ggplot(tsne_embedding,
       aes(x = tsneX, y = tsneY, color = Type)) +
  geom_point() +
  scale_color_manual(values = limn_pal_tableau10())

We can link a tour view next to the embedding to give us a clear picture of the clustering:

limn_tour_link(
  tour_data = pdfsense,
  embed_data = tsne_embedding,
  cols = PC1:PC6,
  color = Type
)

References {-}


liminal documentation built on May 28, 2021, 9:06 a.m.