Mender: CLL Clinical Data
In Mender: Visualizing Topological Loops and Voids

knitr::opts_chunk$set(fig.width=5, fig.height=5)
options(width=96)
.format <- knitr::opts_knit$get("rmarkdown.pandoc.to")
.tag <- function(N, cap ) ifelse(.format == "html",
                                 paste("Figure", N, ":",  cap),
                                 cap)

Introduction

We want to illustrate the Mender package (Version r packageVersion("Mender")) with a clinical data set. Not surprisingly, we start by loading the package.

library(Mender)

We also load several other useful packages (some of which may eventually get incorporated into the package requirements).

suppressMessages( library(Mercator) )
library(ClassDiscovery)
library(Polychrome)
data(Dark24)
data(Light24)
suppressMessages( library(igraph) )
suppressMessages( library("ape") )
suppressPackageStartupMessages( library(circlize) )

Now we fetch the sample clinical data set that is included with the package.

data(CLL)
ls()
dim(clinical)
colnames(clinical)

The clinical object is a numeric matrix containing the clinical data (binary values have been converted to 0-1; categorical values to integers). The daisydist is a distance matrix (stored as a dist object). Coombes and colleagues [J Biomed Inform. 2021 Jun;118:103788] showed that this is the best way to measure distances between mixed-type clinical data. The ripdiag object is a "Rips diagram" produced by applying the TDA algorithm to the daisy distances.

TDA Built-in Visualizations of the Rips Diagram

Here are some plots of the TDA results using tools from the original package. (I am not sure what any of these are really good for.)

diag <- ripdiag[["diagram"]]
opar <- par(mfrow = c(1,2))
plot(diag, barcode = TRUE, main = "Barcode")
plot(diag, main = "Rips Diagram")
par(opar)
rm(opar)

Mercator Visualizations of the Underlying Data and Distance Matrix

Now we use our Mercator package to view the underlying data.

mercury <- Mercator(daisydist, metric = "daisy", method = "hclust", K = 8)
mercury <- addVisualization(mercury, "mds")
mercury <- addVisualization(mercury, "tsne")
mercury <- addVisualization(mercury, "umap")
mercury <- addVisualization(mercury, "som")

opar <- par(mfrow = c(3,2), cex = 1.1)
plot(mercury, view = "hclust")
plot(mercury, view = "mds", main = "Mult-Dimensional Scaling")
plot(mercury, view = "tsne", main = "t-SNE")
plot(mercury, view = "umap", main = "UMAP")
barplot(mercury, main = "Silhouette Width")
plot(mercury, view = "som", main = "Self-Organizing Maps")
par(opar)
rm(opar)

Dimension Zero

Here is a picture of the "zero-cycle" data, which can also be used ultimately to cluster the points (where each point is a patient). The connected lines are similar to a single-linkage clustering structure, showing when individual points are merged together as the TDA parameter increases.

nzero <- sum(diag[, "dimension"] == 0)
cycles <- ripdiag[["cycleLocation"]][2:nzero]
W <- mercury@view[["umap"]]$layout
plot(W, main = "Connected Zero Cycles")
for (cyc in cycles) {
  points(W[cyc[1], , drop = FALSE], pch = 16,col = "red")
  X <- c(W[cyc[1], 1], W[cyc[2],1])
  Y <- c(W[cyc[1], 2], W[cyc[2],2])
  lines(X, Y)
}

Using iGraph

We can convert the 0-dimensional cycle structure into a dendrogram, by first passing them through the igraph package. We start by putting all the zero-cycle data together, which can be viewed as an "edge-list" from the igraph perspective.

edges <- t(do.call(cbind, cycles)) # this creates an "edgelist"
G <- graph_from_edgelist(edges)
G <- set_vertex_attr(G, "label", value = attr(daisydist, "Labels"))

Note that we attached the sample names to the graph, obtaining them from the daisy distance matrix. Now we use two different algorithms to decide how to layout the graph.

set.seed(2734)
Lt <- layout_as_tree(G)
L <- layout_with_fr(G)

opar <- par(mfrow = c(1,2), mai = c(0.01, 0.01, 1.02, 0.01))
plot(G, layout = Lt, main = "As Tree")
plot(G, layout = L, main = "Fruchterman-Reingold")
par(opar)

Note that the Fruchterman-Reingold layout gives the most informative structure.

Community Structure

There are a variety of community-finding algorithms that we can apply. (Communities in grpah theory are similar to clusters in other machine learning areas of study.) "Edge-betweenness" seems to work best.

keg <- cluster_edge_betweenness(G) # 20
table(membership(keg)) 
pal <- Dark24[membership(keg)]

The first line in the next code chunk shows that we did actually produce a tree. We explore three different ways ro visualize it

is.hierarchical(keg)
H <- as.hclust(keg)
H$labels <- attr(daisydist, "Labels")
K <-  7
colset <- Light24[cutree(H, k=K)]
G2 <- set_vertex_attr(G, "color", value = colset)
e <- 0.01
opar <- par(mai = c(e, e, e, e))
plot(G2, layout = L)
par(opar)

P <- as.phylo(H)
opar <- par(mai = c(0.01, 0.01, 1.0, 0.01))
plot(P, type = "u", tip.color = colset, cex = 0.8, main = "Ape/Cladogram")
par(opar)
rm(opar)

Visualizing Features

In any of the "scatter plot views" (e.g., MDS, UMAP, t-SNE) from Mercator, we may want to overlay different colors to represent different clinical features.

U <- mercury@view[["mds"]]
V <- mercury@view[["tsne"]]$Y
W <- mercury@view[["umap"]]$layout

featMU <- Feature(clinical[,"mutation.status"], "Mutation Status", c("cyan", "red"),
                  c("Mutated", "Unmutated"))
featRai <- Feature(clinical[,"CatRAI"], "Rai Stage", c("green", "magenta"), c("High", "Low"))
opar <- par(mfrow = c(1,2))
plot(W, main = "UMAP; Mutation Status", xlab = "U1", ylab = "U2")
points(featMU, W, pch = 16, cex = 1.4)
plot(W, main = "UMAP; Rai Stage", xlab = "U1", ylab = "U2")
points(featRai, W, pch = 16, cex = 1.4)
par(opar)
rm(opar)

Significance

We have a statistical approach to deciding which of the detected cycles are statistically significant. Empirically, the persistence of 0-dimensional cycles looks like a gamma distribution, while the persistence of higher dimensional cycles looks like an exponential distribution. In both cases, we use an empirical Bayes approach, treating the observed distribution as a mixture of either gamma or exponential (as appropriate) with an unknown distribution contributing to heavier tails.

persistence <- diag[, "Death"] - diag[, "Birth"]

Zero-Cycles (Connected Components)

d0 <- persistence[diag[, "dimension"] == 0]
d0 <- d0[d0 < 1]
summary(d0)
mu <- mean(d0)
nu <- median(d0)
sigma <- sd(d0)
shape <- mu^2/sigma^2
rate <- mu/sigma^2
xx <- seq(0, 0.23, length = 100)
yy <- dgamma(xx, shape = shape, rate = rate)

hist(d0, breaks = 123, freq = FALSE)
lines(xx, yy, col = "purple", lwd = 2)

One-Cycles (Loops)

Now we want to determine if there are significant "loops" in the data, and, if so, how many?

d1 <- persistence[diag[, "dimension"] == 1]
ef <- ExpoFit(d1) # should be close to log(2)/median? 
eb <- EBexpo(d1, 200)
opar <- par(mfrow = c(1,3))
plot(ef)
hist(eb)
plot(eb, prior = 0.56)
par(opar)
rm(opar)
sum(d1 > cutoff(0.8, 0.56, eb)) # posterior 80%, prior 0.56
sum(d1 > 0.065) # post 90%
which(d1 > 0.047)
which(d1 > 0.065)

Let's pick out the most persistent 1-cycle.

cyc1 <- Cycle(ripdiag, 1, 236, "forestgreen")
cyc1@index

Each row represents an edge, by listing the IDs of the points at either end of the line segment. In this case, there are nine edges that link together to form a connected loop (or topological circle).

cyc2 <- Cycle(ripdiag, 1, 123, "red")
cyc3 <- Cycle(ripdiag, 1, 214, "purple")

opar <- par(mfrow = c(1, 3))
plot(cyc1, W, lwd = 2, pch = 16, col = "gray", xlab = "U1", ylab = "U2", main = "UMAP")
lines(cyc2, W, lwd=2)
lines(cyc3, W, lwd=2)

plot(U, pch = 16, col = "gray", main = "MDS")
lines(cyc1, U, lwd = 2)
lines(cyc2, U, lwd = 2)
lines(cyc3, U, lwd = 2)

plot(V, pch = 16, col = "gray", main = "t-SNE")
lines(cyc1, V, lwd = 2)
lines(cyc2, V, lwd = 2)
lines(cyc3, V, lwd = 2)
par(opar)
rm(opar)

poof <- angleMeans(W, ripdiag, cyc1, clinical)
poof[is.na(poof)] <- 0
angle.df <- poof[, c("mutation.status", "CatB2M", "CatRAI", "CatCD38",
                     "Massive.Splenomegaly", "Hypogammaglobulinemia")]
colorScheme <- list(c(M = "green", U = "magenta"),
                    c(Hi = "cyan", Lo ="red"),
                    c(Hi = "blue", Lo = "yellow"),
                    c(Hi = "#dddddd", Lo = "#111111"),
                    c(No = "#dddddd", Yes = "brown"),
                    c(No = "#dddddd", Yes = "purple"))
annote <- LoopCircos(cyc1, angle.df, colorScheme)
image(annote)

Two-Cycles (Voids)

Now we want to determine if there are significant "voids" (empty interiors of spheres) in the data, and, if so, how many?

d2 <- persistence[diag[, "dimension"] == 2]
ef <- ExpoFit(d2) # should be close to log(2)/median? 
eb <- EBexpo(d2, 200)
opar <- par(mfrow = c(1, 3))
plot(ef)
hist(eb)
plot(eb, prior = 0.75)
par(opar)
rm(opar)
sum(d2 > cutoff(0.8, 0.75, eb)) # posterior 80%, prior 0.56
sum(d2 > cutoff(0.95, 0.75, eb)) # posterior 90%, prior 0.56
cutoff(0.95, 0.75, eb)
sum(d2 > 0.032) # post 90%
which(d2 > 0.034)

vd <- Cycle(ripdiag, 2, 95, "purple")
mds <- cmdscale(daisydist, k = 3)
voidPlot(vd, mds)
voidFeature(featMU, mds, radius = 0.011) # need to increase radius when you overlay one sphere on another
rgl::rglwidget()
#htmlwidgets::saveWidget(rglwidget(), "mywidget.html")

mixed <- clinical[,"mutation.status"] + 2*clinical[,"CatB2M"] + 4*clinical[,"Matutes"]
table(mixed)
scheme <- c("cyan", "#cccccc", "blue", "magenta", "green", "orange", "black", "red")
terp <- c("MHH", "UHH", "MLH", "ULH", "MHL", "UHL", "MLL", "ULL")
names(scheme) <- terp
swatchHue(scheme)
mixedFeatures <- Feature(mixed, "Mixed", c("white", "black"), terp)
mixedFeatures@colRamp <-  colorRamp2(0:7, scheme)
plot(mixedFeatures, W, pch = 16, cex = 1.2)
ag <- aggregate(mds, list(mixedFeatures@values), mean, na.rm = TRUE)[, 2:4]
colnames(ag)  <- c("x", "y", "z")
voidPlot(vd, mds, mixedFeatures)
voidPlot(vd, mds)
rgl::spheres3d(ag, color = scheme, radius = 0.05, alpha = 0.5)


featMatutes <- Feature(clinical[,"Matutes"], "Matutes Score", c("blue", "orange"), c("Abnormal", "Normal"))
featB2m <- Feature(clinical[,"CatB2M"], "Beta-2 microglobulin", c("green", "magenta"), c("High", "Low"))

ob <- Projection(vd, mds, mixedFeatures, span = 0.15)
opar <- par(mfrow = c(1,2))
plot(ob, cex = 1.5)
image(ob, col = scheme)
par(opar)
rm(opar)

ob <- Projection(vd, mds, featMU, span = 0.2)
opar <- par(mfrow = c(1,2))
plot(ob)
image(ob, col = colorRampPalette(c("cyan", "gray", "red"))(64))
par(opar)
rm(opar)

rm(list = ls())

Any scripts or data that you put into this service are public.

Mender documentation built on Oct. 25, 2023, 3 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Mender
Visualizing Topological Loops and Voids

Mender: CLL Clinical Data
In Mender: Visualizing Topological Loops and Voids

Introduction

TDA Built-in Visualizations of the Rips Diagram

Mercator Visualizations of the Underlying Data and Distance Matrix

Dimension Zero

Using iGraph

Community Structure

Visualizing Features

Significance

Zero-Cycles (Connected Components)

One-Cycles (Loops)

Two-Cycles (Voids)

Try the Mender package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

Mender Visualizing Topological Loops and Voids

Mender: CLL Clinical Data In Mender: Visualizing Topological Loops and Voids

Introduction

TDA Built-in Visualizations of the Rips Diagram

Mercator Visualizations of the Underlying Data and Distance Matrix

Dimension Zero

Using iGraph

Community Structure

Visualizing Features

Significance

Zero-Cycles (Connected Components)

One-Cycles (Loops)

Two-Cycles (Voids)

Try the Mender package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

Mender
Visualizing Topological Loops and Voids

Mender: CLL Clinical Data
In Mender: Visualizing Topological Loops and Voids