knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE,
  message = FALSE
)


Introduction

This vignette provides a brief introduction to the clustRviz package, describing how to use the main entry points CARP and CBASS and providing a quick overview of the rich built-in graphics functionality. For more details on graphics, weight selection, or the computational algorithms used, please see the other package vignettes.

Clustering

clustRviz implements the convex clustering formulation popularized by Hocking et al. [-@Hocking:2011] and uses the path-wise algorithms of Weylandt, Nagorski, and Allen [-@Weylandt:2019] to support full path computation and dendrogram construction. This allows convex clustering to produce hclust-style dendrograms while maintaining its statistical and computational advantages.
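For orientation, the convex clustering problem is commonly written in the following form. The notation here is an assumption chosen for illustration and is not taken from this vignette: $X$ is the $n \times p$ data matrix, $U$ holds one centroid per observation, $w_{ij} \geq 0$ are fusion weights, $\lambda \geq 0$ is the regularization parameter, and $q$ indexes the norm used in the penalty.

$$
\hat{U}_{\lambda} = \underset{U \in \mathbb{R}^{n \times p}}{\arg\min} \; \frac{1}{2} \| X - U \|_F^2 + \lambda \sum_{i < j} w_{ij} \, \| U_{i\cdot} - U_{j\cdot} \|_q
$$

At $\lambda = 0$ every observation is its own centroid. As $\lambda$ grows, the penalty pulls rows of $U$ together until they fuse, and the order in which they fuse along the path is what gives rise to a dendrogram.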

The main entry point for clustering is the CARP function, which implements the Clustering via Algorithmic Regularization Paths algorithm proposed by Weylandt, Nagorski, and Allen [-@Weylandt:2019]. We can use it on the built-in presidential_speech data set:

library(clustRviz)
carp_fit <- CARP(presidential_speech)
print(carp_fit)

As can be seen, this provides a full path in only a few seconds. From this, we can produce a variety of attractive plots, including dendrograms

plot(carp_fit, type = "dendrogram")

one-way heatmaps

plot(carp_fit, type = "heatmap")

and regularization paths

plot(carp_fit, type = "path")

For each plot type, interactive and dynamic versions are also supported: for example,

plot(carp_fit, type = "dendrogram", dynamic = TRUE)

By default, the entire path is shown, but it is possible to obtain specific solutions by specifying the k or percent arguments to plot.

plot(carp_fit, k = 3)

To work with the clustering solutions directly, the get_cluster_labels, get_clustered_data, or get_cluster_centroids functions may be useful.
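As a minimal sketch of how these accessors might be used, the example below queries the fitted path at three clusters. It assumes the accessors accept the same k argument as plot and that the centroids come back as a matrix; the variable names labels and centroids are illustrative only. Check the package help pages for the exact signatures and return types.

```r
library(clustRviz)

# Fit the path once, then query it at a fixed number of clusters.
carp_fit <- CARP(presidential_speech)

# Cluster assignment for each observation at the k = 3 solution
# (assumes get_cluster_labels accepts k, as plot does).
labels <- get_cluster_labels(carp_fit, k = 3)
table(labels)

# Centroids of the three clusters at the same solution.
centroids <- get_cluster_centroids(carp_fit, k = 3)
dim(centroids)
```

Because the whole path is stored in carp_fit, changing k here does not require refitting.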

Bi-Clustering

Chi et al. [-@Chi:2017] proposed a convex formulation of biclustering, for which Weylandt [-@Weylandt:2019b] later proposed an efficient ADMM algorithm. This ADMM was adapted into the CBASS (Convex Biclustering via Algorithmic Regularization with Small Steps) algorithm. clustRviz exposes an implementation of this algorithm via the function of the same name.

library(clustRviz)
cbass_fit <- CBASS(presidential_speech)
print(cbass_fit)

As can be seen, this provides a full path in only a few seconds. In general, the bi-clustering problem is a bit slower than the standard clustering problem but still highly efficient. From this, we can produce a variety of attractive plots, including row- and column-wise dendrograms

plot(cbass_fit, type = "row.dendrogram")
plot(cbass_fit, type = "col.dendrogram")

row- and column-wise regularization paths

plot(cbass_fit, type = "row.path")

and the traditional two-way cluster heatmap

plot(cbass_fit, type = "heatmap")

As before, interactive and dynamic versions are also supported: for example,

plot(cbass_fit, type = "heatmap", dynamic = TRUE)

Because CBASS clusters rows and columns simultaneously, it is necessary to distinguish between row and column clusters when specifying the number of clusters:

plot(cbass_fit, k.row = 3)
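The same row/column distinction would apply when extracting labels rather than plotting them. The sketch below assumes that get_cluster_labels has a CBASS method accepting k.row and a type argument that selects row or column labels; neither is documented in this vignette, so treat both as assumptions and confirm them against the help page. The name row_labels is illustrative.

```r
library(clustRviz)

cbass_fit <- CBASS(presidential_speech)

# Row labels at the solution with three row clusters.
# ASSUMPTION: the CBASS method takes k.row and type = "row".
row_labels <- get_cluster_labels(cbass_fit, k.row = 3, type = "row")
table(row_labels)
```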

This is only a brief demonstration of the capabilities of the clustRviz package; see the other vignettes for more detail.

References



jjn13/clustRviz documentation built on Sept. 1, 2020, 7:53 a.m.