Thresher-class | R Documentation |
"Thresher"
The Thresher class represents the first step of an algorithm that
combines outlier detection with clustering. The object combines the
results of hierarchical clustering and principal components analysis
(with a computation of its dimension) on the same data set.
Thresher(data, nm = deparse(substitute(data)),
metric = "pearson", linkage="ward.D2",
method = c("auer.gervini", "broken.stick"),
scale = TRUE, agfun = agDimTwiceMean)
data |
A data matrix. |
nm |
A character string; the name of this object. |
metric |
A character string containing the name of a clustering metric
recognized by either |
linkage |
A character string containing the name of a linkage rule
recognized by |
method |
A character string describing the algorithm used from the
|
scale |
A logical value; should the data be scaled before use? |
agfun |
A function that will be accepted by the
|
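The method argument above offers "broken.stick" as an alternative to the Auer-Gervini approach. As a rough illustration of the classical broken-stick rule, the base R sketch below compares each component's share of the variance with the expected length of the corresponding piece of a randomly broken stick. The helper names broken_stick and bs_dimension are hypothetical, and the PCDimension package may implement the rule differently.

```r
# Expected proportion for the k-th longest of p pieces of a unit stick
# broken at random: (1/p) * sum_{i=k}^{p} 1/i.
# NOTE: broken_stick() and bs_dimension() are illustrative helpers,
# not functions from the Thresher or PCDimension packages.
broken_stick <- function(p) {
  vapply(seq_len(p), function(k) sum(1 / (k:p)) / p, numeric(1))
}

# Count the leading components whose observed share of the variance
# exceeds the broken-stick expectation.
bs_dimension <- function(variances) {
  observed <- variances / sum(variances)
  expected <- broken_stick(length(variances))
  exceeds <- observed > expected
  # Both vectors sum to 1, so at least one component fails the test.
  unname(which(!exceeds)[1] - 1L)
}

bs_dimension(c(8, 1, 0.5, 0.5))  # one dominant component
bs_dimension(c(1, 1, 1, 1))      # no component stands out
```

Under this rule, a result of zero means that no component explains more variance than random breakage would predict.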
Thresher operates on a data matrix that is assumed to be
organized with rows equal to samples and columns equal to features
(like genes or proteins). The algorithm begins by centering and (by
default, though this can be overridden with the scale parameter)
standardizing the data columns. It then performs a principal
components analysis and uses the Auer-Gervini method, as automated in
the PCDimension package, to determine the number, D, of
statistically significant principal components. For each
column-feature, it computes and remembers the length of its loading
vector in D-dimensional space. (If the Auer-Gervini method finds
that D=0, the length is instead computed using D=1.) These
loading-lengths will be used later to identify and remove features
that act as outliers and do not contribute to clustering the
samples. Finally, Thresher computes and saves the results of
hierarchically clustering the features in the data set, using the
specified distance metric and linkage rule.
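The steps described above can be approximated with base R alone. The sketch below is not the package's implementation. It assumes a hand-picked value of D (the real function estimates it with the Auer-Gervini method), uses the unit-length rotation columns from prcomp as loading vectors (the package may rescale them), and takes 1 minus the Pearson correlation as the feature distance (the package's "pearson" metric may be defined differently).

```r
set.seed(1)
x <- matrix(rnorm(50 * 6), ncol = 6)   # rows = samples, columns = features
colnames(x) <- paste0("G", 1:6)

# Step 1: center and (optionally) standardize the columns.
std <- scale(x, center = TRUE, scale = TRUE)

# Step 2: principal components analysis on the standardized data.
pca <- prcomp(std, center = FALSE, scale. = FALSE)

# Step 3: loading length of each feature in D-dimensional space.
D <- 0                 # ASSUMPTION: hand-picked stand-in for the estimated dimension
d_used <- max(D, 1)    # fall back to one dimension when D = 0
loadings <- pca$rotation
delta <- sqrt(rowSums(loadings[, seq_len(d_used), drop = FALSE]^2))

# Step 4: hierarchical clustering of the features (columns).
feature_dist <- as.dist(1 - cor(std))  # ASSUMPTION: correlation-based distance
feature_tree <- hclust(feature_dist, method = "ward.D2")

round(delta, 3)
```

Features with short loading vectors contribute little to the leading components, which is why these lengths are useful later for flagging outlier features.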
The Thresher function constructs and returns an object of the
Thresher class.
Objects should be defined using the Thresher constructor. In
the simplest case, you pass in the data matrix that you want to
cluster using the Thresher algorithm.
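Beyond the simplest case, the constructor's other arguments can be set explicitly. The following sketch uses only the arguments listed in the usage section and the slots documented for the class. It is guarded so that it does nothing when the Thresher package is not installed, and the data matrix and object name are made up for illustration.

```r
# Simulated data: 60 samples (rows) by 8 features (columns).
set.seed(42)
dat <- matrix(rnorm(60 * 8), ncol = 8)
colnames(dat) <- paste0("F", 1:8)

if (requireNamespace("Thresher", quietly = TRUE)) {
  library(Thresher)
  # Non-default choices: explicit name, broken-stick dimension, no scaling.
  th <- Thresher(dat, nm = "demo", method = "broken.stick", scale = FALSE)
  print(th@name)    # the name given to the object
  print(th@pcdim)   # number of significant principal components
  print(th@delta)   # loading-vector lengths, one per feature
}
```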
name: Object of class "character"; the name of this object.
data: Object of class "matrix"; the data that was used for clustering.
spca: Object of class "SamplePCA"; represents the results of
performing a principal components analysis on the original data.
loadings: Object of class "matrix"; the matrix of loading vectors
from the principal components analysis.
gc: Object of class "hclust"; the result of performing hierarchical
clustering on the data columns.
pcdim: Object of class "numeric"; the number of significant
principal components.
delta: Object of class "numeric"; the lengths of the loading vectors
in the principal component space of dimension equal to pcdim.
ag: Object of class "AuerGervini"; represents the result of running
the automated Auer-Gervini algorithm to determine the number of
principal components.
agfun: A function, which is used as the default method for computing
the principal component dimension from the Auer-Gervini plot.
signature(x = "Thresher"): Produce a scree plot of the PCA part of
the Thresher object.
signature(object = "Thresher"): Produce a scatter plot of the first
two principal components.
signature(x = "Thresher", y = "missing"): In two dimensions, plot
the loading vectors of the PCA part of the object.
signature(object = "Thresher"): Produce a heatmap of the data set.
signature(object = "Thresher"): This is a convenience function to
produce a standard set of figures for a Thresher object. These are
(1) a scree plot, (2) a plot of the Auer-Gervini slot, (3) a scatter
plot of the first two principal components, (4) one or more plots of
the loading vectors, depending on the PC dimension, and (5) a heat
map. If the DIR argument is non-null, it is treated as the name of an
existing directory where the figures are stored as PNG files.
Otherwise, the figures are displayed interactively, one at a time, in
a window on screen.
signature(object = "Thresher"): Returns the vector of colors assigned
to the clustered columns in the data set.
signature(object = "Thresher"): Returns the vector of colors assigned
to the clustered rows in the data set.
signature(object = "Thresher"): I refuse to document this, since I am
not convinced that it should actually exist.
Kevin R. Coombes <krc@silicovore.com>, Min Wang.
Wang M, Abrams ZB, Kornblau SM, Coombes KR. Thresher: determining the number of clusters while removing outliers. BMC Bioinformatics, 2018; 19(1):1-9. doi://10.1186/s12859-017-1998-9.
Wang M, Kornblau SM, Coombes KR. Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components. bioRxiv, 2017. doi://10.1101/237883.
Thresher, Reaper-class, AuerGervini-class
set.seed(3928270)
ranData <- matrix(rnorm(100*12), ncol=12)
colnames(ranData) <- paste("G", 1:12, sep='')
thresh <- Thresher(ranData) # fit the model
screeplot(thresh) # check the scree plot; suggests dim = 4
plot(thresh@ag, list(thresh@agfun)) # Auer-Gervini object; dim = 0
scatter(thresh) # PCA scatter plot (rows = samples)
plot(thresh) # PCA loadings plot (cols = features)
heat(thresh) # ubiquitous 2-way heatmap