Thresher-class: Class '"Thresher"'


Description

The Thresher class represents the first step of an algorithm that combines outlier detection with clustering. The object combines the results of hierarchical clustering and principal components analysis (with a computation of its dimension) on the same data set.

Usage

Thresher(data, nm = deparse(substitute(data)),
         metric = "pearson", linkage="ward.D2",
         method = c("auer.gervini", "broken.stick"),
         scale = TRUE, agfun = agDimTwiceMean)
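The default for nm uses the common R idiom deparse(substitute(data)), which recovers the name of the variable that the caller passed in. The sketch below shows the idiom in isolation; show_name is a hypothetical helper written for illustration and is not part of the Thresher package.

```r
# Hypothetical helper, not part of the Thresher package.
# It returns the label that a constructor with this default would record.
show_name <- function(data, nm = deparse(substitute(data))) {
  nm
}

ranData <- matrix(rnorm(20), ncol = 4)
show_name(ranData)                  # "ranData"
show_name(ranData, nm = "myLabel")  # "myLabel"
```

If the data argument is an inline expression rather than a named variable, the default label is the deparsed expression, so supplying nm explicitly is usually cleaner in that case.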

Arguments

data

A data matrix.

nm

A character string; the name of this object.

metric

A character string containing the name of a clustering metric recognized by either dist or distanceMatrix.

linkage

A character string containing the name of a linkage rule recognized by hclust.

method

A character string describing the algorithm used from the PCDimension package to compute the number of significant components.

scale

A logical value; should the data be scaled before use?

agfun

A function that will be accepted by the AuerGervini function in the PCDimension package.
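For readers unfamiliar with the "broken.stick" option of the method argument, the following is a minimal sketch of the textbook broken-stick rule in base R. It assumes the usual formulation (keep the leading components whose share of variance exceeds the expected share of the corresponding piece of a randomly broken stick); the PCDimension package has its own implementation, which may differ in detail. The names broken_stick and count_significant are hypothetical.

```r
# Hypothetical helpers for illustration; not the PCDimension implementation.
broken_stick <- function(p) {
  # Expected share of the k-th largest piece when a unit stick is
  # broken at random into p pieces: (1/p) * sum_{i = k}^{p} 1/i
  vapply(seq_len(p), function(k) sum(1 / (k:p)) / p, numeric(1))
}

count_significant <- function(lambda) {
  observed <- lambda / sum(lambda)          # share of variance per component
  expected <- broken_stick(length(lambda))  # broken-stick expectation
  # Count the leading run of components that beat the expectation
  sum(cumprod(observed > expected))
}

count_significant(c(10, 1, 1))  # one dominant eigenvalue
count_significant(c(1, 1, 1))   # no eigenvalue stands out
```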

Details

Thresher operates on a data matrix that is assumed to be organized with rows corresponding to samples and columns corresponding to features (like genes or proteins). The algorithm begins by centering and, by default, standardizing the data columns; standardization can be turned off with the scale parameter. It then performs a principal components analysis, and uses the Auer-Gervini method, as automated in the PCDimension package, to determine the number, D, of statistically significant principal components. For each column-feature, it computes and remembers the length of its loading vector in D-dimensional space. (If the Auer-Gervini method finds that D = 0, the length is instead computed using D = 1.) These loading-lengths will be used later to identify and remove features that act as outliers and do not contribute to clustering the samples. Finally, Thresher computes and saves the results of hierarchically clustering the features in the data set, using the specified distance metric and linkage rule.
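The steps described above can be approximated in base R. This is a rough sketch under stated assumptions, not the package's actual code: D is set by hand instead of being estimated with the Auer-Gervini method, the loadings are taken to be the unit-length rotation vectors returned by prcomp (the package may scale its loadings differently), and 1 minus the correlation is used as a stand-in for the "pearson" metric.

```r
set.seed(1)
X <- matrix(rnorm(100 * 12), ncol = 12)   # rows = samples, columns = features
colnames(X) <- paste0("G", 1:12)

# Center and standardize each column (the scale = TRUE behavior)
std <- scale(X, center = TRUE, scale = TRUE)

# Principal components analysis of the standardized data
pca <- prcomp(std, center = FALSE, scale. = FALSE)

D <- 0                 # assumed value; Thresher estimates D from the data
Deff <- max(D, 1)      # fall back to one dimension when D = 0

# Length of each feature's loading vector in the first Deff components
delta <- sqrt(rowSums(pca$rotation[, seq_len(Deff), drop = FALSE]^2))

# Hierarchical clustering of the features (columns), not the samples
feature_tree <- hclust(as.dist(1 - cor(std)), method = "ward.D2")
```

Features with small values of delta lie close to the origin of the retained principal component space, which is why they are candidates for removal as outliers.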

Value

The Thresher function constructs and returns an object of the Thresher class.

Objects from the Class

Objects should be created using the Thresher constructor. In the simplest case, you pass in only the data matrix that you want to cluster using the Thresher algorithm.

Slots

name:

Object of class "character"; the name of this object.

data:

Object of class "matrix"; the data that was used for clustering.

spca:

Object of class "SamplePCA"; represents the results of performing a principal components analysis on the original data.

loadings:

Object of class "matrix"; the matrix of loading vectors from the principal components analysis.

gc:

Object of class "hclust"; the result of performing hierarchical clustering on the data columns.

pcdim:

Object of class "numeric"; the number of significant principal components.

delta:

Object of class "numeric"; the lengths of the loading vectors in the principal component space of dimension equal to pcdim.

ag:

Object of class "AuerGervini"; represents the result of running the automated Auer-Gervini algorithm to determine the number of principal components.

agfun:

A function, which is used as the default method for computing the principal component dimension from the Auer-Gervini plot.

Methods

screeplot

signature(x = "Thresher"): Produce a scree plot of the PCA part of the Thresher object.

scatter

signature(object = "Thresher"): Produce a scatter plot of the first two principal components.

plot

signature(x = "Thresher", y = "missing"): In two dimensions, plot the loading vectors of the PCA part of the object.

heat

signature(object = "Thresher"): Produce a heatmap of the data set.

makeFigures

signature(object = "Thresher"): This is a convenience function to produce a standard set of figures for a Thresher object. These are (1) a scree plot, (2) a plot of the Auer-Gervini slot, (3) a scatter plot of the first two principal components, (4) one or more plots of the loading vectors, depending on the PC dimension, and (5) a heat map. If the DIR argument is non-null, it is treated as the name of an existing directory where the figures are stored as PNG files. Otherwise, the figures are displayed interactively, one at a time, in a window on screen.

getColors

signature(object = "Thresher"): Returns the vector of colors assigned to the clustered columns in the data set.

getSplit

signature(object = "Thresher"): Returns the vector of colors assigned to the clustered rows in the data set.

getStyles

signature(object = "Thresher"): I refuse to document this, since I am not convinced that it should actually exist.

Author(s)

Kevin R. Coombes <krc@silicovore.com>, Min Wang.

References

Wang M, Abrams ZB, Kornblau SM, Coombes KR. Thresher: determining the number of clusters while removing outliers. BMC Bioinformatics, 2018; 19(1):1-9. doi://10.1186/s12859-017-1998-9.

Wang M, Kornblau SM, Coombes KR. Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components. bioRxiv, 2017. doi://10.1101/237883.

See Also

Thresher, Reaper-class, AuerGervini-class

Examples

library(Thresher)
set.seed(3928270)
ranData <- matrix(rnorm(100*12), ncol=12)
colnames(ranData) <- paste("G", 1:12, sep='')
thresh <- Thresher(ranData) # fit the model
screeplot(thresh)           # check the scree plot; suggests dim = 4
plot(thresh@ag, list(thresh@agfun)) # Auer-Gervini object; dim = 0
scatter(thresh)             # PCA scatter plot  (rows = samples)
plot(thresh)                # PCA loadings plot (cols = features)
heat(thresh)                # ubiquitous 2-way heatmap
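As a follow-up to the example above, the fitted object's slots can be inspected directly with the @ operator. The sketch below uses only the pcdim and delta slots documented on this page; the cutoff value is a hypothetical number chosen for illustration, and the code is guarded so that it does nothing if the Thresher package is not installed.

```r
if (requireNamespace("Thresher", quietly = TRUE)) {
  library(Thresher)
  set.seed(3928270)
  demoData <- matrix(rnorm(100 * 12), ncol = 12)
  colnames(demoData) <- paste("G", 1:12, sep = "")
  thresh_demo <- Thresher(demoData)

  print(thresh_demo@pcdim)        # number of significant components
  print(sort(thresh_demo@delta))  # loading lengths, smallest first

  cutoff <- 0.3                   # hypothetical cutoff, for illustration only
  print(which(thresh_demo@delta < cutoff))  # features with short loading vectors
}
```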

Thresher documentation built on Dec. 8, 2019, 3:01 a.m.