View source: R/cluster_spectra.R
cluster_spectra | R Documentation
Function to cluster peaks by spectral similarity. A representative spectrum is selected for each peak in the provided peak table and used to construct a distance matrix based on spectral similarity (Pearson correlation) between peaks. Hierarchical clustering with bootstrap resampling is performed on the resulting correlation matrix to classify peaks by their spectral similarity.
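The clustering idea can be illustrated with a minimal base-R sketch. It assumes a matrix of representative spectra with one named column per peak and one row per wavelength, uses 1 - Pearson r as the distance with average linkage (pvclust's defaults, method.dist = "correlation" and method.hclust = "average"), and resamples wavelengths with replacement. It computes only the ordinary bootstrap probability (BP) of each cluster, not the multiscale AU p-value that pvclust also reports; the helper names and simulated data are hypothetical.

```r
# Sketch only: ordinary bootstrap probability (BP) for hierarchical clusters of spectra.
# Assumes `spectra` has one named column per peak and one row per wavelength.
# Helper names are hypothetical; pvclust's AU p-values use a multiscale bootstrap instead.

# Correlation distance (1 - Pearson r) between peaks, average-linkage clustering.
cluster_columns <- function(spectra) {
  hclust(as.dist(1 - cor(spectra)), method = "average")
}

# Describe every merge of an hclust tree as a sorted, comma-separated set of labels.
merge_sets <- function(hc) {
  sets <- vector("list", nrow(hc$merge))
  for (i in seq_len(nrow(hc$merge))) {
    members <- character(0)
    for (j in hc$merge[i, ]) {
      # Negative entries are single observations; positive ones are earlier merges.
      members <- c(members, if (j < 0) hc$labels[-j] else sets[[j]])
    }
    sets[[i]] <- sort(members)
  }
  vapply(sets, paste, character(1), collapse = ",")
}

# Fraction of bootstrap replicates in which each observed cluster reappears.
bootstrap_support <- function(spectra, nboot = 100) {
  observed <- merge_sets(cluster_columns(spectra))
  hits <- setNames(numeric(length(observed)), observed)
  for (b in seq_len(nboot)) {
    rows <- sample(nrow(spectra), replace = TRUE)  # resample wavelengths
    boot <- merge_sets(cluster_columns(spectra[rows, , drop = FALSE]))
    hits <- hits + (observed %in% boot)
  }
  hits / nboot
}

# Simulated example: two groups of three peaks with bands centred at different wavelengths.
set.seed(1)
wl <- 1:50
a <- dnorm(wl, mean = 15, sd = 4)
b <- dnorm(wl, mean = 35, sd = 4)
spectra <- cbind(a1 = a, a2 = a, a3 = a, b1 = b, b2 = b, b3 = b) +
  matrix(rnorm(300, sd = 1e-3), nrow = 50)
bootstrap_support(spectra, nboot = 50)
```

Each name in the result is one cluster from the observed tree; values close to 1 mean the cluster reappears in nearly every resampled tree.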
cluster_spectra(
  peak_table,
  chrom_list,
  peak_no = c(5, 100),
  alpha = 0.95,
  nboot = 1000,
  plot_dend = TRUE,
  plot_spectra = TRUE,
  verbose = TRUE,
  save = TRUE,
  parallel = TRUE,
  max.only = FALSE,
  output = c("clusters", "pvclust", "both"),
  ...
)
| Argument | Description |
| --- | --- |
| peak_table | Peak table from |
| chrom_list | A list of chromatograms in matrix form (timepoints x wavelengths). |
| peak_no | Minimum and maximum thresholds for the number of peaks a cluster may have. |
| alpha | Confidence threshold for inclusion of a cluster. |
| nboot | Number of bootstrap replicates for |
| plot_dend | Logical. If TRUE, plots the dendrogram with bootstrap values. |
| plot_spectra | Logical. If TRUE, plots overlapping spectra for each cluster. |
| verbose | Logical. If TRUE, prints a progress report to the console. |
| save | Logical. If TRUE, saves the pvclust object to the current directory. |
| parallel | Logical. If TRUE, uses parallel processing for |
| max.only | Logical. If TRUE, returns only the highest level of nested clusters. |
| output | What to return. Either "clusters", "pvclust", or "both". |
| ... | Additional arguments to |
A representative spectrum is selected for each peak in the provided peak table and used to construct a distance matrix based on spectral similarity (Pearson correlation) between peaks. It is recommended to attach representative spectra to the peak_table using attach_ref_spectra. Otherwise, representative spectra are obtained from the chromatogram with the highest absorbance at lambda max.
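The fallback rule above (taking the spectrum from the chromatogram with the highest absorbance at lambda max) can be sketched as follows. This is an assumed reading of the rule, not the package's code: it treats the peak apex as a row index shared by all chromatograms (for example, after retention-time alignment) and takes the maximum of each apex spectrum as the absorbance at lambda max. The function name representative_spectrum is hypothetical.

```r
# Hypothetical helper, not part of chromatographR: choose a representative spectrum
# for one peak from a list of chromatograms (timepoints x wavelengths).
# Assumes `rt_index` is the row of the peak apex and is shared by all chromatograms,
# and that each chromatogram has more than one wavelength.
representative_spectrum <- function(chrom_list, rt_index) {
  n_wl <- ncol(chrom_list[[1]])
  # One column per chromatogram: the spectrum recorded at the peak apex.
  spectra <- vapply(chrom_list, function(m) m[rt_index, ], numeric(n_wl))
  # Absorbance at lambda max is taken as the maximum of each apex spectrum.
  best <- which.max(apply(spectra, 2, max))
  list(index = unname(best), spectrum = spectra[, best])
}

# Toy example: the second chromatogram has the strongest apex spectrum, so it is chosen.
m1 <- matrix(c(0, 0, 0,
               1, 2, 1,
               0, 0, 0), nrow = 3, byrow = TRUE,
             dimnames = list(NULL, c("200", "210", "220")))
representative_spectrum(list(s1 = m1, s2 = 3 * m1, s3 = 2 * m1), rt_index = 2)
```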
Hierarchical clustering with bootstrap resampling is then performed on the resulting correlation matrix, as implemented in pvclust. Finally, bootstrap values can be used to select clusters that exceed the confidence threshold defined by alpha. Clusters can also be filtered by size using the argument peak_no, which sets the minimum and maximum number of peaks per cluster. If max.only is TRUE, only the largest cluster in each nested set of clusters meeting the confidence threshold is returned.
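The selection rules in this paragraph can be sketched in base R as follows. Several details are assumptions rather than documented behaviour: the input format (a list of candidate clusters, each with peaks and an AU pval), the use of >= when comparing against alpha (pvpick's default, type = "geq"), and applying max.only before the size filter. filter_clusters is a hypothetical helper.

```r
# Hypothetical helper: select clusters by confidence, nesting and size.
# `candidates` is assumed to be a list of clusters, each a list with
# `peaks` (character vector) and `pval` (AU p-value).
filter_clusters <- function(candidates, alpha = 0.95, peak_no = c(5, 100),
                            max.only = FALSE) {
  # Keep clusters whose p-value meets the confidence threshold (assumed >=).
  keep <- Filter(function(cl) cl$pval >= alpha, candidates)
  if (max.only) {
    # Drop a cluster if all of its peaks lie inside a larger retained cluster.
    nested <- vapply(seq_along(keep), function(i) {
      others <- seq_along(keep)[-i]
      any(vapply(others, function(j) {
        length(keep[[j]]$peaks) > length(keep[[i]]$peaks) &&
          all(keep[[i]]$peaks %in% keep[[j]]$peaks)
      }, logical(1)))
    }, logical(1))
    keep <- keep[!nested]
  }
  # Enforce the minimum and maximum cluster size given by peak_no.
  sizes <- vapply(keep, function(cl) length(cl$peaks), integer(1))
  keep[sizes >= peak_no[1] & sizes <= peak_no[2]]
}

# Toy candidates: the first cluster is nested in the second; the third fails alpha.
candidates <- list(
  list(peaks = c("p1", "p2"), pval = 0.99),
  list(peaks = c("p1", "p2", "p3"), pval = 0.97),
  list(peaks = c("p4", "p5"), pval = 0.80)
)
filter_clusters(candidates, alpha = 0.95, peak_no = c(2, 10), max.only = TRUE)
```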
Returns clusters and/or a pvclust object according to the value of the output argument.
If output = "clusters", returns a list of S4 cluster objects.
If output = "pvclust", returns a pvclust object.
If output = "both", returns a nested list containing [[1]] the pvclust object and [[2]] the list of S4 cluster objects.
The cluster objects consist of the following components:
- peaks: a character vector containing the names of all peaks in the given cluster.
- pval: a numeric vector of length 1 containing the bootstrap p-value (au) for the given cluster.
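Because the returned clusters are S4 objects, their components are read with @. The sketch below defines a stand-in class with the two documented slots, deliberately named cluster_sketch so it is not mistaken for chromatographR's own class, and flattens a list of such objects into a data frame with one row per peak. The class name, the helper clusters_to_df, and the peak labels are all hypothetical.

```r
library(methods)

# Stand-in S4 class with the two documented slots (hypothetical name).
setClass("cluster_sketch", slots = c(peaks = "character", pval = "numeric"))

# Flatten a list of cluster objects into one row per (cluster, peak) pair.
clusters_to_df <- function(clusters) {
  n <- vapply(clusters, function(x) length(x@peaks), integer(1))
  data.frame(
    cluster = rep(seq_along(clusters), n),
    peak = unlist(lapply(clusters, function(x) x@peaks)),
    pval = rep(vapply(clusters, function(x) x@pval, numeric(1)), n),
    stringsAsFactors = FALSE
  )
}

clusters <- list(
  new("cluster_sketch", peaks = c("p1", "p2"), pval = 0.99),
  new("cluster_sketch", peaks = "p3", pval = 0.96)
)
clusters_to_df(clusters)
```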
Users should be aware that the clustering algorithm will often return nested clusters. Thus, an individual peak could appear in more than one cluster.
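To see which peaks are shared between nested clusters, peak membership can be counted across the returned clusters. This sketch assumes the peak names have already been extracted into a list of character vectors (for example, from the peaks slot of each cluster object); peak_membership is a hypothetical helper.

```r
# Hypothetical helper: number of returned clusters that contain each peak.
peak_membership <- function(cluster_peaks) {
  counts <- table(unlist(cluster_peaks))
  setNames(as.integer(counts), names(counts))
}

membership <- peak_membership(list(c("p1", "p2"), c("p1", "p2", "p3"), "p4"))
membership
# Peaks that occur in more than one cluster:
names(membership)[membership > 1]
```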
It is strongly recommended to use more than 100 bootstrap replicates when running the clustering algorithm on real data; the example uses nboot = 100 only to reduce runtime. The authors of pvclust suggest nboot = 10000.
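One way to see why more replicates matter: under a simple binomial approximation, a bootstrap proportion p estimated from nboot replicates has a Monte Carlo standard error of sqrt(p(1 - p)/nboot). This is a rough guide only; pvclust's AU p-values come from a multiscale bootstrap fit, and pvclust reports its own standard errors (se.au, se.bp). The function bootstrap_se is a hypothetical helper.

```r
# Binomial approximation to the Monte Carlo standard error of a bootstrap
# proportion p estimated from nboot replicates. Rough guide only: pvclust's
# AU p-values come from a multiscale bootstrap, not a single proportion.
bootstrap_se <- function(p, nboot) sqrt(p * (1 - p) / nboot)

bootstrap_se(0.95, c(100, 1000, 10000))
# about 0.0218, 0.0069 and 0.0022
```

With nboot = 100, a value near 0.95 carries a standard error of roughly 0.02, about as large as the gap between the default alpha = 0.95 and the alpha = .97 used in the example, so cluster selection near the threshold can change from run to run.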
Ethan Bass
R. Suzuki & H. Shimodaira. 2006. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics, 22(12):1540-1542. doi: 10.1093/bioinformatics/btl117.
data(pk_tab)
data(Sa_warp)
cl <- cluster_spectra(pk_tab, nboot = 100, max.only = FALSE, save = FALSE, alpha = 0.97)