getClusters: getClusters

View source: R/StatTools.R

getClusters    R Documentation

getClusters

Description

From the data matrix obtained by integrating all bucket zones (columns) for each spectrum (rows), we can take advantage of the variability in the concentration of each compound across a series of samples: buckets that are significantly correlated are linked together into clusters. Bucket clustering is based either on a lower threshold applied to the correlations or on a cut value applied to the hierarchical tree of the variables (buckets) generated by a Hierarchical Clustering Analysis (HCA).
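The correlation-threshold idea can be illustrated with a small base-R sketch. It assumes that the "cluster decomposition" means taking the connected components of the graph whose edges link buckets correlated above the threshold, and that single-bucket components are discarded. The function name threshold_clusters and the toy correlation matrix are hypothetical and not part of Rnmr1D. The three calls mirror the cval-dC, cval and cval+dC scan described in Details; the merge rules themselves are not reproduced.

```r
# Hypothetical sketch, not the Rnmr1D implementation: buckets are linked
# when their correlation exceeds the threshold, and each connected
# component with at least two buckets is kept as a cluster.
threshold_clusters <- function(C, cval) {
  adj <- C > cval                  # TRUE where two buckets are linked
  diag(adj) <- FALSE
  memb <- integer(nrow(C))         # 0 = bucket not yet assigned
  k <- 0L
  for (i in seq_len(nrow(C))) {
    if (memb[i] != 0L) next
    k <- k + 1L                    # start a new component from bucket i
    memb[i] <- k
    queue <- i
    while (length(queue) > 0) {    # breadth-first search over the links
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[v, ] & memb == 0L)
      memb[nb] <- k
      queue <- c(queue, nb)
    }
  }
  groups <- unname(split(rownames(C), memb))
  Filter(function(g) length(g) >= 2, groups)   # drop single buckets
}

# Toy correlation matrix for five hypothetical buckets b1..b5
b <- paste0("b", 1:5)
C <- matrix(c(1.00, 0.95, 0.88, 0.10, 0.05,
              0.95, 1.00, 0.93, 0.12, 0.02,
              0.88, 0.93, 1.00, 0.08, 0.04,
              0.10, 0.12, 0.08, 1.00, 0.97,
              0.05, 0.02, 0.04, 0.97, 1.00),
            nrow = 5, byrow = TRUE, dimnames = list(b, b))

# Clustering at cval - dC, cval and cval + dC
cval <- 0.92
dC <- 0.02
res <- lapply(c(cval - dC, cval, cval + dC),
              function(th) threshold_clusters(C, th))
```

With these values, the first two thresholds give {b1, b2, b3} and {b4, b5}. At cval+dC the b2-b3 link is lost, so b3 drops out and the first cluster shrinks to {b1, b2}: the kind of disagreement between thresholds that the merge rules arbitrate.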

Usage

getClusters(data, method = "hca", ...)

Arguments

data

the matrix of the integrated areas defined by the buckets (columns) for each spectrum (rows)

method

Clustering method of the buckets: either 'corr' for correlation or 'hca' for hierarchical clustering analysis. Defaults to 'hca'.

...

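Additional arguments, depending on the chosen method:

  • corr : cval (correlation threshold), dC (tolerance interval of the correlation threshold), ncpu (number of CPU cores)

  • hca : vcutusr (cut value specified by the user)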

Details

At the bucketing step, intelligent bucketing was chosen, which means that each bucket matches exactly one resonance peak. As a result, the buckets have a strong chemical meaning, since resonance peaks are the fingerprints of chemical compounds. However, several resonance peaks are generally required to assign a chemical compound in 1D 1H-NMR metabolic profiling. To generate relevant clusters (i.e. clusters likely to match chemical compounds), two approaches have been implemented:

  • Bucket clustering based on a lower threshold applied to correlations

    • In this approach, an appropriate threshold is applied to the correlation matrix before it is decomposed into clusters. The result can be improved by searching for a trade-off within a tolerance interval around the correlation threshold: starting from a fixed correlation threshold (cval), the clustering is computed for the three values cval-dC, cval and cval+dC, where dC is the tolerance interval of the correlation threshold. These three sets of clusters are then merged according to the following rules: 1) if a large cluster is broken up, the two resulting clusters are kept; 2) if a small cluster disappears, the initial cluster is kept. In general, a tolerance interval dC between 0.002 and 0.01 gives a good trade-off.

  • Bucket clustering based on a hierarchical tree of the variables (buckets) generated by a Hierarchical Clustering Analysis (HCA)

    • In this approach, a Hierarchical Clustering Analysis (HCA, hclust) is applied to the data after computing a distance matrix (Euclidean distance by default). The tree returned by hclust is then cut into several groups with cutree by specifying the cut height(s). To find the best cut value, 1) several equally spaced values within a given range of cut heights are tested, then 2) the value that yields the most clusters while including the most bucket variables is kept. Alternatively, the cut value can be specified by the user (vcutusr).
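The cut-height search can be sketched in base R with dist, hclust and cutree. Several choices here are assumptions rather than the Rnmr1D code: buckets are standardised with scale before computing Euclidean distances between columns, hclust keeps its default complete linkage, the scanned range of heights is arbitrary, and "most clusters, then most buckets" is read as maximising the number of clusters with at least two buckets, then the number of buckets they contain. The simulated data (two hypothetical compounds A and B plus an unrelated bucket N1) are illustrative only.

```r
set.seed(1)
n <- 20                                   # number of simulated spectra
cA <- runif(n, 1, 2)                      # hypothetical concentrations of A
cB <- runif(n, 1, 2)                      # hypothetical concentrations of B
noise <- function() rnorm(n, sd = 0.002)
X <- cbind(A1 = cA + noise(), A2 = 0.5 * cA + noise(), A3 = 0.8 * cA + noise(),
           B1 = cB + noise(), B2 = 0.3 * cB + noise(),
           N1 = runif(n, 1, 2))           # bucket unrelated to A or B

# HCA on the buckets: Euclidean distances between standardised columns
tree <- hclust(dist(t(scale(X)), method = "euclidean"))

# Score equally spaced cut heights
heights <- seq(0.5, 3, by = 0.25)
score <- t(sapply(heights, function(h) {
  sizes <- table(cutree(tree, h = h))
  c(h = h,
    nclust = sum(sizes >= 2),             # clusters with at least 2 buckets
    nbuck = sum(sizes[sizes >= 2]))       # buckets covered by those clusters
}))

# Most clusters first, then most buckets; ties keep the smallest height
best <- order(-score[, "nclust"], -score[, "nbuck"])[1]
vcut <- score[best, "h"]
memb <- cutree(tree, h = vcut)
```

In this toy case only A and B can form multi-bucket clusters, so the selected cut keeps A1 to A3 together and B1 with B2, in two separate clusters.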

Value

getClusters returns a list containing the following components:

  • vstats Statistics used to find the best value of the criterion (matrix)

  • clusters List of the ppm values corresponding to each cluster; the length of the list equals the number of clusters

  • clustertab The association matrix giving, for each cluster (column 2), the corresponding buckets (column 1)

  • params List of the parameters of the chosen method with which the clustering was performed

  • vcrit Value of the (best or user-specified) criterion, i.e. the correlation threshold for the 'corr' method or the cut value for the 'hca' method

  • indxopt Index within the vstats matrix corresponding to the criterion value (vcrit)
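The relation between the clusters list and the clustertab matrix can be sketched as follows. The ppm values and cluster names are hypothetical, and numbering clusters by their position in the list is an assumption about the layout, not something taken from Rnmr1D.

```r
# Hypothetical output: one vector of bucket ppm values per cluster
clusters <- list(C1 = c(3.25, 3.41, 3.47), C2 = c(1.33, 4.11))

# Association matrix: bucket ppm (column 1), cluster index (column 2)
clustertab <- cbind(bucket = unlist(clusters, use.names = FALSE),
                    cluster = rep(seq_along(clusters), lengths(clusters)))

# Going back from the matrix to one ppm vector per cluster
back <- split(clustertab[, "bucket"], clustertab[, "cluster"])
```

Each bucket appears on exactly one row, so the list and the matrix carry the same information.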

References

Jacob D., Deborde C. and Moing A. (2013) An efficient spectra processing method for metabolite identification from 1H-NMR metabolomics data. Analytical and Bioanalytical Chemistry 405(15) 5049-5061 doi: 10.1007/s00216-013-6852-y

Examples

 
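  library(Rnmr1D)
  # Example data and processing macro shipped with the package
  data_dir <- system.file("extra", package = "Rnmr1D")
  cmdfile <- file.path(data_dir, "NP_macro_cmd.txt")
  samplefile <- file.path(data_dir, "Samples.txt")
  # Process the spectra according to the command file, then build the bucket matrix
  out <- Rnmr1D::doProcessing(data_dir, cmdfile = cmdfile,
                              samplefile = samplefile, ncpu = 2)
  outMat <- getBucketsDataset(out, norm_meth = 'CSN')
  # Cluster the buckets with each method
  clustcorr <- getClusters(outMat, method = 'corr', cval = 0, dC = 0.003, ncpu = 2)
  clusthca <- getClusters(outMat, method = 'hca', vcutusr = 0)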
 

INRA/Rnmr1D documentation built on April 11, 2024, 1:29 a.m.