filtering-funs: Functions to apply cluster-based filtering


Description

Function clusterBased.filter keeps only those proposed compounds for which a quasi-molecular adduct or fragment has also been proposed in another peak of the same cluster.
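
The idea behind the filter can be sketched on a toy table. This is only one plausible reading of the rule, not the package's implementation: the data, the `required` vector and the helper variables `key` and `supported` are made up for illustration, and only the column names follow the df argument and the pcgroup column used in the Examples.

```r
# Toy annotation table; values are invented for illustration.
toy <- data.frame(
  Peak.Id  = c("P1", "P2", "P2", "P3", "P4"),
  pcgroup  = c(1, 1, 1, 2, 2),
  Compound = c("C00001", "C00001", "C00002", "C00003", "C00004"),
  Add.Id   = c("M+H", "M+Na", "M+K", "M+Na", "M+H"),
  stringsAsFactors = FALSE
)

# Hypothetical set of adducts that must be present (assumption).
required <- "M+H"

# A compound is "supported" in a cluster when at least one peak of that
# cluster proposes it with a required adduct.
key       <- paste(toy$Compound, toy$pcgroup)
supported <- unique(key[toy$Add.Id %in% required])

# Keep every candidate row whose compound is supported in its own cluster.
filtered <- toy[key %in% supported, ]
print(filtered)
```

In this toy case C00001 survives in cluster 1 because P1 proposes it as M+H, so its M+Na candidate in P2 is kept as well, while C00002 and C00003 are dropped.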

Function dataPrep prepares the intensity and retention time data for spectral clustering. Function .LaplacianNg computes a normalized Laplacian matrix.
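
The name .LaplacianNg suggests the normalization used in the spectral clustering algorithm of Ng, Jordan and Weiss, D^(-1/2) A D^(-1/2), where A is the affinity matrix and D the diagonal matrix of its row sums. The following base R sketch assumes that form; the function name `normalized_affinity` and the toy matrix are hypothetical, and the internal function may differ in detail.

```r
# Sketch of the Ng-Jordan-Weiss normalization (assumed form, not the
# package's internal code).
normalized_affinity <- function(A) {
  d          <- rowSums(A)            # degree of each feature
  d_inv_sqrt <- 1 / sqrt(d)
  # Element-wise equivalent of D^(-1/2) %*% A %*% D^(-1/2)
  A * outer(d_inv_sqrt, d_inv_sqrt)
}

# Toy symmetric affinity matrix with a zero diagonal.
A <- matrix(c(0.0, 0.9, 0.1,
              0.9, 0.0, 0.2,
              0.1, 0.2, 0.0), nrow = 3, byrow = TRUE)

L  <- normalized_affinity(A)
ev <- eigen(L, symmetric = TRUE)

# The leading eigenvectors are the ones a spectral clustering step would use.
print(round(ev$values, 4))
```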

Function eps.optimization optimizes the epsilon parameter of the dbscan algorithm.

Function featuresClustering performs spectral clustering to group features that come from the same metabolite. It uses the dataPrep, .LaplacianNg, k.optimization and eps.optimization functions. The correlation is computed using the function cor(use = "pairwise.complete.obs").

Function k.optimization optimizes the number of clusters. This value will be used to define the number of eigenvectors considered in the spectral clustering.

Function recoveringPeaks recovers the peaks that have been removed from the first annotated object.

Usage

clusterBased.filter(
  df,
  Add.Id = NULL,
  Freq = 0.5,
  Info.Add = NULL,
  polarity,
  do.Par = TRUE,
  nClust = 2
)

dataPrep(IData, Rt, Rt.05 = 5, use = "everything", method = "pearson")

eps.optimization(
  pca.to.tune,
  data.prep,
  IData,
  use = "everything",
  k.tuned,
  method = "pearson",
  do.Par,
  nClust
)

featuresClustering(
  Peak.List,
  Intensity.idx,
  use = "everything",
  method = "pearson",
  Rt.05 = 5,
  do.Par = TRUE,
  nClust
)

k.optimization(
  pca.to.tune,
  data.prep,
  IData,
  nrow.List,
  use = "everything",
  method = "pearson",
  do.Par = TRUE,
  nClust
)

recoveringPeaks(Annotated.Tab, MH.Tab)

Arguments

df

Data frame whose columns may include:

  • Compound for the proposed candidates

  • Add.Id for the proposed adduct or fragment

  • Isotope to identify the proposed isotopologues

Add.Id

Adduct(s) or fragment(s) that are required to exist. If NULL, the adducts with an observed frequency equal to or higher than 0.50 are used.

Freq

Minimum observed frequency required for an adduct or fragment to be used in the filter (default: 0.5).

Info.Add

Data frame with adduct and in-source fragment information. If NULL, the default mWISE table will be loaded. The columns should be:

  • name with the name of the adduct or fragment

  • nmol with the number of molecules (e.g., 2M+H: nmol=2)

  • charge with the charge of the adduct or fragment (e.g., M+3H: charge=3)

  • massdiff with the mass difference (e.g., M+H: massdiff=1.007276)

  • quasi with a 1 if the adduct should be considered as quasi-molecular and a 0, otherwise (Optional)

  • polarity with a character vector indicating the polarity of the adduct or fragment. The options are "positive" or "negative"
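
A custom table with the columns listed above could be built as follows. This is a minimal sketch: the object name `Info.Add.toy` is hypothetical, the rows are a small illustrative selection of common positive-mode adducts, and the quasi flags are assumptions rather than values taken from the default mWISE table.

```r
# Illustrative adduct table with the expected column names.
Info.Add.toy <- data.frame(
  name     = c("M+H", "M+Na", "2M+H", "M+3H"),
  nmol     = c(1, 1, 2, 1),            # number of molecules in the ion
  charge   = c(1, 1, 1, 3),            # charge of the ion
  massdiff = c(1.007276, 22.989218, 1.007276, 3.021828),
  quasi    = c(1, 0, 0, 0),            # assumed: only M+H is quasi-molecular
  polarity = rep("positive", 4),
  stringsAsFactors = FALSE
)

# Check that every expected column is present before passing the table on.
expected <- c("name", "nmol", "charge", "massdiff", "quasi", "polarity")
print(setdiff(expected, colnames(Info.Add.toy)))
```

An empty result from setdiff() means no expected column is missing.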

polarity

Acquisition mode of the study. It can be "positive" or "negative".

do.Par

TRUE if parallel computing is required. Def: TRUE

nClust

Number of cores that may be used if do.Par = TRUE.

IData

Data frame containing the intensity for each sample in its columns.

Rt

Vector containing the retention times.

Rt.05

Retention time value to get a similarity of 0.5.
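
One way to read this parameter is as the width of a Gaussian kernel on retention time differences, chosen so that two features separated by exactly Rt.05 get a similarity of 0.5. The sketch below assumes the kernel exp(-d^2 / (2 * sigma^2)) with sigma = Rt.05 / sqrt(2 * log(2)); the function name `rt_similarity` is hypothetical and the exact parametrization used by dataPrep may differ.

```r
# Gaussian similarity on retention time differences (assumed kernel form).
rt_similarity <- function(Rt, Rt.05 = 5) {
  # sigma chosen so that exp(-Rt.05^2 / (2 * sigma^2)) equals 0.5
  sigma <- Rt.05 / sqrt(2 * log(2))
  d     <- abs(outer(Rt, Rt, "-"))     # pairwise retention time differences
  exp(-d^2 / (2 * sigma^2))
}

# Three toy retention times: two close features and one far away.
S <- rt_similarity(c(100, 105, 130), Rt.05 = 5)
print(signif(S, 3))
```

With these inputs the first two features differ by exactly Rt.05, so their similarity is 0.5, while the third feature is effectively unrelated to both.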

use

An optional character string giving a method for computing correlations in the presence of missing values. Default is "everything", but when missing values are present, "pairwise.complete.obs" is required.
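
The effect of this argument can be seen with base R's cor() on a small matrix containing one missing value. The matrix `x` is made up for illustration.

```r
# Toy intensity matrix: column a has one missing value.
x <- cbind(a = c(1, 2, 3, 4, NA),
           b = c(2, 4, 6, 8, 10),
           c = c(5, 3, 4, 1, 2))

# "everything" propagates the NA to every correlation involving column a.
cor_all  <- cor(x, use = "everything", method = "pearson")

# "pairwise.complete.obs" drops incomplete pairs one column pair at a time.
cor_pair <- cor(x, use = "pairwise.complete.obs", method = "pearson")

print(cor_all["a", "b"])
print(cor_pair["a", "b"])
```

Note that with pairwise deletion each correlation may be based on a different subset of samples.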

method

A character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman".

pca.to.tune

PCA to perform the spectral clustering.

data.prep

Result returned by the dataPrep function.

k.tuned

Optimized number of clusters (k) computed using the k.optimization function.

Peak.List

Data frame containing the LC-MS features. Columns should contain:

  • Peak.Id for a peak identifier

  • mz for a mass-to-charge ratio value

  • rt for the retention time

  • Intensities for each sample

Intensity.idx

Numeric vector indicating the column indices of the intensities.
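
A peak list with the expected layout, and the matching intensity index, might look like the sketch below. The object name `Peak.List.toy`, the `Sample_` column prefix and all values are hypothetical; in the package example the intensities sit in columns 27 to 38 of the sample dataset.

```r
# Toy peak list: identifier, m/z, retention time and three sample columns.
Peak.List.toy <- data.frame(
  Peak.Id  = c("P1", "P2", "P3"),
  mz       = c(181.0707, 203.0526, 365.1054),
  rt       = c(120.4, 120.9, 300.2),
  Sample_1 = c(1500, 320, 80),
  Sample_2 = c(1720, 350, 95),
  Sample_3 = c(1410, 300, 60),
  stringsAsFactors = FALSE
)

# Locate the intensity columns by name instead of hard-coding positions
# (the "Sample_" prefix is an assumption of this sketch).
Intensity.idx <- grep("^Sample_", colnames(Peak.List.toy))

# Intensity table with one column per sample.
IData <- Peak.List.toy[, Intensity.idx]
print(Intensity.idx)
```

Deriving the index from column names avoids silent errors when columns are added or reordered upstream.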

nrow.List

Numeric vector indicating the number of peaks.

Annotated.Tab

Data frame returned by the matchingStage function.

MH.Tab

Data frame returned by the clusterBased.filter function.

Value

Function clusterBased.filter returns a data frame of filtered candidates.

Function dataPrep returns a list containing the Gaussian similarity matrices for the retention time differences and the intensity correlations.

Function eps.optimization returns an optimized epsilon parameter for the dbscan algorithm.

Function featuresClustering returns the input peak list with an additional column named pcgroup that indicates the clustering.

Function k.optimization returns the optimized number of clusters (k) using the kmeans algorithm.

Function recoveringPeaks returns a data frame of filtered candidates but with all peaks recovered.

Examples

# Build the compound-adduct table from the sample KEGG database
data("sample.keggDB")
Cpd.Add <- CpdaddPreparation(KeggDB = sample.keggDB, do.Par = FALSE)

# Match the positive-mode peaks against the compound-adduct table
data(sample.dataset)
Peak.List <- sample.dataset$Positive$Input
Annotated.List <- matchingStage(Peak.List = Peak.List, Cpd.Add = Cpd.Add,
                                polarity = "positive", do.Par = FALSE)

# Cluster the features; columns 27 to 38 hold the sample intensities
Intensity.idx <- seq(27, 38)
clustered <- featuresClustering(Peak.List = Peak.List,
                                Intensity.idx = Intensity.idx,
                                do.Par = FALSE)

# Attach the cluster label (pcgroup) to each annotated candidate
Annotated.Tab <- Annotated.List$Peak.Cpd
Annotated.Tab <- merge(Annotated.Tab,
                       clustered$Peak.List[, c("Peak.Id", "pcgroup")],
                       by = "Peak.Id")

# Apply the cluster-based filter
MH.Tab <- clusterBased.filter(df = Annotated.Tab,
                              polarity = "positive")
                              
recoveredPeaks <- recoveringPeaks(Annotated.Tab = Annotated.Tab,
                                  MH.Tab = MH.Tab)

b2slab/mWISE documentation built on Feb. 2, 2022, 12:24 a.m.