groupmsbatch: Group features from an msbatch

groupmsbatch R Documentation

Description

Groups the peaks of the samples in an msbatch into features, clustering them by mz and retention time (RT).

Usage

groupmsbatch(
  msbatch,
  dmz = 5,
  drtagglom = 30,
  drt = 15,
  minsamples,
  minsamplesfrac = 0.25,
  parallel = FALSE,
  ncores,
  deleteduplicates = TRUE,
  thr_overlap_duplicates = 0.7,
  verbose = TRUE
)

Arguments

msbatch

msbatch obtained from setmsbatch or alignmsbatch functions.

dmz

mass tolerance, in ppm, between peak groups for grouping.

drtagglom

rt window for mz partitioning.

drt

rt window for peaks clustering.

minsamples

minimum number of samples represented in clusters used for grouping.

minsamplesfrac

minimum samples fraction represented in each cluster used for grouping. Used to calculate minsamples in case it is missing.

parallel

logical. If TRUE, parallel processing is performed.

ncores

number of cores to be used in case parallel is TRUE.

deleteduplicates

logical. Whether or not duplicated features should be removed after grouping based on the overlap between peak limits. dmz and drt parameters are used to filter the potential duplicates.

thr_overlap_duplicates

numeric value between 0 and 1. Overlap threshold, expressed as a fraction, used to consider two features as duplicated.

verbose

logical. If TRUE, information messages are printed.
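
As a quick reference for choosing dmz, a ppm tolerance scales with the mass it is applied to. The helper below is a hypothetical illustration (ppm_to_da is not part of LipidMS) of the standard ppm-to-absolute conversion; how groupmsbatch applies the tolerance internally is not shown here.

```r
# Hypothetical helper (not a LipidMS function): convert a relative
# tolerance in ppm into an absolute mass difference at a given mz.
ppm_to_da <- function(mz, ppm) {
  mz * ppm / 1e6
}

# Absolute window corresponding to the default dmz = 5 at mz 1000
ppm_to_da(1000, 5)
```

The same dmz therefore allows a wider absolute window at high mz than at low mz.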

Details

First, peak partitions are created based on the enviPick algorithm to speed up the subsequent clustering algorithm. Briefly, peaks are sorted in increasing order of mz and RT and grouped based on user-defined tolerances (dmz and drt). Each peak is initialized as its own partition and is then evaluated to decide whether it can be joined to the previous partition. If the mz and RT of a peak match, within tolerance, those of any peak in the previous partition, the peak is reassigned to that partition. Then, a clustering algorithm is run to refine these partitions based on their mz, following these steps for each partition:

1. Each peak in the partition is initialized as a new cluster. For each cluster, the minimum, maximum and mean mz are kept; at this point all three have the same value.
2. Calculate a distance matrix between all clusters. The distance is the greatest difference between the minimum and maximum values of each cluster.
3. While any distance is not NA, find the minimum distance between two clusters.
4. If that distance is below the maximum distance allowed, join the clusters and update the minimum, maximum and mean values; otherwise, set the distance to NA and go back to step 3.
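
The four steps above can be sketched in a few lines of R. This is a minimal illustration, not the LipidMS implementation: cluster_1d and its arguments are hypothetical names, the tolerance is taken as an absolute value rather than ppm, and the distance between two clusters is assumed to be the span that the merged cluster would have (largest maximum minus smallest minimum).

```r
# Hypothetical sketch of the clustering steps (not LipidMS code).
# x: numeric values (e.g. mz); maxdist: absolute tolerance (assumption).
cluster_1d <- function(x, maxdist) {
  n <- length(x)
  if (n < 2) return(seq_len(n))
  # Step 1: every value is its own cluster (min = max = mean)
  cl <- data.frame(min = x, max = x, mean = x, n = 1)
  members <- as.list(seq_len(n))
  # Assumed distance: span of the cluster that would result from merging
  span <- function(i, j) max(cl$max[i], cl$max[j]) - min(cl$min[i], cl$min[j])
  # Step 2: distance matrix, upper triangle only
  d <- matrix(NA_real_, n, n)
  for (i in seq_len(n - 1)) for (j in (i + 1):n) d[i, j] <- span(i, j)
  # Step 3: loop while any distance is still available
  while (any(!is.na(d))) {
    idx <- which(d == min(d, na.rm = TRUE), arr.ind = TRUE)[1, ]
    i <- idx[1]
    j <- idx[2]
    if (d[i, j] < maxdist) {
      # Step 4a: join cluster j into cluster i and update its summary
      total <- cl$n[i] + cl$n[j]
      cl$mean[i] <- (cl$mean[i] * cl$n[i] + cl$mean[j] * cl$n[j]) / total
      cl$n[i] <- total
      cl$min[i] <- min(cl$min[i], cl$min[j])
      cl$max[i] <- max(cl$max[i], cl$max[j])
      members[[i]] <- c(members[[i]], members[[j]])
      members[[j]] <- integer(0)
      d[j, ] <- NA
      d[, j] <- NA
      # Refresh the distances that are still active for the merged cluster
      for (k in seq_len(n)) {
        if (k != i && length(members[[k]]) > 0) {
          a <- min(i, k)
          b <- max(i, k)
          if (!is.na(d[a, b])) d[a, b] <- span(i, k)
        }
      }
    } else {
      # Step 4b: this pair cannot be joined; discard its distance
      d[i, j] <- NA
    }
  }
  # Translate the member lists into one cluster label per input value
  out <- integer(n)
  id <- 0L
  for (m in members) {
    if (length(m) > 0) {
      id <- id + 1L
      out[m] <- id
    }
  }
  out
}

# Two tight groups of mz values separated by about 0.01
cluster_1d(c(100.0000, 100.0003, 100.0004, 100.0100, 100.0102), maxdist = 0.001)
```

Because a discarded distance is never restored, the loop always terminates: each iteration either merges two clusters or removes one candidate pair.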

The same clustering algorithm is then run again to group peaks based on their RT. In this case, distances between clusters that contain peaks from the same sample are set to NA, so that those clusters cannot be joined.
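
The same-sample restriction can be illustrated on the initial distance matrix. The snippet below is a sketch under assumptions, not LipidMS code: the rt and sample vectors are made-up data, and the plain absolute RT difference is used as the distance for single-peak clusters.

```r
# Made-up example data: four peaks from two samples
rt <- c(100, 102, 101, 130)
sample <- c("A", "A", "B", "B")

# Pairwise RT distances between single-peak clusters (upper triangle only)
d <- abs(outer(rt, rt, "-"))
d[lower.tri(d, diag = TRUE)] <- NA

# Pairs of peaks coming from the same sample are not allowed to merge
d[outer(sample, sample, "==")] <- NA
d
```

In a full implementation the same check would presumably have to be repeated after every merge, since a joined cluster contains peaks from several samples.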

After groups have been defined, the clusters whose sample representation meets minsamples (or minsamplesfrac, when minsamples is missing) are used to build the feature table. Finally, if deleteduplicates is set to TRUE, peak overlap is checked to avoid duplicated or wrongly defined features.
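
The sample-representation filter can be sketched as follows. This is an illustration with made-up data, not LipidMS code: the peaks table and its column names are hypothetical, and rounding the fraction up with ceiling() is an assumption about how minsamples is derived from minsamplesfrac.

```r
# Made-up clustering result: one row per peak
peaks <- data.frame(
  cluster = c(1, 1, 1, 2, 3, 3),
  sample  = c("A", "B", "C", "A", "A", "B")
)
nsamples <- 4          # total number of samples in the batch (made up)
minsamplesfrac <- 0.5  # illustrative value, not the function default

# Assumption: when minsamples is missing, it is derived from the fraction
minsamples <- ceiling(minsamplesfrac * nsamples)

# Number of distinct samples represented in each cluster
represented <- tapply(peaks$sample, peaks$cluster,
                      function(s) length(unique(s)))

# Clusters with enough sample representation to become features
keep <- names(represented)[represented >= minsamples]
keep
```

Here cluster 2 is dropped because it is represented in a single sample.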

Value

grouped msbatch

Author(s)

M Isabel Alcoriza-Balaguer <maialba@iislafe.es>

References

The partitioning algorithm has been imported from the enviPick R package: https://cran.r-project.org/web/packages/enviPick/index.html

Examples

## Not run: 
msbatch <- groupmsbatch(msbatch)

## End(Not run)


maialba3/LipidMS documentation built on Sept. 6, 2024, 9:07 p.m.