View source: R/dataProcessing.R
groupmsbatch | R Documentation |
Group features from an msbatch
groupmsbatch(
msbatch,
dmz = 5,
drtagglom = 30,
drt = 15,
minsamples,
minsamplesfrac = 0.25,
parallel = FALSE,
ncores,
verbose = TRUE
)
msbatch |
msbatch obtained from setmsbatch or alignmsbatch functions. |
dmz |
mass tolerance between peak groups for grouping in ppm. |
drtagglom |
rt window for mz partitioning. |
drt |
rt window for peaks clustering. |
minsamples |
minimum number of samples represented in clusters used for grouping. |
minsamplesfrac |
minimum samples fraction represented in each cluster used for grouping. Used to calculate minsamples in case it is missing. |
parallel |
logical. If TRUE, parallel processing is performed. |
ncores |
number of cores to be used in case parallel is TRUE. |
verbose |
print information messages. |
First, peak partitions are created based on the enviPick algorithm to speed up the subsequent clustering step. Briefly, peaks are sorted by increasing mz and RT and grouped according to user-defined tolerances (dmz and drt). Each peak is initialized as its own partition and is then evaluated to decide whether it can be joined to the previous partition: if the mz and RT of a peak fall within tolerance of any peak in the previous partition, the peak is reassigned to that partition. A clustering algorithm is then run on each partition to refine it based on mz, following these steps:
1. Each peak in the partition is initialized as a new cluster. For each cluster, the minimum, maximum and mean mz are stored; at this point all three are equal.
2. Calculate a distance matrix between all clusters. The distance between two clusters is the greatest difference between the minimum and maximum values of each cluster.
3. While any distance is not NA, find the minimum distance between two clusters.
4. If that distance is below the maximum distance allowed, join the two clusters and update the minimum, maximum and mean values. Otherwise, set the distance to NA and go back to step 3.
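The four steps above can be sketched in R as follows. This is only an illustrative sketch, not the LipidMS implementation: the function name cluster1d, the absolute tolerance maxdist (rather than a ppm tolerance) and the example mz values are all assumptions made for the illustration.

```r
# Hypothetical helper, NOT part of LipidMS: agglomerative clustering of a
# numeric vector following the four steps described above.
cluster1d <- function(x, maxdist) {
  n <- length(x)
  # Step 1: every peak starts as its own cluster (min = max = mean = value)
  members <- as.list(seq_len(n))
  cmin <- x
  cmax <- x
  cmean <- x
  active <- rep(TRUE, n)

  # Distance between clusters a and b: greatest min-to-max difference
  cdist <- function(a, b) {
    max(abs(cmax[a] - cmin[b]), abs(cmax[b] - cmin[a]))
  }

  # Step 2: distance matrix (upper triangle only, everything else stays NA)
  d <- matrix(NA_real_, n, n)
  if (n > 1) {
    for (i in 1:(n - 1)) for (j in (i + 1):n) d[i, j] <- cdist(i, j)
  }

  # Step 3: keep searching while any distance is not NA
  while (any(!is.na(d))) {
    idx <- unname(which(d == min(d, na.rm = TRUE), arr.ind = TRUE)[1, ])
    i <- idx[1]
    j <- idx[2]
    if (d[i, j] < maxdist) {
      # Step 4: join cluster j into cluster i and update min, max and mean
      members[[i]] <- c(members[[i]], members[[j]])
      cmin[i] <- min(cmin[i], cmin[j])
      cmax[i] <- max(cmax[i], cmax[j])
      cmean[i] <- mean(x[members[[i]]])
      active[j] <- FALSE
      d[j, ] <- NA
      d[, j] <- NA
      # Refresh the distances that involve the merged cluster
      for (k in setdiff(which(active), i)) {
        a <- min(i, k)
        b <- max(i, k)
        if (!is.na(d[a, b])) d[a, b] <- cdist(a, b)
      }
    } else {
      # Too far apart: discard this pair and go back to step 3
      d[i, j] <- NA
    }
  }

  # Number the surviving clusters 1, 2, ... in order of their first peak slot
  ids <- which(active)
  cluster <- integer(n)
  for (k in seq_along(ids)) cluster[members[[ids[k]]]] <- k
  list(cluster = cluster, mean = cmean[ids])
}

# Made-up mz values: three peaks near 100.000 and two near 100.010
mz <- c(100.0000, 100.0004, 100.0002, 100.0100, 100.0103)
res <- cluster1d(mz, maxdist = 0.001)
res$cluster
```

With these example values the first three peaks end up in one cluster and the last two in another.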
Then the same clustering algorithm is run again to group peaks based on their RT. In this case, distances between clusters that share peaks from the same sample are set to NA, so such clusters are never joined.
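The effect of the same-sample rule can be illustrated with a small R sketch. It is an assumption-laden illustration rather than the package code: the function name clusterrt, the argument names and the example retention times are hypothetical.

```r
# Hypothetical helper, NOT part of LipidMS: the same agglomeration idea on RT,
# except that clusters holding peaks from the same sample get an NA distance.
clusterrt <- function(rt, sample, maxdist) {
  n <- length(rt)
  members <- as.list(seq_len(n))  # peak indices held by each cluster
  active <- rep(TRUE, n)

  # Distance between clusters a and b, or NA when they share a sample
  cdist <- function(a, b) {
    pa <- members[[a]]
    pb <- members[[b]]
    if (any(sample[pa] %in% sample[pb])) return(NA_real_)
    max(abs(max(rt[pa]) - min(rt[pb])), abs(max(rt[pb]) - min(rt[pa])))
  }

  # Upper-triangle distance matrix between the initial single-peak clusters
  d <- matrix(NA_real_, n, n)
  if (n > 1) {
    for (i in 1:(n - 1)) for (j in (i + 1):n) d[i, j] <- cdist(i, j)
  }

  while (any(!is.na(d))) {
    idx <- unname(which(d == min(d, na.rm = TRUE), arr.ind = TRUE)[1, ])
    i <- idx[1]
    j <- idx[2]
    if (d[i, j] < maxdist) {
      # Join cluster j into cluster i
      members[[i]] <- c(members[[i]], members[[j]])
      active[j] <- FALSE
      d[j, ] <- NA
      d[, j] <- NA
      # Recompute distances to the merged cluster; pairs that now share a
      # sample become NA and can no longer be joined
      for (k in setdiff(which(active), i)) {
        a <- min(i, k)
        b <- max(i, k)
        if (!is.na(d[a, b])) d[a, b] <- cdist(a, b)
      }
    } else {
      d[i, j] <- NA
    }
  }

  ids <- which(active)
  cluster <- integer(n)
  for (k in seq_along(ids)) cluster[members[[ids[k]]]] <- k
  cluster
}

# Made-up example: peaks 1 and 4 both come from sample A
rt <- c(100, 102, 104, 101, 150)
sample <- c("A", "B", "C", "A", "B")
clusterrt(rt, sample, maxdist = 15)
# 1 2 2 2 3
```

In this example peak 1 stays in its own cluster even though its RT is within tolerance, because peak 4 from the same sample has already joined the neighbouring cluster.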
Once groups have been defined, clusters whose sample representation reaches minsamples (or, if minsamples is missing, the value derived from minsamplesfrac) are used to build the feature table.
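As a rough illustration of this filtering step, the R sketch below derives minsamples from minsamplesfrac and keeps the clusters that are represented in enough samples. The data frame, the total of 8 samples, the use of ceiling() for rounding and the >= comparison are assumptions made for the example; the package may round or compare differently.

```r
# Hypothetical peak table: cluster id and sample of origin for each peak
peaks <- data.frame(
  cluster = c(1, 1, 1, 2, 3, 3),
  sample  = c("A", "B", "C", "A", "B", "C")
)

nsamples <- 8            # assumed total number of samples in the batch
minsamplesfrac <- 0.25

# Assumption: minsamples is the fraction times the sample count, rounded up
minsamples <- ceiling(minsamplesfrac * nsamples)

# Number of distinct samples represented in each cluster
represented <- vapply(
  split(peaks$sample, peaks$cluster),
  function(s) length(unique(s)),
  integer(1)
)

# Clusters retained for the feature table (assumption: >= comparison)
keep <- names(represented)[represented >= minsamples]
keep
# "1" "3"
```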
grouped msbatch
M Isabel Alcoriza-Balaguer <maialba@iislafe.es>
The partitioning algorithm was imported from the enviPick R package: https://cran.r-project.org/web/packages/enviPick/index.html
## Not run:
msbatch <- groupmsbatch(msbatch)
## End(Not run)