align_peaks: align peaks individually among chromatograms


View source: R/align_peaks.R

Description

align_peaks aligns similar peaks across samples so that shared peaks are consistently located at the same position (i.e. defined as the same substance). The order of chromatograms (i.e. data.frames in gc_peak_list) is randomized before each run of the alignment algorithm; if randomisation is not needed, this behaviour can be changed by setting permute = FALSE. The main principle of this function is to reduce the variance in retention times within rows, so that peaks of similar retention time are grouped together. Peaks that deviate significantly from the mean retention time of the other samples are shifted to another row. At the start of a row the first two samples are compared and separated if required; all other samples are then included consecutively. If iterations > 1, the whole algorithm is repeated accordingly.

Usage

align_peaks(
  gc_peak_list,
  max_diff_peak2mean = 0.02,
  iterations = 1,
  rt_col_name,
  permute = TRUE,
  R = 1
)

Arguments

gc_peak_list

List of data.frames. Each data.frame contains GC-data (e.g. retention time, peak area, peak height) of one sample. Variables are stored in columns. Rows represent distinct peaks. Retention time is a required variable.

max_diff_peak2mean

Numeric value defining the allowed deviation of the retention time of a given peak from the mean of the corresponding row (i.e. the scored substance). This parameter reflects the retention time range within which peaks across samples are still matched as homologous peaks (i.e. the same substance). Peaks whose deviation exceeds the threshold are sorted into a different row.

rt_col_name

A character giving the name of the column containing the retention times. The decimal separator needs to be a point.

permute

Boolean. By default, a random permutation of samples is conducted prior to each row-wise alignment step, so the order of samples is constantly randomised during the alignment. Setting this parameter to FALSE aligns the dataset in its given order, which allows maximal repeatability if needed.

R

Integer indicating the current iteration of the alignment step. This value is created by align_chromatograms.

gc_peak_df

data.frame containing GC-data (e.g. retention time, peak area, peak height) of one sample. Variables are stored in columns, rows represent peaks.
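
The arguments above can be illustrated with a small base R sketch. It does not call align_peaks itself; the sample names, the column names "time" and "area", and all values are hypothetical and chosen only to show the expected shape of gc_peak_list, how rt_col_name selects the retention times, how a deviation is compared against max_diff_peak2mean, and what a random permutation of the sample order might look like.

```r
# Hypothetical input: three samples, each a data.frame with one row per peak.
# Column names "time" and "area" are assumptions made for this illustration.
gc_peak_list <- list(
  sample_A = data.frame(time = c(10.00, 12.50, 15.20), area = c(120, 340, 80)),
  sample_B = data.frame(time = c(10.01, 12.52, 15.19), area = c(110, 300, 95)),
  sample_C = data.frame(time = c(10.08, 12.49, 15.21), area = c(130, 310, 70))
)
rt_col_name <- "time"
max_diff_peak2mean <- 0.02

# Retention time of the first peak (row 1) in every sample.
row_rts <- vapply(gc_peak_list, function(df) df[[rt_col_name]][1], numeric(1))

# Deviation of sample_C from the mean of the two samples examined before it.
deviation <- row_rts[["sample_C"]] - mean(row_rts[c("sample_A", "sample_B")])
exceeds_threshold <- abs(deviation) > max_diff_peak2mean

# Conceptually, permute = TRUE corresponds to a random sample order like this.
set.seed(1)  # only to make this illustration repeatable
permuted_list <- gc_peak_list[sample(seq_along(gc_peak_list))]
```

In this made-up data the first peak of sample_C lies about 0.075 above the mean of the other two samples, so it exceeds the default threshold of 0.02 and would not be treated as the same substance.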

Details

For each row, the retention time of every sample is compared to the mean retention time of all previously examined samples within the same row. The comparison starts with the second sample, which is compared with the first; the third sample is then compared with the mean of the first two, and so on. Whenever the current sample deviates from the mean retention time of the previous samples by more than max_diff_peak2mean, a shift is applied: either the current sample is moved to the next row (its retention time is above the mean) or all other samples are moved to the next row (its retention time is below the mean). If the retention time of the sample under evaluation lies within the interval from -max_diff_peak2mean to +max_diff_peak2mean around the mean retention time, no shifting is done and the algorithm proceeds with the next sample.
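
The comparison logic described here can be sketched for a single row as follows. This is a simplified illustration, not the GCalignR implementation: the function name classify_row and its decision labels are hypothetical, it only reports which shift would be triggered instead of moving any data, and it assumes every sample has a peak in the row.

```r
# Simplified, hypothetical sketch of the comparison within ONE row.
# `rts` holds one retention time per sample, in the order they are examined.
classify_row <- function(rts, max_diff_peak2mean = 0.02) {
  decision <- character(length(rts))
  decision[1] <- "reference"  # the first sample has nothing to be compared to
  in_row <- 1L                # indices of samples currently kept in this row
  for (i in seq_along(rts)[-1]) {
    deviation <- rts[i] - mean(rts[in_row])
    if (deviation > max_diff_peak2mean) {
      # Above the row mean: this sample would be moved to the next row.
      decision[i] <- "shift_sample"
    } else if (deviation < -max_diff_peak2mean) {
      # Below the row mean: all previous samples would be moved instead.
      decision[i] <- "shift_others"
      in_row <- i
    } else {
      # Within the tolerance: the sample stays and contributes to the mean.
      decision[i] <- "keep"
      in_row <- c(in_row, i)
    }
  }
  decision
}

row_decisions <- classify_row(c(10.00, 10.01, 10.08, 9.90))
```

With these invented retention times, the second sample is kept, the third lies above the mean of the first two and would be shifted on its own, and the fourth lies below that mean so the previously kept samples would be shifted instead.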

Value

A list of data.frames containing GC-data with aligned peaks.

Author(s)

Martin Stoffel (martin.adam.stoffel@gmail.com) & Meinolf Ottensmann (meinolf.ottensmann@web.de)


GCalignR documentation built on Aug. 26, 2020, 9:06 a.m.