peak_interspace: Estimate the observed space between peaks within...

View source: R/peak_interspace.R

peak_interspace    R Documentation

Estimate the observed space between peaks within chromatograms

Description

The parameter min_diff_peak2peak is a major determinant in the alignment of a dataset with align_chromatograms. This function helps to infer a suitable value based on the input data. The underlying assumption is that distinct peaks within a sample are separated by a larger gap than homologous peaks across samples. Tightly spaced peaks within a sample appear on the left side of the plotted distribution and can indicate the presence of split peaks in the data.
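To make the quantity concrete, the following base R sketch computes next-peak differences in retention time for two made-up samples and pools them into one distribution. It is only an illustration of the idea, not the package's implementation; the toy data and the helper next_peak_diff are hypothetical.

```r
# Hypothetical toy data: retention times (in minutes) for two samples
samples <- list(
  ind_A = data.frame(time = c(4.10, 4.13, 7.52, 9.80)),
  ind_B = data.frame(time = c(4.11, 7.50, 7.55, 9.79))
)

# Hypothetical helper: gaps between consecutive peaks within one sample.
# NA and 0 are treated as empty rows and dropped before sorting.
next_peak_diff <- function(df, rt_col_name = "time") {
  rt <- df[[rt_col_name]]
  rt <- sort(rt[!is.na(rt) & rt > 0])
  diff(rt)
}

# Pool the gaps of all samples into one global distribution
interspace <- unlist(lapply(samples, next_peak_diff), use.names = FALSE)

# Low quantiles point at tightly spaced (possibly split) peaks
print(quantile(interspace, probs = c(0.1, 0.5)))
```

In this toy example the smallest gaps (0.03 and 0.05 minutes) sit far below the others, which is the pattern the description associates with split peaks.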

Usage

peak_interspace(
  data,
  rt_col_name = NULL,
  sep = "\t",
  quantiles = NULL,
  quantile_range = c(0, 1),
  by_sample = FALSE
)

Arguments

data

Dataset containing peaks that need to be aligned and matched. For every peak an arbitrary number of numerical variables can be included (e.g. peak height, peak area) in addition to the mandatory retention time. The standard format is a tab-delimited text file with the following layout: (1) the first row contains the sample names, (2) the second row contains the column names of the corresponding peak lists. Starting with the third row, peak lists are included for every sample that needs to be incorporated in the dataset. Here, a peak list contains data for individual peaks in rows, whereas columns specify variables in the order given in the second row of the text file. Peak lists of individual samples are concatenated horizontally and need to be of the same width (i.e. the same number of columns in consistent order). Alternatively, the input may be a list of data frames. Each data frame contains the peak data for a single individual. Variables (i.e. columns) are named consistently across data frames. The names of elements in the list are used as sample identifiers. Cells may be filled with numeric or integer values, but no factors or characters are allowed. NA and 0 may be used to indicate empty rows.
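As an illustration of the list format, the sketch below assembles a small input by hand and checks the constraints stated above (consistent column names, numeric cells only). The sample names, the column names time and area, and all values are made up for the example.

```r
# Hypothetical input in list format: one data frame per sample,
# list names serve as sample identifiers
peak_list <- list(
  ind_A = data.frame(time = c(4.10, 7.52, 9.80), area = c(120, 35, 410)),
  ind_B = data.frame(time = c(4.11, 7.50, NA),   area = c(98, 41, NA))  # NA marks an empty row
)

# Columns must be named consistently and in the same order across samples
same_columns <- all(vapply(
  peak_list,
  function(df) identical(names(df), names(peak_list[[1]])),
  logical(1)
))

# Only numeric or integer cells are allowed (no factors or characters)
all_numeric <- all(vapply(
  peak_list,
  function(df) all(vapply(df, is.numeric, logical(1))),
  logical(1)
))

stopifnot(same_columns, all_numeric)
```

A list that passes these checks could then be supplied as data, with rt_col_name = "time" in this example.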

rt_col_name

A character string giving the name of the column containing the retention times. The decimal separator needs to be a point.

sep

The field separator character. The default is tab separated (sep = '\t'). See the "sep" argument in read.table for details.

quantiles

A numeric vector of probabilities (e.g. quantiles = 0.1). The corresponding quantiles are calculated from the distribution.

quantile_range

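A numeric vector of length two giving the lower and upper quantile between which the distribution is subset, e.g. c(0, 0.95) to restrict it to values up to the 0.95 quantile. The default c(0, 1) uses the full distribution.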
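One plausible reading of this argument, sketched in base R under the assumption that values outside the two quantiles are simply dropped; the function's internals may differ, and the vector of gaps is made up.

```r
# Hypothetical next-peak differences with one extreme gap
interspace <- c(0.03, 0.05, 0.40, 1.10, 2.24, 2.28, 3.39, 3.39, 12.00)

# Keep only values between the lower and upper quantile
quantile_range <- c(0, 0.95)
limits <- quantile(interspace, probs = quantile_range)
trimmed <- interspace[interspace >= limits[1] & interspace <= limits[2]]

print(limits)
print(trimmed)
```

Trimming the upper tail this way removes the single very large gap, so the crowded left side of the distribution, which matters most for choosing min_diff_peak2peak, becomes easier to inspect.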

by_sample

A logical indicating whether peak interspaces are calculated individually for each sample. By default, all samples are combined to give the global distribution of next-peak differences in retention times. When by_sample = TRUE, a series of plots (one for each sample) is created and a keystroke is required to proceed from one plot to the next.

Value

List containing summary statistics of the peak interspace distribution

Author(s)

Martin Stoffel (martin.adam.stoffel@gmail.com) & Meinolf Ottensmann (meinolf.ottensmann@web.de)

Examples

## load the package and the example dataset used below
library(GCalignR)
data("peak_data")
## plotting with defaults
peak_interspace(data = peak_data, rt_col_name = "time")
## plotting up to the 0.95 quantile
peak_interspace(data = peak_data, rt_col_name = "time", quantile_range = c(0, 0.95))
## return the 0.1 quantile
peak_interspace(data = peak_data, rt_col_name = "time", quantiles = 0.1)


GCalignR documentation built on Feb. 16, 2023, 5:23 p.m.