baselineCorrection: Baseline correction - Chang's method
In acinostroza/TargetSearch: A package for the analysis of GC-MS metabolite profiling data

baselineCorrection

R Documentation

Baseline correction - Chang's method

Description

Function for baseline correction of GC-MS chromatograms using Chang's method described below.

Usage

    baselineCorrection(peaks, threshold = 0.5, alpha = 0.95, bfraction = 0.2,
           segments = 100, signalWindow = 10, method = "linear")

Arguments

`peaks`	Either a matrix of spectra peak intensities to be baseline corrected, where the rows are retention times and the columns are mass traces; or a named list containing an element called `"Peaks"` that holds such a matrix. The list can be generated by `peakCDFextraction`.
`threshold`	A numeric value between 0 and 1. A value of 1 sets the baseline above the noise, 0.5 in the middle of the noise, and 0 below the noise.
`alpha`	The alpha parameter of the high pass filter.
`bfraction`	The fraction (a value between 0 and 1; the default 0.2 corresponds to 20%) of the segments with the lowest intensities of the filtered signal that are assumed to be baseline signal.
`segments`	The number of segments in which the filtered signal is divided.
`signalWindow`	The window size (number of points) used in the signal windowing step.
`method`	The method used to approximate the baseline. `"linear"` (default) uses linear interpolation. `"spline"` fits a cubic smoothing spline (warning: really slow).

Details

The baseline correction algorithm is based on the work of Chang et al. (see References) and works as follows. For every mass trace, i.e., every column of the matrix peaks, the signal intensity x is filtered by a first-order high-pass filter to give the filtered signal y:

y_i = \alpha (y_{i-1} + x_i - x_{i-1})
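
The recursion above can be sketched in a few lines of R. This is only an illustration, not the package's implementation: the helper name `highpass` and the start value y_1 = 0 are assumptions made for the example.

```r
# Hypothetical helper (not part of TargetSearch): first-order high-pass filter
#   y[i] = alpha * (y[i-1] + x[i] - x[i-1])
# Assumption: the recursion starts from y[1] = 0.
highpass <- function(x, alpha = 0.95) {
  y <- numeric(length(x))
  for (i in seq_along(x)[-1]) {
    y[i] <- alpha * (y[i - 1] + x[i] - x[i - 1])
  }
  y
}

# A constant offset is removed entirely ...
flat <- highpass(rep(1000, 5))
# ... while a sharp step produces a response that decays by a factor alpha
step <- highpass(c(0, 0, 10, 10, 10), alpha = 0.5)

print(flat)
print(step)
```

Slowly varying components such as the baseline are suppressed, while abrupt changes such as peaks pass through, which is why the later steps operate on the filtered signal.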

The filtered signal is divided into evenly spaced segments (segments) and the standard deviation of each segment is calculated. A fraction (bfraction) of the segments with the lowest values is assumed to be baseline signal, and the standard deviation (\sigma) of the points within those segments is calculated.
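
As a rough sketch of this step, the noise level could be estimated as follows. The helper name `noise_sigma`, the use of `cut()` to form the segments, and the synthetic input are assumptions for illustration; the package may segment the signal differently.

```r
# Hypothetical helper: estimate sigma from the quietest segments of a
# (filtered) signal y.
noise_sigma <- function(y, segments = 100, bfraction = 0.2) {
  # assign every point to one of `segments` evenly spaced, contiguous groups
  seg <- cut(seq_along(y), breaks = segments, labels = FALSE)
  seg_sd <- tapply(y, seg, sd)
  # keep the bfraction of segments with the lowest standard deviation
  n_keep <- max(1, floor(bfraction * segments))
  quiet <- as.integer(names(sort(seg_sd))[seq_len(n_keep)])
  # pooled standard deviation of all points in the selected segments
  sd(y[seg %in% quiet])
}

set.seed(1)
y <- rnorm(1000, sd = 5)          # stand-in for a filtered trace: pure noise ...
y[401:420] <- y[401:420] + 200    # ... plus one short burst of signal
sigma <- noise_sigma(y, segments = 50, bfraction = 0.2)
print(c(sigma = sigma, overall_sd = sd(y)))
```

Because only the quietest segments contribute, the burst of signal does not inflate the estimate the way it inflates the overall standard deviation.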

Once \sigma has been determined, the points with absolute filtered values larger than 2\sigma are considered signal. After that, the signal windowing step takes every one of the points found to be signal as the center of a signal window (signalWindow) and marks the points within that window as signal. The remaining points are now considered to be noise.
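
The thresholding and windowing can be illustrated with a small sketch. The helper name `mark_signal` is hypothetical, and the convention that the window extends `signalWindow %/% 2` points to each side of a signal point is an assumption; the package may define the window width differently.

```r
# Hypothetical helper: flag points above 2 * sigma and widen each one
# into a window of neighbouring points.
mark_signal <- function(y, sigma, signalWindow = 10) {
  hits <- which(abs(y) > 2 * sigma)
  half <- signalWindow %/% 2        # assumption: window centred on each hit
  is_signal <- logical(length(y))
  for (h in hits) {
    lo <- max(1, h - half)
    hi <- min(length(y), h + half)
    is_signal[lo:hi] <- TRUE
  }
  is_signal
}

y <- c(rep(0, 10), 50, rep(0, 10))  # a single spike at position 11
sig <- mark_signal(y, sigma = 1, signalWindow = 4)
print(which(sig))                   # points treated as signal
print(sum(!sig))                    # points left as noise
```

Widening each detection protects the flanks of a peak, whose filtered values may fall below 2\sigma, from being treated as noise.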

The baseline signal is obtained from the noise points only, either by linear interpolation (default) or by fitting a cubic smoothing spline. The baseline can be shifted up or down with the parameter threshold, according to the formula:

B' = B + 4\sigma(t - 0.5)

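where B is the fitted baseline (linear interpolation or spline), \sigma is the standard deviation of the noise, and t is the threshold, a value between 0 and 1. With t = 0.5 the baseline is left unshifted, t = 1 raises it by 2\sigma, and t = 0 lowers it by 2\sigma. Finally, the corrected signal is calculated by subtracting B' from the original signal.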
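
A minimal sketch of the linear variant, assuming the noise points have already been identified, is shown below. The helper name `fit_baseline` and the choice of `rule = 2` in `approx()` (which holds the end values constant beyond the outermost noise points) are assumptions for illustration only.

```r
# Hypothetical helper: interpolate the baseline through the noise points
# and shift it according to the threshold.
fit_baseline <- function(x, is_signal, sigma, threshold = 0.5) {
  idx <- seq_along(x)
  noise_idx <- idx[!is_signal]
  # linear interpolation across the gaps left by the signal points
  B <- approx(noise_idx, x[noise_idx], xout = idx, rule = 2)$y
  # B' = B + 4 * sigma * (t - 0.5)
  B + 4 * sigma * (threshold - 0.5)
}

x <- c(10, 10, 10, 60, 110, 60, 10, 10, 10)   # flat baseline with one peak
is_signal <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

mid  <- fit_baseline(x, is_signal, sigma = 1, threshold = 0.5)
high <- fit_baseline(x, is_signal, sigma = 1, threshold = 1)
corrected <- x - mid

print(corrected)
print(high - mid)   # shift introduced by threshold = 1
```

With the default threshold the peak is recovered on a zero baseline, whereas threshold = 1 raises the estimated baseline by 2\sigma and therefore lowers every corrected intensity by the same amount.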

Value

The output depends on whether the input peaks is a matrix or a list. If it is a matrix, then the function returns a matrix of the same dimensions with the baseline corrected intensities. If instead peaks is a list, then the element called "Peaks" will hold the output.

Note

This function is intended to be run internally, but it is exported for advanced users.

Author(s)

Alvaro Cuadros-Inostroza

References

David Chang, Cory D. Banack and Sirish L. Shah, Robust baseline correction algorithm for signal dense NMR spectra. Journal of Magnetic Resonance 187 (2007) 288-292

Examples

  # get a random sample CDF from TargetSearchData
  require(TargetSearchData)
  cdffile <- sample(tsd_cdffiles(), 1)
  pdata <- peakCDFextraction(cdffile)

  # restrict the mass range to the first 10 traces to reduce computing
  # time (not needed for actual data)
  pdata$Peaks <- pdata$Peaks[, 1:10]
  pdata$massRange <- c(85, 94)

  # make a fake baseline as constant + noise (the CDF files have been
  # already baseline corrected by the vendor software).
  nscans <- length(pdata$Time)
  noise <- as.integer(1000 + rnorm(nscans, sd=5))
  pdata$Peaks <- pdata$Peaks + noise

  # change parameters and see how the results change
  pdata1 <- baselineCorrection(pdata)
  pdata2 <- baselineCorrection(pdata, threshold = 1, alpha = 0.97)

  # pick an arbitrary trace k and compare the original signal with both corrections
  k <- 6
  m <- cbind(pdata$Peaks[, k] - noise, pdata1$Peaks[, k], pdata2$Peaks[, k])
  matplot(pdata$Time, m, type='l', lty=1, xlab='time', ylab='intensity')
  legend('topleft', c('original', 'base correct 1', 'base correct 2'),
         col=1:3, lty=1, lwd=1)

acinostroza/TargetSearch documentation built on July 5, 2025, 1:19 a.m.