baseline: Calculate Baselines for Spectroscopic Data

Description

Computes an estimated baseline curve for a spectrum using the “BXR algorithm,” a method of Xi and Rocke generalized by Barkauskas and Rocke.

Usage

baseline(spect, init.bd, sm.par = 1e-11, sm.ord = 2, max.iter = 20, tol = 5e-8,
         sm.div = NA, sm.norm.by = c("baseline", "overestimate", "constant"),
         neg.div = NA, neg.norm.by = c("baseline", "overestimate", "constant"),
         rel.conv.crit = TRUE, zero.rm = TRUE, halve.search = FALSE)

Arguments

spect

vector containing the intensities of the spectrum

init.bd

initial value for baseline; default is flat baseline at median height

sm.par

smoothing parameter for baseline calculation

sm.ord

order of derivative to penalize in baseline analysis

max.iter

maximum number of iterations allowed in baseline calculation

tol

convergence criterion; see below

sm.div

smoothness divisor in baseline calculation

sm.norm.by

method for smoothness penalty in baseline analysis

neg.div

negativity divisor in baseline calculation

neg.norm.by

method for negativity penalty in baseline analysis

rel.conv.crit

logical; whether convergence criterion should be relative to size of current baseline estimate

zero.rm

logical; whether to replace zeros with average of surrounding values

halve.search

logical; whether to use a halving line search if a step leads to a smaller value of the objective function

Details

If the spectrum is given by y[i], then the algorithm works by maximizing the objective function

F({b[i]}) = sum_{i=1}^{n} b[i] - sum_{i=2}^{n-1} A[1,i]*(b[i-1] - 2*b[i] + b[i+1])^2 - sum_{i=1}^{n} A[2,i]*[max{b[i] - y[i], 0}]^2

using Newton's method (with an embedded halving line search if halve.search == TRUE), starting from b[i] = init.bd[i] for all i. The middle term controls the smoothness of the baseline, and the last term applies a “negativity penalty” when the baseline is above the spectrum.
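The objective function above can be evaluated directly, which may help when checking how the three terms trade off. The following is a minimal sketch only, not the package's implementation: the function name bxr_objective is hypothetical, the weights A1 and A2 are assumed to be supplied as plain vectors of length n, and the second-difference form is assumed to correspond to sm.ord = 2.

```r
# Hypothetical helper: evaluates F({b[i]}) as written in the Details section.
# b, y   : numeric vectors of length n (baseline estimate and spectrum)
# A1, A2 : weight vectors of length n; A1[1] and A1[n] are unused because
#          the smoothness sum runs over i = 2, ..., n-1
bxr_objective <- function(b, y, A1, A2) {
  n <- length(b)
  stopifnot(length(y) == n, length(A1) == n, length(A2) == n, n >= 3)
  d2 <- diff(b, differences = 2)        # b[i-1] - 2*b[i] + b[i+1], i = 2..n-1
  smooth <- sum(A1[2:(n - 1)] * d2^2)   # smoothness term
  over <- pmax(b - y, 0)                # amount by which baseline exceeds spectrum
  negativity <- sum(A2 * over^2)        # "negativity penalty"
  sum(b) - smooth - negativity
}

y <- c(5, 6, 9, 6, 5)
flat <- rep(5, 5)                       # flat baseline that never exceeds y
w <- rep(1, 5)                          # illustrative unit weights
f_flat <- bxr_objective(flat, y, A1 = w, A2 = w)
f_high <- bxr_objective(flat + 3, y, A1 = w, A2 = w)
```

With these illustrative unit weights, raising the flat baseline above parts of the spectrum increases the first term but incurs a larger negativity penalty, so f_high is smaller than f_flat.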

The smoothing factor sm.par corresponds to A[1]^{*} in Barkauskas (2009) and controls how large the estimated nth derivative of the baseline is allowed to be (for sm.ord = n). From a practical standpoint, values of sm.ord larger than two do not seem to adequately smooth the baseline because the Hessian becomes computationally singular for any reasonable value of sm.par.

The parameters sm.div, sm.norm.by, neg.div, and neg.norm.by determine the methods used to normalize the smoothness and negativity terms. The general forms are A[1,i] = n^4 * A[1]^{*}/M[i]/p and A[2,i] = 1/M[i]/p. Here, n = length(spect); p is sm.div or neg.div, as appropriate; and M[i] is determined by sm.norm.by or neg.norm.by, as appropriate. Values of "baseline" make M[i] = b[i]', where b[i]' is the currently estimated value of the baseline; values of "overestimate" make M[i] = b[i]'-y[i]; and values of "constant" make M[i] = σ, where σ is an estimate of the noise standard deviation.
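The general forms of A[1,i] and A[2,i] can be written out as a short sketch. The helper name norm_M is hypothetical, sigma is assumed to be a noise estimate supplied by the caller (how the package estimates it is not described here), and the divisors used are the documented defaults.

```r
# Hypothetical helper: returns M[i] for a given normalization method.
# b is the current baseline estimate, y the spectrum, sigma a noise SD estimate.
norm_M <- function(norm.by = c("baseline", "overestimate", "constant"),
                   b, y, sigma) {
  norm.by <- match.arg(norm.by)   # accepts abbreviations; default is "baseline"
  switch(norm.by,
         baseline     = b,
         overestimate = b - y,
         constant     = rep(sigma, length(b)))
}

b <- c(10, 12, 11)                # illustrative current baseline estimate
y <- c(10, 15, 11)                # illustrative spectrum
n <- length(y)
sm.par <- 1e-11                   # documented default, A[1]^{*}

M_sm  <- norm_M("base", b, y, sigma = 2)    # abbreviated "baseline"
M_neg <- norm_M("const", b, y, sigma = 2)   # abbreviated "constant"

A1 <- n^4 * sm.par / M_sm / 0.5223145       # A[1,i] = n^4 * A[1]^{*} / M[i] / p
A2 <- 1 / M_neg / 0.4210109                 # A[2,i] = 1 / M[i] / p
```

The sketch does not guard against zero or negative values of M[i], which "overestimate" can produce; how the package handles that case is not stated above.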

The values of sm.norm.by and neg.norm.by can be abbreviated, and both default to "baseline". The default values of NA for sm.div and neg.div are translated to sm.div = 0.5223145 and neg.div = 0.4210109, which are the appropriate parameters for the FT-ICR mass spectrometry machine that generated the spectra used to develop this package. Other machines may well require different parameters, and other spectroscopic technologies almost certainly will; see Barkauskas (2009a) for a description of how these parameters were obtained.

If zero.rm == TRUE and y[a],…,y[a+k] = 0, then these values of the spectrum are set to be (y[a-1]+y[a+k+1])/2. (For typical MALDI FT-ICR spectra, a spectrum value of zero indicates an erased harmonic and should not be considered a real data point.)
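The zero-replacement rule can be sketched as follows. The function name replace_zero_runs is hypothetical and this is not the package's code; in particular, runs of zeros that touch either end of the spectrum have no neighbor on one side, and this sketch simply leaves them unchanged, which is an assumption.

```r
# Hypothetical helper: replaces each interior run of zeros y[a], ..., y[a+k]
# with the average of the neighboring values, (y[a-1] + y[a+k+1]) / 2.
replace_zero_runs <- function(y) {
  r <- rle(y == 0)                    # runs of zero / non-zero values
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  n <- length(y)
  for (k in which(r$values)) {        # loop over runs of zeros only
    a <- starts[k]
    e <- ends[k]
    if (a > 1 && e < n) {             # interior run: both neighbors exist
      y[a:e] <- (y[a - 1] + y[e + 1]) / 2
    }
  }
  y
}

cleaned <- replace_zero_runs(c(4, 0, 0, 8, 3, 0, 5))
# cleaned is c(4, 6, 6, 8, 3, 4, 5)
```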

Value

A list containing the following items:

baseline

The computed baseline

iter

The number of iterations for convergence

changed

Numeric vector of length iter containing the number of indicator variables that switched value on each iteration

hs

Numeric vector of length iter containing the number of halving line-searches done on each iteration

Note

The original algorithm was developed by Yuanxin Xi and David Rocke. The code in this package was first adapted from a Matlab program by Yuanxin Xi, then modified to account for the new methodology in Barkauskas (2009a).

halve.search = FALSE is recommended unless both sm.norm.by == "constant" and neg.norm.by == "constant".

Author(s)

Don Barkauskas (barkda@wald.ucdavis.edu)

References

Barkauskas, D.A. and Rocke, D.M. (2009a) “A general-purpose baseline estimation algorithm for spectroscopic data”. To appear in Analytica Chimica Acta. doi:10.1016/j.aca.2009.10.043

Barkauskas, D.A. et al. (2009b) “Analysis of MALDI FT-ICR mass spectrometry data: A time series approach”. Analytica Chimica Acta, 648:2, 207–214.

Barkauskas, D.A. et al. (2009c) “Detecting glycan cancer biomarkers in serum samples using MALDI FT-ICR mass spectrometry data”. Bioinformatics, 25:2, 251–257.

Xi, Y. and Rocke, D.M. (2008) “Baseline Correction for NMR Spectroscopic Metabolomics Data Analysis”. BMC Bioinformatics, 9:324.

See Also

run.baselines


FTICRMS documentation built on May 1, 2019, 10:53 p.m.