run.cluster.matrix: Identify Equivalent Peaks from Different Subjects

Description

Takes the file generated by run.lrg.peaks, identifies equivalent peaks in each spectrum, and fills in missing values.

Usage

run.cluster.matrix(pre.align = FALSE, align.method = c("PL",
                   "spline", "affine", "none"), align.fcn = NA,
                   trans.method = c("shiftedlog", "glog", "none"),
                   add.par = 0, subtract.base = FALSE, 
                   lrg.only = TRUE, calc.all.peaks = FALSE, 
                   masses = NA, isotope.dist = 7, 
                   cluster.method = c("ppm", "constant", "usewidth"), 
                   cluster.constant = 10, num.pts = 5, 
                   R2.thresh = 0.98, oneside.min = 1, min.spect = 1,
                   peak.method = c("parabola", "locmaxes"), 
                   bhbysubj = TRUE, covariates, root.dir = ".",
                   base.dir, peak.dir, lrg.dir,
                   lrg.file = "lrg_peaks.RData", overwrite = FALSE,
                   use.par.file = FALSE, par.file = "parameters.RData")

Arguments

pre.align

either FALSE, or a numeric vector of shifts to apply to spectra, or a four-component list (of the form described in the Note section below) to be used before identifying peaks from different spectra

align.method

alignment algorithm for peaks

align.fcn

function (and inverse) to apply to masses before (and after) applying align.method; see below

trans.method

type of transformation to use on spectra before statistical analysis

add.par

additive parameter for "shiftedlog" or "glog" options for trans.method

subtract.base

logical; whether to subtract calculated baseline from spectrum

lrg.only

logical; whether to consider only peaks that have at least one “large” peak; i.e., identified by run.lrg.peaks

calc.all.peaks

logical; whether to calculate all possible peaks or only sufficiently large ones

masses

specific masses to test

isotope.dist

maximum distance for declaring isotopes

cluster.method

method for determining when two peaks from different spectra are the same

cluster.constant

parameter used in running cluster.method

num.pts

number of consecutive points needed for peak fitting

R2.thresh

R^2 value needed for peak fitting

oneside.min

minimum number of points on each side of local maximum for peak fitting

min.spect

minimum number of spectra necessary for peak to be used in run.analysis

peak.method

method for locating peaks

bhbysubj

logical; whether to look for number of large peaks by subject (i.e., combining replicates) or by spectrum

covariates

data frame with rownames given by raw data files with extensions (e.g., “.txt”) stripped; only needed if bhbysubj == TRUE

root.dir

directory for parameters file and raw data

base.dir

directory for baseline files; default is paste(root.dir, "/Baselines", sep = "")

peak.dir

directory for peak location files; default is paste(root.dir, "/All_Peaks", sep = "")

lrg.dir

directory for large peaks file; default is paste(root.dir, "/Large_Peaks", sep = "")

lrg.file

name of file to store large peaks in

overwrite

logical; whether to replace existing files with new ones

use.par.file

logical; if TRUE, then parameters are read from par.file in directory root.dir

par.file

string containing name of parameters file
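
As an illustration of the covariates argument described above, the following sketch builds a data frame whose row names are raw data file names with the extension stripped. The file names and the column name subj are hypothetical; the columns that are actually required are not listed on this page.

```r
# Hypothetical raw data file names (not taken from the package).
files <- c("subj01_rep1.txt", "subj01_rep2.txt", "subj02_rep1.txt")

# Row names are the file names with the extension removed.
# The column name "subj" is an assumption made for illustration only.
covariates <- data.frame(
  subj = c("subj01", "subj01", "subj02"),
  row.names = tools::file_path_sans_ext(files)
)
```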

Details

Reads in information from file created by run.strong.peaks, calculates the cluster matrix, fills in missing values, and overwrites the file named lrg.file in lrg.dir. The resulting file contains variables

amps: data frame of amplitudes created by run.strong.peaks
centers: data frame of centers created by run.strong.peaks
clust.mat: data frame with columns given by samples and rows given by the distinct peaks in the samples
lrg.mat: data frame of the same size as clust.mat, with entries TRUE if the peak was large in that spectrum and FALSE otherwise
lrg.peaks: data frame of significant peaks created by run.lrg.peaks
num.lrg: number of subjects (or spectra if bhbysubj == FALSE) with a large peak at the corresponding mass

and is ready to be used by run.analysis.

Value

No value returned; the file is simply created.

Note

If use.par.file == TRUE and other parameters are entered into the function call, then the parameters entered in the function call overwrite those read in from the file. Note that this is opposite from the behavior for FTICRMS versions 0.7 and earlier.
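
The precedence rule can be pictured with base R's modifyList. This is only a sketch of the behavior described above, not the package's own code, and the parameter values are hypothetical.

```r
# Hypothetical values standing in for what par.file might contain.
file.pars <- list(cluster.method = "ppm", cluster.constant = 10, min.spect = 1)

# Hypothetical value supplied explicitly in the function call.
call.pars <- list(cluster.constant = 5)

# Values from the call replace those from the file; the rest are kept.
effective <- modifyList(file.pars, call.pars)
```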

align.method, cluster.method, peak.method, and trans.method can be abbreviated.

If align.fcn is not NA, then it should consist of a list with components fcn and inv, each of class function. align.fcn$fcn should take a vector of masses as its argument and return a vector of transformed masses. (Typically, this will be transforming masses to frequencies; see Zhang (2005).) align.fcn$inv should be the inverse function of align.fcn$fcn.
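
A minimal sketch of a value for align.fcn with the structure described above. It assumes the simplest inverse relationship between mass and frequency; the constant A is hypothetical, and a real calibration would come from the instrument.

```r
# Hypothetical calibration constant (illustration only).
A <- 1e8

# A list with components fcn and inv, each a function of one vector.
align.fcn <- list(
  fcn = function(m) A / m,  # masses -> (approximate) frequencies
  inv = function(f) A / f   # frequencies -> masses
)

# inv should undo fcn.
m <- c(1000, 1500, 2000)
roundtrip <- align.fcn$inv(align.fcn$fcn(m))
```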

If align.method == "spline", then alignment consists of making the transformed masses of the strong peaks all agree exactly with their means, then shifting the rest of the transformed masses via an interpolation spline generated using interpSpline. If align.method == "PL", then the same is done, but the interpolation between the strong peaks is piecewise linear. If align.method == "affine", then the transformed masses of the strong peaks are aligned to their means using a least-squares affine fit for each spectrum. In any of these cases, if there are no strong peaks, align.method is changed to "none" with a warning. If there is exactly one strong peak, then alignment is by a simple shift of the transformed masses in each spectrum. If there are exactly two strong peaks, then alignment is by a simple affine transformation of the transformed masses in each spectrum. If align.method == "spline" and there are exactly three strong peaks, then alignment is piecewise affine on the transformed masses (i.e., identical to align.method == "PL").

If align.method == "affine", it is strongly recommended that you supply a value for align.fcn that makes the data points (approximately) equally spaced.
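
The difference between piecewise linear alignment and a least-squares affine fit can be sketched for a single spectrum with base R. The numbers are made up, and the sketch only covers locations that lie between the outermost strong peaks; how the package treats points outside that range is not described here.

```r
# Hypothetical transformed masses of three strong peaks in one spectrum,
# and the means of those peaks across all spectra.
obs    <- c(1000.02, 1500.03, 2000.05)
target <- c(1000.00, 1500.00, 2000.00)

# Other locations in the same spectrum that need to be aligned.
x <- c(1250.00, 1750.00)

# Piecewise linear: strong peaks map exactly to their means,
# with linear interpolation in between.
pl <- approx(x = obs, y = target, xout = x)$y

# Least-squares affine fit: a single straight line for the whole spectrum.
fit <- lm(target ~ obs)
aff <- predict(fit, newdata = data.frame(obs = x))
```

With piecewise linear interpolation the strong peaks land exactly on their means, whereas the affine fit only matches them in the least-squares sense.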

Defining a value for min.spect can vastly speed up the run time at the (small) cost of a little flexibility in doing the statistical analysis in run.analysis. For exploratory data analysis, this should probably be left alone, but once the peak criterion has been established, further analyses will go much more quickly with min.spect re-defined. The value can either be an integer, which is interpreted as the number of spectra; or a number between 0 and 1, in which case it is interpreted as a fraction of the total number of spectra. In either case, the values of clust.mat, lrg.mat, and num.lrg saved in lrg.file are only those masses which have at least min.spect large peaks among the spectra.
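
The two interpretations of min.spect can be sketched as follows. The helper resolve_min_spect is hypothetical, and rounding a fractional threshold up with ceiling is an assumption; the text above does not say how the package rounds.

```r
# Hypothetical helper: turn min.spect into a count of spectra.
resolve_min_spect <- function(min.spect, n.spectra) {
  if (min.spect > 0 && min.spect < 1) {
    ceiling(min.spect * n.spectra)  # fraction of the spectra (rounding assumed)
  } else {
    min.spect                       # already a number of spectra
  }
}

# Made-up counts of large peaks at four masses, from 20 spectra.
num.lrg <- c(1, 7, 3, 12)
keep <- num.lrg >= resolve_min_spect(0.25, n.spectra = 20)
```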

pre.align = FALSE is used if the spectra have already been aligned by the mass spectroscopists. If it is not FALSE, it can be either a vector of additive shifts to be applied to the spectra, or a list with components targets, actual, and align.method. In the latter case, targets is a vector of target masses, and actual is a matrix with length(targets) columns and one row per spectrum, where actual[i,j] is the mass in spectrum i that should be matched exactly to targets[j]; NA is a valid entry in actual. The alignment for spectrum i is then done as described above for align.method, depending on the number of non-missing values in row i of actual.
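
A sketch of the list form of pre.align for two spectra and three target masses. All values are made up, and only the components named in this paragraph are shown.

```r
# Target masses that the spectra should be aligned to.
targets <- c(1000.00, 1500.00, 2000.00)

# One row per spectrum, one column per target; NA marks a target
# with no matching mass in that spectrum.
actual <- rbind(
  c(1000.02, 1500.03, 2000.05),
  c( 999.99,      NA, 1999.97)
)

pre.align <- list(targets = targets, actual = actual, align.method = "PL")

# Number of usable matches per spectrum.
n.known <- rowSums(!is.na(pre.align$actual))
```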

Suppose cluster.constant = K and we have two peaks in different spectra with masses m[1]<m[2]. If cluster.method == "constant", then the peaks are considered to be the same peak if we have m[2]-m[1] < K. If cluster.method == "ppm", then the peaks are considered to be the same peak if we have m[2]-m[1] < K * m[2] * 1e-6. If cluster.method == "usewidth", then the algorithm uses the observation that log(Width_hat) and log(Center_hat) appear to be linearly related. Tolerances are computed using this relationship.
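
The "constant" and "ppm" criteria can be written out directly. The function same_peak is a hypothetical helper for illustration, not part of the package, and the "usewidth" method is omitted because its tolerance formula is not given here.

```r
# Hypothetical helper: do two masses from different spectra count as one peak?
same_peak <- function(m1, m2, K, method = c("constant", "ppm")) {
  method <- match.arg(method)
  lo <- min(m1, m2)
  hi <- max(m1, m2)
  tol <- switch(method,
    constant = K,             # fixed tolerance in mass units
    ppm      = K * hi * 1e-6  # tolerance proportional to the larger mass
  )
  (hi - lo) < tol
}

same_peak(1000.000, 1000.005, K = 10, method = "ppm")
same_peak(1000.000, 1000.020, K = 10, method = "ppm")
```

At a mass near 1000 and K = 10, the "ppm" tolerance is about 0.01, so the first pair is treated as one peak and the second is not.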

Author(s)

Don Barkauskas (barkda@wald.ucdavis.edu)

References

Barkauskas, D.A. and D.M. Rocke. (2009a) “A general-purpose baseline estimation algorithm for spectroscopic data”. to appear in Analytica Chimica Acta. doi:10.1016/j.aca.2009.10.043

Barkauskas, D.A. et al. (2009b) “Analysis of MALDI FT-ICR mass spectrometry data: A time series approach”. Analytica Chimica Acta, 648:2, 207–214.

Barkauskas, D.A. et al. (2009c) “Detecting glycan cancer biomarkers in serum samples using MALDI FT-ICR mass spectrometry data”. Bioinformatics, 25:2, 251–257.

Zhang, L.-K. et al. (2005) “Accurate mass measurements by Fourier transform mass spectrometry”. Mass Spectrom Rev, 24:2, 286–309.

See Also

run.lrg.peaks, run.strong.peaks, interpSpline


FTICRMS documentation built on May 1, 2019, 10:53 p.m.