run.analysis: Test for Significant Peaks in FT-ICR MS by Controlling FDR
In FTICRMS: Programs for Analyzing Fourier Transform-Ion Cyclotron Resonance Mass Spectrometry Data


Takes the file generated by run.cluster.matrix and tests the peaks for significance, using the Benjamini-Hochberg procedure to control the False Discovery Rate.

run.analysis(form, covariates, FDR = 0.1, norm.post.repl = FALSE, 
             norm.peaks = c("common", "all", "none"), normalization, 
             add.norm = TRUE,  repl.method = "max", use.model = "lm",
             pval.fcn = "default", lrg.only = TRUE, masses = NULL,
             isotope.dist = 7, root.dir = ".", lrg.dir,
             lrg.file = "lrg_peaks.RData", res.dir,
             res.file = "analyzed.RData", overwrite = FALSE,
             use.par.file = FALSE, par.file = "parameters.RData",
             bhbysubj = TRUE, subs, ...)

`form`	object of class “`formula`” to be used by `use.model` for testing using `covariates`
`covariates`	data frame containing covariates used in analysis
`FDR`	False Discovery Rate in Benjamini-Hochberg test
`norm.post.repl`	logical; whether to normalize after combining replicates
`norm.peaks`	which peaks to use in normalization
`normalization`	type of normalization to use on spectra before statistical analysis; kept for compatibility (see below)
`add.norm`	logical; whether to normalize additively or multiplicatively on the log scale
`repl.method`	function or string representing the name of a function; how to deal with replicates
`use.model`	function or string representing the name of a function; what test to apply to data
`pval.fcn`	function to extract p-values; default is overall p-value of test
`lrg.only`	logical; whether to consider only peaks that have at least one “large” peak; i.e., identified by `run.lrg.peaks`
`masses`	specific masses to test
`isotope.dist`	maximum distance for declaring isotopes
`root.dir`	directory for parameters file and raw data
`lrg.dir`	directory for large peaks file; default is `paste(root.dir, "/Large_Peaks", sep = "")`
`lrg.file`	name of file to store large peaks in
`res.dir`	directory for results file; default is `paste(root.dir, "/Results", sep = "")`
`res.file`	name for results file
`overwrite`	logical; whether to replace existing files with new ones
`use.par.file`	logical; if `TRUE`, then parameters are read from `par.file` in directory `root.dir`
`par.file`	string containing name of parameters file
`bhbysubj`	logical; whether to look for number of large peaks by subject (i.e., combining replicates) or by spectrum
`subs`	subset of spectra to use for analysis; see below
`...`	additional parameters to be passed to `use.model`

Reads in information from the file created by run.cluster.matrix and creates a file named res.file in directory res.dir, which contains the following variables:


`amps`	matrix of transformed amplitudes of alignment peaks
`bysubjvar`	a vector which tells which rows of `covariates` are identified as the same subject
`centers`	matrix of calculated masses of alignment peaks
`clust.mat`	matrix of transformed amplitudes of peaks used in statistical testing
`min.FDR`	FDR level required to get at least one significant test given the starting set of peaks
`sigs`	matrix containing all tests which are significant under at least one scenario
`which.sig`	matrix containing all peaks tested
`parameter.list`	if `use.par.file = TRUE`, a list generated by `extract.pars`; otherwise not defined

No value returned; the file is simply created.

If use.par.file == TRUE and other parameters are entered into the function call, then the parameters entered in the function call override those read in from the file. Note that this is the opposite of the behavior in FTICRMS versions 0.7 and earlier.

norm.peaks determines the peaks used for normalization: "common" normalizes each spectrum using the average peak height of the alignment peaks from that spectrum in amps; "all" normalizes each spectrum using the average peak height of all peaks in that spectrum.
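As a rough illustration of the idea, the sketch below normalizes a small matrix of log-scale amplitudes by each spectrum's average peak height. The helper normalize_spectra, the layout (rows are peaks, columns are spectra), and the choice to align every spectrum to the grand mean are all assumptions made for this example; the package's internal computation may differ.

```r
# Hypothetical helper, not part of FTICRMS.
# Assumes rows = peaks, columns = spectra, amplitudes already transformed.
normalize_spectra <- function(amps, add.norm = TRUE) {
  spec_means <- colMeans(amps, na.rm = TRUE)  # average peak height per spectrum
  target <- mean(spec_means)                  # assumed common reference level
  if (add.norm) {
    # additive: shift each spectrum so its mean equals the target
    sweep(amps, 2, spec_means - target, "-")
  } else {
    # multiplicative: rescale each spectrum so its mean equals the target
    sweep(amps, 2, spec_means / target, "/")
  }
}

amps <- matrix(c(1, 2, 3, 3, 4, 5), nrow = 3,
               dimnames = list(NULL, c("s1", "s2")))
norm_add  <- normalize_spectra(amps, add.norm = TRUE)
norm_mult <- normalize_spectra(amps, add.norm = FALSE)
```

Under these assumptions, both variants leave every spectrum with the same average peak height; they differ in whether the spread within a spectrum is preserved (additive) or rescaled (multiplicative).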

normalization is obsolete but is included for compatibility with previous versions of the package. The valid normalization schemes translate to the new scheme as follows: "common" is norm.post.repl = FALSE and norm.peaks = "common"; "postbase" is norm.post.repl = FALSE and norm.peaks = "all"; "postrepl" is norm.post.repl = TRUE and norm.peaks = "all"; and "none" is norm.peaks = "none" (and norm.post.repl = FALSE, although this value is irrelevant).
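The translation described above can be written down as a small lookup. The function translate_normalization is a hypothetical helper for illustration only and is not part of FTICRMS; it simply restates the mapping given in the text.

```r
# Hypothetical helper, not part of FTICRMS: map a legacy `normalization`
# value to the newer norm.post.repl / norm.peaks pair.
translate_normalization <- function(normalization) {
  switch(normalization,
    common   = list(norm.post.repl = FALSE, norm.peaks = "common"),
    postbase = list(norm.post.repl = FALSE, norm.peaks = "all"),
    postrepl = list(norm.post.repl = TRUE,  norm.peaks = "all"),
    none     = list(norm.post.repl = FALSE, norm.peaks = "none"),
    stop("unknown normalization scheme: ", normalization)
  )
}

res <- translate_normalization("postrepl")
```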

Replicates for the same subject are assumed to be determined by the unique values of covariates$subj. (Future implementations will allow for other methods of defining this.) To analyze replicates as independent samples, use repl.method = "none". This will also speed up the run time if there are no replicates in the data set.
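To make the replicate handling concrete, here is a minimal sketch that combines replicate spectra by the unique values of covariates$subj using a summary function such as max. The helper combine_replicates and the matrix layout (rows are peaks, columns are spectra in the same order as the rows of covariates) are assumptions for this example, not the package's internal code.

```r
covariates <- data.frame(subj  = c("A", "A", "B"),
                         group = c("case", "case", "control"))

# Assumed layout: rows = peaks, columns = spectra (one per row of covariates)
amps <- matrix(c(1, 5, 2, 4, 3, 6), nrow = 2,
               dimnames = list(c("peak1", "peak2"), NULL))

# Hypothetical helper, not part of FTICRMS.
combine_replicates <- function(amps, subj, repl.method = max) {
  repl.method <- match.fun(repl.method)     # accepts a function or its name
  groups <- split(seq_along(subj), subj)    # spectrum indices per subject
  sapply(groups, function(idx) {
    apply(amps[, idx, drop = FALSE], 1, repl.method)
  })
}

by_subj <- combine_replicates(amps, covariates$subj, "max")
```

The result has one column per subject; with repl.method = "none" no such combining step would be needed, which is consistent with the shorter run time mentioned above.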

The argument subs can be logical, numeric, or character; if it is defined, then covariates is modified to covariates[subs, , drop = FALSE].
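The three kinds of index behave as in ordinary data frame subsetting. The example data frame and its row names below are made up for illustration.

```r
covariates <- data.frame(subj  = c("A", "B", "C"),
                         group = c("case", "control", "case"),
                         row.names = c("spec1", "spec2", "spec3"))

# Logical: keep the rows where the condition holds
by_logical <- covariates[covariates$group == "case", , drop = FALSE]

# Numeric: keep rows by position
by_numeric <- covariates[c(1, 3), , drop = FALSE]

# Character: keep rows by row name
by_character <- covariates[c("spec1", "spec3"), , drop = FALSE]
```

All three calls select the same two rows here; drop = FALSE keeps the result a data frame even when only one column remains.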

If masses is not NULL, then the listed masses plus anything that could be in the first isotope.dist - 1 isotope peaks of each mass are tested.

If something other than the p-value for the overall test statistic is needed, then the user-defined function for pval.fcn should have the form pval.fcn = function(x){...}, where x is a model object of the type returned by use.model, and should return the desired p-value.
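As an example of such a function, the sketch below fits an ordinary lm model on simulated data and extracts two different p-values from it. The data, the variable names, and the helper names overall_p and group_p are assumptions for illustration; in particular, taking the F-test p-value as the "overall" p-value for lm is one plausible reading, not a statement about the package's default.

```r
set.seed(1)
dat <- data.frame(y     = rnorm(20),
                  group = gl(2, 10),
                  age   = runif(20, 30, 60))
fit <- lm(y ~ group + age, data = dat)

# One plausible "overall" p-value for an lm fit: the model F test
overall_p <- function(x) {
  f <- summary(x)$fstatistic
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# A user-defined alternative: the p-value for the group effect only
group_p <- function(x) {
  summary(x)$coefficients["group2", "Pr(>|t|)"]
}

p_overall <- overall_p(fit)
p_group   <- group_p(fit)
```

A function like group_p could then be supplied as pval.fcn when only the group effect, adjusted for the other covariates, is of interest.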

If use.model evaluates to t.test, then the difference between the two groups for each peak is recorded in which.sig$Delta and sigs$Delta; otherwise, these columns consist entirely of NA entries.

Each rowname of sigs and which.sig represents the range of masses that were used to form that peak. The columns of those objects give the p-value of the peaks in each row, the number of samples that had large peaks for each row, and the significance of each test, coded as


`NA`	peak not eligible for B-H
`0`	peak eligible for B-H but not declared significant
`1`	peak declared significant

The “S” labels refer to the number of large peaks that were necessary for a row to be eligible. For example, the column labeled S5 in sigs used as its starting set of p-values all rows which had which.sig$num.lrg >= 5. If bhbysubj == TRUE, then the entries of num.lrg are obtained by going subject-by-subject and for each mass counting the number of subjects who had at least one spectrum with a large peak at that mass; otherwise, num.lrg for each mass is simply the total number of spectra that had a large peak at that mass.
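The coding of the "S" columns can be sketched with base R's p.adjust. The p-values, the num.lrg counts, and the helper bh_code below are invented for illustration, and reading min.FDR as the smallest Benjamini-Hochberg adjusted p-value is an assumption rather than something confirmed by the package source.

```r
# Made-up p-values and large-peak counts for five rows
which.sig <- data.frame(
  p.value = c(0.001, 0.02, 0.04, 0.30, 0.008),
  num.lrg = c(6, 5, 2, 7, 3)
)
FDR <- 0.1

# Hypothetical helper, not part of FTICRMS:
# NA = not eligible, 0 = eligible but not significant, 1 = significant
bh_code <- function(p, eligible, FDR) {
  out <- rep(NA_integer_, length(p))
  out[eligible] <- as.integer(p.adjust(p[eligible], method = "BH") <= FDR)
  out
}

# One column per threshold k: rows with num.lrg >= k form the starting set
for (k in c(1, 5)) {
  which.sig[[paste0("S", k)]] <-
    bh_code(which.sig$p.value, which.sig$num.lrg >= k, FDR)
}

# Assumed reading of min.FDR: the smallest FDR level at which at least
# one test in the starting set would be declared significant
min_FDR <- min(p.adjust(which.sig$p.value, method = "BH"))
```

In this toy data, rows with num.lrg below 5 are coded NA in column S5, while column S1 runs the procedure over all five rows.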

Don Barkauskas (barkda@wald.ucdavis.edu)

Barkauskas, D.A. and D.M. Rocke. (2009a) “A general-purpose baseline estimation algorithm for spectroscopic data”. to appear in Analytica Chimica Acta. doi:10.1016/j.aca.2009.10.043

Barkauskas, D.A. et al. (2009b) “Analysis of MALDI FT-ICR mass spectrometry data: A time series approach”. Analytica Chimica Acta, 648:2, 207–214.

Barkauskas, D.A. et al. (2009c) “Detecting glycan cancer biomarkers in serum samples using MALDI FT-ICR mass spectrometry data”. Bioinformatics, 25:2, 251–257.

Benjamini, Y. and Hochberg, Y. (1995) “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” J. Roy. Statist. Soc. Ser. B, 57:1, 289–300.

run.strong.peaks

FTICRMS documentation built on May 1, 2019, 10:53 p.m.