run.analysis: Test for Significant Peaks in FT-ICR MS by Controlling FDR

Description

Takes the file generated by run.cluster.matrix and tests the peaks for significance, using the Benjamini-Hochberg procedure to control the False Discovery Rate.

Usage

run.analysis(form, covariates, FDR = 0.1, norm.post.repl = FALSE, 
             norm.peaks = c("common", "all", "none"), normalization, 
             add.norm = TRUE,  repl.method = "max", use.model = "lm",
             pval.fcn = "default", lrg.only = TRUE, masses = NA,
             isotope.dist = 7, root.dir = ".", lrg.dir,
             lrg.file = "lrg_peaks.RData", res.dir,
             res.file = "analyzed.RData", overwrite = FALSE,
             use.par.file = FALSE, par.file = "parameters.RData",
             bhbysubj = TRUE, subs, ...)

Arguments

form

object of class “formula” to be used by use.model for testing using covariates

covariates

data frame containing covariates used in analysis

FDR

False Discovery Rate in Benjamini-Hochberg test

norm.post.repl

logical; whether to normalize after combining replicates

norm.peaks

which peaks to use in normalization

normalization

type of normalization to use on spectra before statistical analysis; kept for compatibility (see below)

add.norm

logical; whether to normalize additively or multiplicatively on the log scale

repl.method

function or string representing the name of a function; how to deal with replicates

use.model

function or string representing the name of a function; what test to apply to data

pval.fcn

function to extract p-values; default is overall p-value of test

lrg.only

logical; whether to consider only peaks that have at least one “large” peak; i.e., identified by run.lrg.peaks

masses

specific masses to test

isotope.dist

maximum distance for declaring isotopes

root.dir

directory for parameters file and raw data

lrg.dir

directory for large peaks file; default is paste(root.dir, "/Large_Peaks", sep = "")

lrg.file

name of file to store large peaks in

res.dir

directory for results file; default is paste(root.dir, "/Results", sep = "")

res.file

name for results file

overwrite

logical; whether to replace existing files with new ones

use.par.file

logical; if TRUE, then parameters are read from par.file in directory root.dir

par.file

string containing name of parameters file

bhbysubj

logical; whether to look for number of large peaks by subject (i.e., combining replicates) or by spectrum

subs

subset of spectra to use for analysis; see below

...

additional parameters to be passed to use.model

Details

Reads in information from file created by run.cluster.matrix and creates a file named res.file in directory res.dir which contains the following variables:

amps            matrix of transformed amplitudes of alignment peaks
bysubjvar       a vector which tells which rows of covariates are identified as the same subject
centers         matrix of calculated masses of alignment peaks
clust.mat       matrix of transformed amplitudes of peaks used in statistical testing
min.FDR         FDR level required to get at least one significant test given the starting set of peaks
sigs            matrix containing all tests which are significant under at least one scenario
which.sig       matrix containing all peaks tested
parameter.list  if use.par.file = TRUE, a list generated by extract.pars; otherwise not defined
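
As an illustration of what min.FDR represents, the smallest FDR level at which Benjamini-Hochberg declares at least one test significant equals the smallest B-H adjusted p-value. The sketch below uses base R's p.adjust on made-up p-values; it is not taken from the package source, and the variable names are assumptions.

```r
# Hypothetical p-values for five alignment peaks (not from the package)
pvals <- c(0.001, 0.020, 0.040, 0.300, 0.700)

# Benjamini-Hochberg adjusted p-values from base R
adj <- p.adjust(pvals, method = "BH")

# Smallest FDR level at which at least one peak is declared significant
min_fdr <- min(adj)
min_fdr
```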

Value

No value returned; the file is simply created.

Note

If use.par.file == TRUE and other parameters are also supplied in the function call, then the values supplied in the call override those read in from the file. Note that this is the opposite of the behavior in FTICRMS versions 0.7 and earlier.
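
The precedence rule can be pictured with a small base R sketch. The two lists and their contents are hypothetical stand-ins for the file contents and the call arguments; the package's own merging code may differ.

```r
# Hypothetical parameters as they might be read from par.file
file_pars <- list(FDR = 0.1, repl.method = "max", lrg.only = TRUE)

# Hypothetical arguments supplied explicitly in the function call
call_pars <- list(FDR = 0.05)

# Values given in the call take precedence over values from the file
effective <- modifyList(file_pars, call_pars)
str(effective)
```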

norm.peaks determines the peaks used for normalization: "common" normalizes each spectrum using the average peak height of the alignment peaks from that spectrum in amps; "all" normalizes each spectrum using the average peak height of all peaks in that spectrum; "none" performs no normalization.

normalization is obsolete but is included for compatibility with previous versions of the package. The valid normalization schemes translate to the new scheme as follows: "common" is norm.post.repl = FALSE and norm.peaks = "common"; "postbase" is norm.post.repl = FALSE and norm.peaks = "all"; "postrepl" is norm.post.repl = TRUE and norm.peaks = "all"; and "none" is norm.peaks = "none" (and norm.post.repl = FALSE, although this value is irrelevant).
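
The translation from the legacy normalization values can be written out as a lookup. The helper translate_normalization below is hypothetical and not part of FTICRMS; it only restates the mapping given above.

```r
# Hypothetical helper (not part of FTICRMS): translate a legacy
# `normalization` value into the newer pair of arguments.
translate_normalization <- function(normalization) {
  switch(normalization,
    common   = list(norm.post.repl = FALSE, norm.peaks = "common"),
    postbase = list(norm.post.repl = FALSE, norm.peaks = "all"),
    postrepl = list(norm.post.repl = TRUE,  norm.peaks = "all"),
    none     = list(norm.post.repl = FALSE, norm.peaks = "none"),
    stop("unknown normalization scheme: ", normalization)
  )
}

translate_normalization("postrepl")
```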

Replicates for the same subject are assumed to be determined by the unique values of covariates$subj. (Future implementations will allow for other methods of defining this.) To analyze replicates as independent samples, use repl.method = "none". This will also speed up the run time if there are no replicates in the data set.
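
A minimal sketch of combining replicates with the default repl.method = "max", assuming amplitudes are stored with peaks in rows and spectra in columns (the layout and the toy values are assumptions, not taken from the package):

```r
# Hypothetical covariates: two subjects with two replicate spectra each
covariates <- data.frame(subj  = c("A", "A", "B", "B"),
                         group = c("case", "case", "control", "control"))

# Hypothetical transformed amplitudes: rows are peaks, columns are spectra
amps <- matrix(c(1.0, 1.4, 2.0, 1.8,
                 3.1, 2.9, 0.5, 0.7),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("peak1", "peak2"), paste0("spec", 1:4)))

# Combine replicates within each subject by taking the maximum per peak
by_subj <- t(apply(amps, 1, function(x) tapply(x, covariates$subj, max)))
by_subj
```

The result has one column per unique value of covariates$subj, so each subject contributes a single value per peak to the subsequent test.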

The argument subs can be a logical, numeric, or character vector; if it is defined, then covariates is replaced by covariates[subs, , drop = FALSE].
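
The three kinds of subs behave like ordinary data frame row indexing. The toy covariates below are hypothetical; the character form assumes the data frame has row names that identify the spectra.

```r
# Hypothetical covariates with row names identifying the spectra
covariates <- data.frame(subj  = c("A", "A", "B", "C"),
                         group = c("case", "case", "control", "control"),
                         row.names = c("s1", "s2", "s3", "s4"))

# Three ways to keep the first and third spectra
by_logical   <- covariates[c(TRUE, FALSE, TRUE, FALSE), , drop = FALSE]
by_numeric   <- covariates[c(1, 3), , drop = FALSE]
by_character <- covariates[c("s1", "s3"), , drop = FALSE]

by_logical
```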

If specific masses are supplied via masses, then the listed masses plus anything that could be in the first isotope.dist - 1 isotope peaks of each mass are tested.

If something other than the p-value for the overall test statistic is needed, then pval.fcn should be a user-defined function of the form pval.fcn = function(x){...}, where x is a model object of the type returned by use.model, and it should return the desired p-value.
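
As a sketch of what such a function might look like when use.model = "lm", the example below fits a model to made-up data and defines two extractors. The data, the names overall_p and group_p, and the coefficient name "groupcontrol" are all assumptions for illustration; overall_p shows one way to compute an overall F-test p-value and is not claimed to be the package's internal default.

```r
# Hypothetical data: one peak's transformed amplitudes and two covariates
set.seed(1)
dat <- data.frame(y     = rnorm(12),
                  group = rep(c("case", "control"), each = 6),
                  age   = rep(c(30, 40, 50), times = 4))
fit <- lm(y ~ group + age, data = dat)

# Overall p-value of the model F test
overall_p <- function(x) {
  f <- summary(x)$fstatistic
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# Custom extractor: p-value for the group coefficient only
group_p <- function(x) {
  summary(x)$coefficients["groupcontrol", "Pr(>|t|)"]
}

overall_p(fit)
group_p(fit)
```

Under these assumptions, passing pval.fcn = group_p would test each peak on the group effect alone rather than on the model as a whole.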

If use.model evaluates to t.test, then the difference between the two groups for each peak is recorded in which.sig$Delta and sigs$Delta; otherwise, these columns consist entirely of NA entries.

Each rowname of sigs and which.sig represents the range of masses that were used to form that peak. The columns of those objects give the p-value of the peaks in each row, the number of samples that had large peaks for each row, and the significance of each test, coded as

NA  peak not eligible for B-H
0   peak eligible for B-H but not declared significant
1   peak declared significant
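
The coding can be reproduced on toy data with base R's p.adjust. This is a sketch of the idea only; the p-values and the eligibility flags are made up, and the package may compute the result differently.

```r
# Hypothetical inputs: p-values and an eligibility flag for six peaks
pvals    <- c(0.001, 0.012, 0.030, 0.200, 0.004, 0.650)
eligible <- c(TRUE,  TRUE,  TRUE,  TRUE,  FALSE, FALSE)
FDR      <- 0.1

# Start with NA everywhere (peak not eligible for B-H)
signif_code <- rep(NA_integer_, length(pvals))

# Apply Benjamini-Hochberg to the eligible p-values only
adj <- p.adjust(pvals[eligible], method = "BH")

# 1 = declared significant, 0 = eligible but not significant
signif_code[eligible] <- as.integer(adj <= FDR)
signif_code
```

Note that the fifth peak has a small p-value but is still coded NA, because only eligible peaks enter the B-H procedure.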

The “S” labels refer to the number of large peaks that were necessary for a row to be eligible. For example, the column labeled S5 in sigs used as its starting set of p-values all rows which had which.sig$num.lrg >= 5. If bhbysubj == TRUE, then the entries of num.lrg are obtained by going subject-by-subject and for each mass counting the number of subjects who had at least one spectrum with a large peak at that mass; otherwise, num.lrg for each mass is simply the total number of spectra that had a large peak at that mass.
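
The two ways of counting num.lrg can be compared on a small indicator matrix. The matrix lrg, the vector subj, and the result names are hypothetical; the sketch assumes masses in rows and spectra in columns.

```r
# Hypothetical indicator matrix: rows are masses, columns are spectra;
# TRUE means the spectrum has a large peak at that mass
lrg <- matrix(c(TRUE,  TRUE,  FALSE, FALSE,
                TRUE,  FALSE, TRUE,  FALSE,
                FALSE, FALSE, FALSE, TRUE),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("m1", "m2", "m3"), paste0("spec", 1:4)))
subj <- c("A", "A", "B", "B")  # spectra 1-2 and 3-4 are replicates

# By spectrum (bhbysubj == FALSE): total spectra with a large peak
num_lrg_spectrum <- rowSums(lrg)

# By subject (bhbysubj == TRUE): subjects with at least one such spectrum
num_lrg_subject <- apply(lrg, 1, function(x) sum(tapply(x, subj, any)))

# Rows eligible for a column such as S2 (at least 2 large peaks)
eligible_S2 <- num_lrg_subject >= 2
```

Here mass m1 has two large peaks when counted by spectrum but only one when counted by subject, so it would be eligible for S2 only in the by-spectrum case.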

Author(s)

Don Barkauskas (barkda@wald.ucdavis.edu)

References

Barkauskas, D.A. and D.M. Rocke. (2009a) “A general-purpose baseline estimation algorithm for spectroscopic data”. To appear in Analytica Chimica Acta. doi:10.1016/j.aca.2009.10.043

Barkauskas, D.A. et al. (2009b) “Analysis of MALDI FT-ICR mass spectrometry data: A time series approach”. Analytica Chimica Acta, 648:2, 207–214.

Barkauskas, D.A. et al. (2009c) “Detecting glycan cancer biomarkers in serum samples using MALDI FT-ICR mass spectrometry data”. Bioinformatics, 25:2, 251–257.

Benjamini, Y. and Hochberg, Y. (1995) “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” J. Roy. Statist. Soc. Ser. B, 57:1, 289–300.

See Also

run.strong.peaks


FTICRMS documentation built on May 1, 2019, 10:53 p.m.