simExTargId: simExTargId


View source: R/simExTargId.R

Description

This function seeks to remove the gap between metabolomic MS1 profiling experiments and discovery of statistically relevant targets for MSn fragmentation based identification.

Usage

simExTargId(rawDir = NULL, studyName = "exampleStudyName",
  analysisDir = NULL, coVar = NULL, nCores = NULL, ionMode = NULL,
  metab = NULL, minFiles = 10, centroid = TRUE, mzXml = TRUE,
  zeroFillValue = NULL, normMethod = NULL, manBatchAdj = NULL,
  LogTransBase = exp(1), smoothSpan = 0.8, cvThresh = 30, blankFC = 2,
  replicates = FALSE, pAdjustMethod = "BH", xcmsSetArgs = list(peakwidth =
  c(5, 20), ppm = 10, snthresh = 5, method = "centWave"), bw = 2,
  mzwid = 0.015, minfrac = 0.25, pcaOutIdArgs = list(cv = "q2", scale =
  "pareto", centre = TRUE), peakMonitorArgs = list(ppm = 10, rtdev = 30,
  maxSignalAtt = 20, percBelow = 20), maxTime = 60, emailAddress = NULL,
  mailControl = list(smtpServer = "ASPMX.L.GOOGLE.COM"), minFeatPseudo = 2)

Arguments

rawDir

character full path name of the raw data directory into which raw data will be, or has already started to be, written. An "_analysis" directory will be created automatically, into which all converted files (.mzXML) and results output will be saved. Fifteen minutes after the last modification of a new raw data file, conversion to the mzXML format and initial peak-picking (xcmsSet) will occur.

analysisDir

full path name of the analysis directory; all mzXML, peak-picking and results files will be saved here.

coVar

covariate table can be supplied as either a full path name to a .csv file (see details) or as a data.frame.

nCores

numeric number of computer cores for parallel computation.

ionMode

character ionization polarity; must be specified (default = NULL). Must be either "negative" or "positive", or an abbreviation starting with "neg" or "pos".

metab

character or data.frame. Full path to a csv file of a list of metabolites you wish to monitor (or a data.frame). See peakMonitor for more details.

minFiles

the minimum number of raw data files that must be converted to mzXML files before commencing the subsequent steps of XCMS processing, pre-processing, PCA and statistical analysis. Default = 10.

centroid

logical; should the raw data files be centroided during conversion by MSConvert (http://proteowizard.sourceforge.net/downloads.shtml)? N.B. centroided data are necessary for use of the "centWave" algorithm of xcms (findPeaks.centWave-methods).

mzXml

logical; should raw LC-MS data files be converted by MSConvert to the mzXml or mzML file format (default = TRUE, i.e. files are converted to the mzXml format).

zeroFillValue

value to fill missing and zero values in the xcms peak table. If argument is left NULL the default is to fill with half the smallest non-zero value. (see: zeroFill).

normMethod

normalization method to perform (see: signNorm). If argument is left blank no normalization is performed. Options include median fold change (probabilistic quotient) 'medFC' or total ion signal 'totIon' methods.

manBatchAdj

character vector of co-variate table column names. If argument is not supplied automatic cluster identification/ batch adjustment using the pcaClustId function will occur prior to PCA analysis. If multiple column names are supplied a multiple linear regression will be calculated and the batch adjusted residuals obtained from the model.

LogTransBase

numeric base value for log transformation. Defaults to exp(1), i.e. the natural logarithm (see: preProc).

smoothSpan

see preProc function of MetMSLine for details (default = 0.8).

cvThresh

see preProc function of MetMSLine for details (default = 30 i.e. 30% coefficient of variation).

blankFC

numeric minimum fold difference for each xcms peak table feature between the sample injections (numerator) and negative control blank injections (denominator). All xcms peak table features whose signal in the samples is less than blankFC-fold higher than in the blanks will be removed (default = 2 fold). This background subtraction will not occur until at least 1 blank and 1 sample data file have been acquired and converted to mzXML files.

replicates

logical (default = FALSE) if TRUE then the 3rd column of the co-variate table supplied will be used to identify analytical/ preparative replicates of the same sample. This information will be used to average signal intensities of analytical replicates.

pAdjustMethod

character p-value multiple testing adjustment method (see: p.adjust).

xcmsSetArgs

list of arguments of the xcms function xcmsSet.

pcaOutIdArgs

list of arguments to the pcaOutId function.

maxTime

maximum time (in minutes) since the last raw data file was written. If the most recent raw data file is older than this, simExTargId will stop. This is designed to stop the process, if necessary, after an extended period of inactivity. Default = 60 mins.

emailAddress

character vector of email address(es) from and to which a warning email will be sent if the run may have stopped, QCs are outlying or signal has attenuated. If not supplied, email notifications will not be sent (see sendmail).

mailControl

List of SMTP server settings see sendmail for details. Example given is for google mail.

minFeatPseudo

integer the minimum number of features for a CAMERA pseudospectrum to be considered (default = 2, i.e. a CAMERA pseudospectrum must consist of a minimum of two LC-MS peak groups). A weighted mean (weighted by the summed peak area of all samples) will be calculated for each pseudospectrum group. This removal of artefactual LC-MS features is performed to reduce the multiple testing burden prior to statistical analysis.
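
As a rough illustration of two of the pre-processing arguments above, the following base R sketch applies the default zero-fill rule (half the smallest non-zero value) and a blankFC filter to a small made-up peak table. The matrix peakTable, its column names and the use of row means to summarize samples and blanks are assumptions for illustration only; the package's own zeroFill and blankSub functions may differ in detail.

```r
# Toy peak table: rows are features, columns are injections (made-up values).
peakTable <- matrix(c(100, 120, 50,
                        0, 400, 10,
                       80,  NA, 60),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("feat", 1:3),
                                    c("sample_1", "sample_2", "blank_1")))

# zeroFillValue = NULL: fill missing and zero values with half the
# smallest non-zero value in the table.
nonZero <- peakTable[!is.na(peakTable) & peakTable > 0]
fillValue <- min(nonZero) / 2
filled <- peakTable
filled[is.na(filled) | filled == 0] <- fillValue

# blankFC = 2: keep features at least 2-fold higher in samples than in blanks.
# Summarizing by the mean is an assumption made for this sketch.
blankFC <- 2
isBlank <- grepl("^blank", colnames(filled))
sampleMean <- rowMeans(filled[, !isBlank, drop = FALSE])
blankMean <- rowMeans(filled[, isBlank, drop = FALSE])
keep <- (sampleMean / blankMean) >= blankFC
filtered <- filled[keep, , drop = FALSE]

print(fillValue)
print(rownames(filtered))
```

In this toy table the third feature is removed because its mean sample signal is lower than its blank signal.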

Details

The function is designed to facilitate simultaneous collection of raw metabolomic profiling data, conversion from an instrument manufacturer's proprietary format to the open file format mzXML (using the command line version of MSConvert) in a new analysis directory, xcms based peak picking/alignment, data pre-processing (zeroFill, logTrans, signNorm), automatic PCA-based outlier removal, scores plot cluster identification and automatic potential batch adjustment (pcaOutId, pcaClustId, batchAdj), automatic co-variate based statistical analysis and feature deconvolution (coVarStatType, rtCorrClust). The data output of the workflow can be visualized by a shiny application (shiny) at any time (targetId) and MS2 fragmentation targets rapidly identified.

The function works according to the following process:

1. The function can be run before the collection of the first raw data file or after collection of any number of raw MS data files.

2. The directory location where the raw data are being written to must be provided (rawDir), as well as a comma delimited text file (.csv) containing at minimum 3 columns, namely:

  1. The first column must contain the precise names of each raw MS data file that will be eventually created in the raw data directory.

  2. The second column of the .csv file must contain the sample class of each raw MS data file. There are 3 suggested injection/run type names that should be included in this column for Good Laboratory Practice (GLP) and ideal metabolomic experimental design. The only mandatory injection/run type name is "sample" (not case-sensitive), simExTargId will stop if this is not found in the second column.

    This will direct simExTargId to perform statistical analysis only on samples for which co-variates are available in column 3 through to the last column. In order to use the full functionality of simExTargId it is also suggested that "local" equivolume pooled quality control samples are included in the 2nd column; these must be named "QC" (not case-sensitive). If these injection types are detected then many additional processing functions and monitoring methods will be accessed. These QC samples can be monitored for signal attenuation, used to smooth the data (see loessSmooth), used to filter based on minimum CV% (see cvCalc) and used to identify whether the last QC injected is outlying in the automatic principal components analysis (see pcaOutId). In the case of signal attenuation (see peakMonitor) or a QC sample being detected as an outlier, an email will be sent to the email address(es) supplied.

    In order to distinguish a column conditioning QC from a regular QC sample the prefix "cc" should be added to the name "QC". If a QC is not used for column conditioning then just "cc" (not case-sensitive) is sufficient. These will not be monitored by the peakMonitor or pcaOutId functions.

    Finally, a blank sample should be denoted as "blank" (also not case-sensitive). If these are added to the second column of the co-variate table then blank subtraction will also be performed (see blankSub). Any additional injection/run types are allowed, e.g. "gQC" (global QC) or "MS2". If a file is denoted as "MS2" then it will be converted differently from normal MS1 profiling data by MSConvert.

    N.B. Raw data files such as blank and quality control for example which are written to the raw data directory during an experiment will also be included in the xcms peak-picking and alignment process but will of course not be considered in the subsequent statistical analysis.

  3. Column 3 and all subsequent columns can contain a mixture of co-variates; these can be any combination of continuous or categorical variables associated with the "sample" injections/runs specified in column 2. The function coVarTypeStat will select an appropriate univariate statistical method based on the contents of these columns at any stage of the profiling run. For example, if 3 categorical classes, each of a minimum size, are identified then an ANOVA analysis will be performed, or if a variable is found to be continuous then a correlation analysis will be performed.
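
The co-variate table layout described above can be sketched in base R as follows. The file names, column names and co-variates (group, age) are hypothetical and only serve to show the expected three-or-more column layout; the check for a "sample" entry mimics the behaviour described for column 2 and is not the package's own code.

```r
# Hypothetical co-variate table: column 1 = raw file names,
# column 2 = injection/run type, columns 3+ = co-variates for "sample" rows.
coVar <- data.frame(
  fileName = c("ccQC_01", "blank_01", "QC_01", "sample_01", "sample_02", "QC_02"),
  class    = c("ccQC", "blank", "QC", "sample", "sample", "QC"),
  group    = c(NA, NA, NA, "case", "control", NA),
  age      = c(NA, NA, NA, 54, 61, NA),
  stringsAsFactors = FALSE
)

# "sample" is the only mandatory run type (not case-sensitive).
hasSample <- any(tolower(coVar[, 2]) == "sample")
if (!hasSample) {
  stop("the second column must contain at least one 'sample' entry")
}

# Write the table to a .csv file whose full path could be passed as coVar.
csvPath <- file.path(tempdir(), "coVar.csv")
write.csv(coVar, csvPath, row.names = FALSE)
print(csvPath)
```

Either the data.frame itself or the path to the written .csv file could then be supplied through the coVar argument.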

See Also

xcmsSet, retcor, group, fillPeaks, diffreport

Examples

# example XCMS peak picking/ grouping/ alignment settings for high-resolution LC-MS data (Q-ToF).
# peakwidth=c(2, 20), ppm=10, snthresh=5, bw=2, mzwid=0.015, minfrac=0.25, method="centWave"
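
The settings in the comment above could be passed to the function roughly as sketched below. The directory path, study name and co-variate file name are hypothetical placeholders, and the call is guarded so that it only runs when the simExTargId package is installed and the raw data directory exists; all argument names are taken from the Usage section.

```r
# Hypothetical locations; replace with your own raw data directory and table.
rawDir <- file.path(tempdir(), "exampleStudy_raw_does_not_exist")
coVarFile <- file.path(rawDir, "coVar.csv")

# Arguments mirroring the example Q-ToF settings given in the comment above.
runArgs <- list(
  rawDir = rawDir,
  studyName = "exampleStudy",
  coVar = coVarFile,
  ionMode = "positive",
  nCores = 2,
  xcmsSetArgs = list(peakwidth = c(2, 20), ppm = 10, snthresh = 5,
                     method = "centWave"),
  bw = 2, mzwid = 0.015, minfrac = 0.25
)

# Only attempt the (long-running) call when the package and directory exist.
if (requireNamespace("simExTargId", quietly = TRUE) && dir.exists(rawDir)) {
  do.call(simExTargId::simExTargId, runArgs)
} else {
  message("simExTargId not run: package or raw data directory not available")
}
```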

WMBEdmands/simExTargId documentation built on May 24, 2019, 2:08 a.m.