library(Risa)
library(xcms)

Introduction

Tomato

R and ISAtab

An ISAtab archive will contain the metadata description in several tab-separated files. (One of) the assay files contains the column Raw Spectral Data File with the paths to the mass spectral raw data files in one of the above formats.

ISAmtbls297 <- readISAtab(find.package("mtbls297"))
assay <- ISAmtbls297@assay.tabs[[1]]
msfiles <- paste(find.package("mtbls297"), "mzML",
                 assay@assay.file$"Derived Spectral Data File",
                 sep="/")

ISAtab, Risa and xcms

With the combination of Risa and xcms, we can process the MS raw data:

cwp <- CentWaveParam(ppm = 25, peakwidth = c(10, 20), snthresh = 10,
  prefilter = c(3, 100), mzCenterFun = "wMean", integrate = 1L,
  mzdiff = -0.001, fitgauss = FALSE, noise = 0, verboseColumns = FALSE,
  roiList = list(), firstBaselineCheck = TRUE, roiScales = numeric())

raw_data <- readMSData(msfiles, mode = "onDisk")

## Perform the peak detection using the settings defined above.
mtbls297 <- findChromPeaks(raw_data, param = cwp, BPPARAM = MulticoreParam())

The result is the same type of xcmsSet object:

x2 <- findChromPeaksIsolationWindow(mtbls297, 
                                    param = cwp, 
                                    BPPARAM = MulticoreParam())
cpd <- chromPeakData(x2)

Several options exist to quantify the individual intensities. For each feature, additional attributes are available, such as the minimum/maximum and average retention time and m/z values.

Grouping and Retention time correction

In the following steps, we perform a grouping: because the UPLC system used here has very stable retention times, we just use the retention time correction step as quality control of the raw data. After that, 'fillPeaks()' will integrate the raw data for those features, which were not detected in some of the samples.

mtbls2Set <- group(mtbls2Set, minfrac=1, bw=3)
retcor(mtbls2Set, plottype="mdevden")

QC on peaks picked

A first QC step is the visual inspection of intensities across the samples. Alternatively to a boxplot, one could also create histograms/density plots.

boxplot(groupval(mtbls2Set, value="into")+1, 
        col=as.numeric(sampclass(mtbls2Set))+1, 
        log="y", las=2)

Data imputation

After grouping, peaks might be missing/not found in some samples. fillPekas() will impute them, using the consensus mz and RT from the other samples.

mtbls2Set <- fillPeaks(mtbls2Set, nSlaves=nSlaves)

The final xcmsSet represents a rectangular matrix of mass spectral features, which were detected (or imputed) across the samples. The dimensionality is M * N, where M denotes the number of samples in the assay, and N the number of features grouped across the samples.

QC with some diagnostic plots

QC of mass accuracy and retention time consistency

plotQC(mtbls2Set)

QC with PCA

In addition to the boxplot for QC, we can also check a hierarchical clustering and the PCA of the samples.

sdThresh <- 4.0 ## Filter low-standard deviation rows for plot
data <- log(groupval(mtbls2Set, value="into")+1)

pca.result <- pca(data, nPcs=3)
plotPcs(pca.result, type="loadings", 
        col=as.numeric(sampclass(mtbls2Set))+1)

Annotated diffreport

an <- xsAnnotate(mtbls2Set,
                 sample=seq(1,length(sampnames(mtbls2Set))),
                 nSlaves=nSlaves)

an <- groupFWHM(an)
an <- findIsotopes(an)  # optional but recommended.
an <- groupCorr(an,
                graphMethod="lpc",
                calcIso = TRUE,
                calcCiS = TRUE,
                calcCaS = TRUE,
                cor_eic_th=0.5)

## Setup ruleSet
rs <- new("ruleSet")

rs@ionlistfile <- file.path(find.package("mtbls2"), "lists","ions.csv")
rs@neutraladditionfile <- file.path(find.package("mtbls2"), "lists","neutraladdition.csv")
rs@neutrallossfile <- file.path(find.package("mtbls2"), "lists","neutralloss.csv")

rs <- readLists(rs)
rs <- setDefaultParams(rs)
rs <- generateRules(rs)

an <- findAdducts(an,
                  rules=rs@rules,
                  polarity="positive")

Diffreport

dr <- diffreport(mtbls2Set, sortpval=FALSE, filebase="mtbls2diffreport", eicmax=20 )
cspl <- getPeaklist(an)

annotatedDiffreport <- cbind(dr, cspl)

Combine diffreport and CAMERA spectra

interestingPspec <- tapply(seq(1, nrow(annotatedDiffreport)),
                               INDEX=annotatedDiffreport[,"pcgroup"],
                               FUN=function(x, a) {m <- median(annotatedDiffreport[x, "pvalue"]);
                                                   p <- max(annotatedDiffreport[x, "pcgroup"]);
                                                   as.numeric(c(pvalue=m,pcgroup=p))},
                               annotatedDiffreport)

interestingPspec <- do.call(rbind, interestingPspec)
colnames(interestingPspec) <- c("pvalue", "pcgroup") 

o <- order(interestingPspec[,"pvalue"])

pdf("interestingPspec.pdf")
dummy <- lapply(interestingPspec[o[1:40], "pcgroup"],
                function(x) {suppressWarnings(plotPsSpectrum(an, pspec=x, maxlabel=5))})
dev.off()

R, ISAtab, xcms and CAMERA revisited

These attributes and the intensity matrix could already be exported to conform to the specification for the ``metabolite assignment file'' in the mzTab format used in MetaboLights. Currently, this functionality is not included in xcms. A prototype snippet is the following:

``` {r assembleMAF}

pl <- annotatedDiffreport

charge <- sapply(an@isotopes, function(x) { ifelse( length(x) > 0, x$charge, NA) }) abundance <- groupval(an@xcmsSet, value="into")

load ISA assay files

a.samples <- ISAmtbls2["samples.per.assay.filename"][[ a.filename ]]

These columns are defined by mzTab

maf.std.colnames <- c("identifier", "chemical_formula", "description", "mass_to_charge", "fragmentation", "charge", "retention_time", "taxid", "species", "database", "database_version", "reliability", "uri", "search_engine", "search_engine_score", "modifications", "smallmolecule_abundance_sub", "smallmolecule_abundance_stdev_sub", "smallmolecule_abundance_std_error_sub")

Plus the columns for the sample intensities

all.colnames <- c(maf.std.colnames, a.samples)

Now assemble new maf

l <- nrow(pl)

maf <- data.frame(identifier = character(l), chemical_formula = character(l), description = character(l), mass_to_charge = pl$mz, fragmentation = character(l), charge = charge, retention_time = pl$rt, taxid = character(l), species = character(l), database = character(l), database_version = character(l), reliability = character(l), uri = character(l), search_engine = character(l), search_engine_score = character(l), modifications = character(l), smallmolecule_abundance_sub = character(l), smallmolecule_abundance_stdev_sub = character(l), smallmolecule_abundance_std_error_sub = character(l), abundance, stringsAsFactors=FALSE)

```r

##
## Make sure maf table is quoted properly, 
## and add to the ISAmtbls2 assay file.
## 
maf_character <- apply(maf, 2, as.character)

write.table(maf_character, 
            file=paste(tempdir(), "/a_mtbl2_metabolite profiling_mass spectrometry_maf.csv", sep=""),
            row.names=FALSE, col.names=all.colnames, 
            quote=TRUE, sep="\t", na="\"\"")

ISAmtbls2 <- updateAssayMetadata(ISAmtbls2, a.filename,
             "Metabolite Assignment File",
         paste(tempdir(), "/a_mtbl2_metabolite profiling_mass spectrometry_maf.csv", sep=""))

write.assay.file(ISAmtbls2, a.filename)
sessionInfo()


sneumann/mtbls297 documentation built on July 14, 2020, 12:43 a.m.