readSpectrum: Reads spectral data from a raw file.
In fgcz/rawR: Direct Access to Orbitrap Data and Beyond

Description

The function reads the spectra of a given raw file for a given vector of scan numbers.

Usage

readSpectrum(
  rawfile,
  scan = NULL,
  tmpdir = tempdir(),
  validate = FALSE,
  mode = ""
)

Arguments

`rawfile`	the name of the raw file containing the mass spectrometry data from the Thermo Fisher Scientific instrument.
`scan`	a vector of requested scan numbers.
`tmpdir`	defines the directory used to store temporary data generated by the .NET assembly `rawrr.exe`. The default uses the output of `tempdir()`.
`validate`	a logical value; the default is `FALSE`.
`mode`	if `mode = "barebone"`, only mZ (centroidStream.Masses), intensity (centroidStream.Intensities), pepmass, StartTime, and charge state are returned. The default is `""`.

Details

All mass spectra are recorded by scanning detectors (mass analyzers) that log signal intensities for ranges of mass-to-charge ratios (m/z), also referred to as position. These recordings can be continuous, so-called profile data (p), or centroided (c) when discrete information (tuples of position and intensity values) is sufficient. This heavily compacted data structure is often called a peak list. In addition to signal intensities, a peak list can carry further peak attributes such as peak resolution (R), charge (z), or local noise estimates. In short, these attributes further describe the nature of the original profile signal or help to group peak lists by their molecular nature or processing history. A well-known example is the assignment of peaks to peak groups that constitute isotope patterns (M, M+1, M+2, ...).

The names of objects encapsulated within rawrrSpectrum instances are the keys returned by the Thermo Fisher Scientific New RawFileReader API; the corresponding values become the data parts of the objects, typically vectors.
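The difference between profile data and a centroided peak list can be illustrated with a small base R sketch. It simulates two Gaussian peaks on a fine m/z grid and reduces them to (position, intensity) tuples by picking local maxima. The peak shapes, the grid, and the intensity threshold are assumptions made for illustration only; this is not the centroiding algorithm used by the instrument or by the RawFileReader library.

```r
# Simulate a profile-mode signal: two Gaussian peaks on a fine m/z grid.
# All numbers are invented for illustration.
mz <- seq(499.9, 500.7, by = 0.001)
gauss <- function(x, mu, sigma, height) {
  height * exp(-(x - mu)^2 / (2 * sigma^2))
}
profile <- gauss(mz, 500.0, 0.01, 1000) + gauss(mz, 500.5, 0.01, 400)

# Naive centroiding: keep interior grid points that are strict local maxima
# and exceed an arbitrary intensity threshold.
n <- length(profile)
isApex <- c(FALSE,
            profile[2:(n - 1)] > profile[1:(n - 2)] &
              profile[2:(n - 1)] > profile[3:n],
            FALSE)
keep <- isApex & profile > 10

# The resulting peak list: tuples of position (m/z) and intensity.
peakList <- data.frame(mZ = mz[keep], intensity = profile[keep])
print(peakList)
```

In this sketch the 801 profile points collapse to two rows, one per simulated peak, which is the compaction the term "peak list" refers to.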

Value

a nested list of rawrrSpectrum objects. Each object contains more than 50 values of scan information, e.g., the charge state, two vectors holding the m/z values and their corresponding intensities, the AGC information, mass calibration, ion optics ...
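Because the return value is a list of list-like objects, the usual base R tools (`lapply`, `[[`, `$`) apply. The sketch below uses mock stand-ins rather than real rawrrSpectrum objects, so that it runs without a raw file; only the element names `scan`, `mZ`, and `intensity`, which also appear in the examples on this page, are used, and all values are invented.

```r
# Mock stand-ins for two spectrum objects; real rawrrSpectrum objects carry
# many more entries. All values are invented for illustration.
S <- list(
  list(scan = 1L, mZ = c(400.2, 500.3, 600.4), intensity = c(10, 250, 40)),
  list(scan = 2L, mZ = c(410.1, 520.7), intensity = c(90, 30))
)

# One row per spectrum: scan number, peak count, and the m/z of the most
# intense peak.
summaryTable <- do.call(rbind, lapply(S, function(x) {
  data.frame(scan = x$scan,
             nPeaks = length(x$mZ),
             basePeakMz = x$mZ[which.max(x$intensity)])
}))
print(summaryTable)
```

The same pattern should carry over to the list returned by `readSpectrum`, assuming the requested elements are present in the chosen `mode`.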

Author(s)

Tobias Kockmann and Christian Panse <cp@fgz.ethz.ch> 2018, 2019, 2020, 2021

References

C# code snippets of the New RawFileReader library https://planetorbitrap.com/rawfilereader.
rawrr: doi:10.1021/acs.jproteome.0c00866
Universal Spectrum Explorer: https://www.proteomicsdb.org/use/ doi:10.1021/acs.jproteome.1c00096

Examples


# Example 1
S <- rawrr::sampleFilePath() |> rawrr::readSpectrum(scan = 1:9)

S[[1]]

names(S[[1]])

plot(S[[1]])



# Example 2 - find best peptide spectrum match using the |> pipe operator
# fetch via ExperimentHub

if (require(ExperimentHub) && require(protViz)) {
eh <- ExperimentHub::ExperimentHub()
EH4547 <- normalizePath(eh[["EH4547"]])

(rawfile <- paste0(EH4547, ".raw"))
if (!file.exists(rawfile)){
    file.link(EH4547, rawfile)
}

GAG <- "GAGSSEPVTGLDAK"

.bestPeptideSpectrumMatch <- function(rawfile,
    sequence = "GAGSSEPVTGLDAK"){
    rawrr::readIndex(rawfile) |>
        # keep scans whose precursor m/z matches the doubly charged peptide
        subset(abs((1.008 + (protViz::parentIonMass(sequence) - 1.008) / 2) -
            precursorMass) < 0.001, select = scan) |>
        unlist() |>
        rawrr::readSpectrum(rawfile = rawfile) |>
        # match fragment ions of the given sequence against each spectrum
        lapply(function(x) {
          y <- protViz::psm(sequence = sequence, spec = x, plot = FALSE)
          y$scan <- x$scan
          y
        }) |>
        # score = number of fragment ions matched within 0.01 Da
        lapply(FUN = function(x){
          score <- sum(abs(x$mZ.Da.error) < 0.01)
          cbind(scan = x$scan, score = score)
        }) |>
        (function(x) as.data.frame(Reduce(rbind, x)))() |>
        subset(score > 0) |>
        # return the scan number with the highest score
        (function(x) x[order(x$score, decreasing = TRUE),
            'scan'])() |>
        head(1)
}

start_time <- Sys.time()
bestMatch <- .bestPeptideSpectrumMatch(rawfile, GAG) |>
    rawrr::readSpectrum(rawfile=rawfile) |>
    lapply(function(x) protViz::peakplot(peptideSequence = GAG, x))

end_time <- Sys.time()
end_time - start_time

# Example 3
# using proteomicsdb doi:10.1101/2020.09.08.287557
# through https://www.proteomicsdb.org/use/

.UniversalSpectrumExplorer <- function(x, sequence){
    m <- protViz::psm(sequence, x)
    # print the matched peaks as tab-separated m/z and intensity pairs
    cat(paste(x$mZ[m$idx], "\t", x$intensity[m$idx]), sep = "\n")
}

rawrr::readSpectrum(rawfile = rawfile, scan = 11091) |>
    lapply(function(x) .UniversalSpectrumExplorer(x, sequence = GAG))
 }
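
The precursor filter in Example 2 compares each `precursorMass` with the expected m/z of the doubly charged peptide, computed from the singly protonated mass as `1.008 + (MH - 1.008) / 2`. The following base R sketch isolates that step so it can be tried without a raw file. The helper `mzFromMH` and the index table `idx` are hypothetical; the column names follow those used in Example 2, and all values are invented.

```r
# Proton mass approximation, as used in Example 2.
PROTON <- 1.008

# Hypothetical helper: convert a singly protonated mass [M+H]+ to the m/z
# of charge state z.
mzFromMH <- function(mh, z, proton = PROTON) (mh - proton) / z + proton

# Hypothetical index table; column names follow Example 2, values are invented.
idx <- data.frame(scan = 1:4,
                  precursorMass = c(487.2567, 501.0084, 501.0300, 644.8226))

# Expected m/z of the doubly charged ion for an invented [M+H]+ of 1001.008.
target <- mzFromMH(1001.008, z = 2)

# Keep scans whose precursor lies within 0.001 of the target m/z.
hits <- subset(idx, abs(target - precursorMass) < 0.001, select = scan)
print(unlist(hits, use.names = FALSE))
```

Only the second scan passes the 0.001 tolerance here; the third scan, which is off by about 0.02, is rejected. A wider tolerance may be appropriate for data of lower mass accuracy.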

fgcz/rawR documentation built on July 17, 2025, 1:02 a.m.