minifyMzxml: Shrink mzXML files by including only data points near masses...

View source: R/minifyMSFunctions.R

minifyMzxml    R Documentation

Shrink mzXML files by including only data points near masses of interest

Description

mzXML files can be unwieldy when only a few masses are of interest. Their size makes them difficult to share online for debugging, and it often means that untargeted algorithms spend a lot of time picking peaks in irrelevant data. minifyMzxml "minifies" an mzXML file by keeping only those data points that fall within a given ppm error of an m/z value of interest, and writes out a file that is otherwise essentially unchanged. The function currently works only on MS1 data, but could reasonably be extended if demand becomes evident.

Usage

minifyMzxml(
  filename,
  output_filename,
  ppm,
  mz_exclude = NULL,
  mz_include = NULL,
  prefilter = -1,
  warn = TRUE
)

Arguments

filename

The name of a single file to be minified, usually produced by Proteowizard's 'msconvert' or something similar.

output_filename

The name of the file to be written out.

ppm

The parts-per-million error of the instrument used to collect the original file.

mz_exclude

A vector of m/z values that should be excluded from the minified file. This argument must be used with the 'ppm' argument and should not be used with mz_include. For each mass provided, an m/z window of +/- 'ppm' is calculated, and all data points within that window are removed.

mz_include

A vector of m/z values that should be included in the minified file. This argument must be used with the 'ppm' argument and should not be used with mz_exclude. For each mass provided, an m/z window of +/- 'ppm' is calculated, and all data points within that window are kept.
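
The window arithmetic described for mz_include and mz_exclude can be sketched in base R. This is an illustration only: it assumes the conventional definition of a ppm window (m/z times ppm divided by one million, on either side of the mass), and the helper name ppm_window is hypothetical rather than part of RaMS. The function's internal implementation may differ.

```r
# Hypothetical helper (not part of RaMS): lower and upper m/z bounds
# for a +/- ppm window, assuming the conventional ppm definition.
ppm_window <- function(mz, ppm) {
  half_width <- mz * ppm / 1e6
  cbind(lower = mz - half_width, upper = mz + half_width)
}

# Windows for valine and homarine at 5 ppm
ppm_window(c(118.0865, 138.0555), ppm = 5)
```

Under this assumption, a 5 ppm window around m/z 118.0865 extends roughly 0.0006 m/z units on each side, so the window widens slightly as the mass increases.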

prefilter

A single number giving the minimum intensity of interest in the MS1 data. Data points with intensities below this threshold are silently dropped, which can dramatically reduce the size of the output file. This currently works only with MS1 data, but could easily be expanded to handle more.
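
The effect of an intensity threshold can be pictured with a toy data frame. This is a sketch under stated assumptions, not the package's implementation: the values are made up, the column names mz and int are chosen for illustration, and points exactly at the threshold are assumed to be kept.

```r
# Toy MS1 data points (made-up values, for illustration only)
ms1_points <- data.frame(
  mz  = c(118.0864, 118.0866, 138.0554, 138.0556),
  int = c(50, 12000, 300, 45000)
)

# Assumed behavior: drop points whose intensity is below the threshold
prefilter <- 1000
kept <- ms1_points[ms1_points$int >= prefilter, ]
kept
```

With the default of -1, a filter like this would drop nothing, since intensities are not negative.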

warn

Boolean. Should the function warn the user when removing an index from an mzXML file?

Value

Invisibly, the name of the new file.

Examples

## Not run: 
library(RaMS)
# Extract data corresponding to only valine and homarine
# m/z = 118.0865 and 138.0555, respectively
filename <- system.file("extdata", "LB12HL_AB.mzXML.gz", package = "RaMS")
output_filename <- "mini_LB12HL_AB.mzXML"
include_mzs <- c(118.0865, 138.0555)
minifyMzxml(filename, output_filename, mz_include=include_mzs, ppm=5)
unlink(output_filename)

# Exclude data corresponding to valine and homarine
filename <- system.file("extdata", "LB12HL_AB.mzXML.gz", package = "RaMS")
output_filename <- "mini_LB12HL_AB.mzXML"
exclude_mzs <- c(118.0865, 138.0555)
minifyMzxml(filename, output_filename, mz_exclude=exclude_mzs, ppm=5)
unlink(output_filename)

## End(Not run)

RaMS documentation built on Oct. 9, 2024, 9:06 a.m.