preprocess_MaxQuant: Preprocess MSnSet objects originating from MaxQuant...
In statOmics/MSqRob: Robust statistical inference for quantitative LC-MS proteomics

preprocess_MaxQuant

R Documentation

Preprocess MSnSet objects originating from MaxQuant peptides.txt files

Description

Performs a standard preprocessing pipeline on MSnSet objects (Gatto et al., 2012) originating from MaxQuant (Cox and Mann, 2008) peptides.txt files. By default, intensity values are log2 transformed and then quantile normalized. Next, the smallestUniqueGroups function is applied, which removes protein groups for which any member protein is also present in a smaller protein group. Then, contaminants and reverse sequences are removed. Next, irrelevant columns are dropped. Then, peptide sequences that are identified only once in a single mass spec run are removed, because with only one identification the model will be perfectly confounded. Finally, experimental annotations, if provided, are added to the data frame.
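The protein group rule described above can be illustrated with a minimal base R sketch. It reads the description literally and is not the package's own `smallestUniqueGroups` implementation, which may differ, for example in how it treats groups that overlap only with a group that was itself removed. The function name `smallest_unique_groups_sketch`, the `";"` separator and the toy protein identifiers are assumptions made for illustration.

```r
# Sketch only: keep a protein group unless one of its member proteins
# also occurs in a strictly smaller protein group.
# Assumption: member proteins within a group are separated by ";".
smallest_unique_groups_sketch <- function(groups, split = ";") {
  groups <- unique(as.character(groups))
  members <- strsplit(groups, split, fixed = TRUE)
  sizes <- lengths(members)
  keep <- vapply(seq_along(groups), function(i) {
    # All proteins that appear in groups smaller than group i
    in_smaller_groups <- unlist(members[sizes < sizes[i]])
    !any(members[[i]] %in% in_smaller_groups)
  }, logical(1))
  groups[keep]
}

# Made-up protein groups
protein_groups <- c("P1", "P1;P2", "P3;P4", "P3;P4;P5", "P6;P7;P8")
kept <- smallest_unique_groups_sketch(protein_groups)
print(kept)
```

In this toy input, "P1;P2" is dropped because P1 already forms a group on its own, and "P3;P4;P5" is dropped because P3 and P4 occur in the smaller group "P3;P4".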

Usage

preprocess_MaxQuant(MSnSet, accession = "Proteins", exp_annotation = NULL,
  type_annot = NULL, logtransform = TRUE, base = 2,
  normalisation = "quantiles", weights = NULL,
  smallestUniqueGroups = TRUE, useful_properties = c("Proteins", "Sequence",
  "PEP"), filter = c("Potential.contaminant", "Reverse"),
  filter_symbol = "+", minIdentified = 2, remove_only_site = FALSE,
  file_proteinGroups = NULL, colClasses = "keep", droplevels = TRUE,
  printProgress = FALSE, shiny = FALSE, message = NULL)
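
With the defaults shown above (`logtransform = TRUE`, `base = 2`, `normalisation = "quantiles"`), intensities are log2 transformed and then quantile normalised. The following base R sketch illustrates that idea on a toy matrix. It is a simplified illustration, not the routine the package calls: it assumes no missing values and breaks ties by order of appearance. The helper `quantile_normalise_sketch` and all intensity values are hypothetical.

```r
# Toy intensity matrix (made-up values): rows are peptides, columns are runs
intensities <- matrix(
  c(1024, 2048, 4096,    # run1
    512, 1024, 8192,     # run2
    2048, 16384, 4096),  # run3
  nrow = 3,
  dimnames = list(c("pepA", "pepB", "pepC"), c("run1", "run2", "run3"))
)

# Step 1: log transform with base 2
log_intensities <- log(intensities, base = 2)

# Step 2: simplified quantile normalisation (assumes no NA values)
quantile_normalise_sketch <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  # Target distribution: mean of the k-th smallest value across runs
  target <- rowMeans(sorted)
  normalised <- apply(ranks, 2, function(r) target[r])
  dimnames(normalised) <- dimnames(x)
  normalised
}

normalised <- quantile_normalise_sketch(log_intensities)
print(normalised)
```

After this step every run has the same set of values; only the order within each run reflects the original ranking of the peptides.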

Arguments

`MSnSet`	An `MSnSet` object that contains data originating from MaxQuant's peptides.txt file.
`accession`	A character indicating the column that contains the protein identifiers. This is only used if `smallestUniqueGroups` is `TRUE` and/or an `external_filter_file` is provided. Thus, the `accession` parameter can safely be specified even when you are not interested in comparing proteins later on. Defaults to "Proteins".
`exp_annotation`	Either the path to the file which contains the experiment annotation or a data frame containing the experiment annotation. Exactly one column in the experiment annotation should contain the mass spec run names. An annotation file can be either a tab-delimited text document or an Excel file. For more details, see `read.table` and `read.xlsx`. As an error protection measure, leading and trailing spaces in each column are trimmed off. The default, `NULL`, indicates there is no annotation to be added.
`type_annot`	If `exp_annotation` is a path to a file, the type of file. `type_annot` is mostly obsolete as supported files will be automatically recognized by their extension. Currently only `"tab-delim"` (tab-delimited file), `"xlsx"` (Office Open XML Spreadsheet file) and `NULL` (file type decided based on the extension) are supported. If the extension is not recognized, the file will be assumed to be a tab-delimited file. Defaults to `NULL`.
`logtransform`	A logical value indicating whether the intensities should be log-transformed. Defaults to `TRUE`.
`base`	A positive or complex number: the base with respect to which logarithms are computed. Defaults to 2.
`normalisation`	A character vector of length one that describes how to normalise the `MSnSet` object. See `normalise` for details. Defaults to `"quantiles"`. If no normalisation is wanted, set `normalisation="none"`.
`weights`	Only used when `normalisation` is set to "rlr", "loess.fast", "loess.affy" or "loess.pairs". A numeric vector of weights for each row in the MSnSet object to be used for the fitting during the normalisation step. Defaults to `NULL`.
`smallestUniqueGroups`	A logical indicating whether protein groups for which any member protein is also present in a smaller protein group should be removed from the dataset. Defaults to `TRUE`.
`useful_properties`	The columns of the `featureData` slot that are useful in the further analysis and/or inspection of the data and should be retained. Defaults to `c("Proteins","Sequence","PEP")`.
`filter`	A vector of names corresponding to the columns in the `featureData` slot of the `MSnSet` object that contain a `filter_symbol` that indicates which rows should be removed from the data. Typical examples are contaminants or reversed sequences. Defaults to `c("Potential.contaminant","Reverse")`. Note that in earlier versions of MaxQuant the "Potential.contaminant" column was called "Contaminant". If "Potential.contaminant" is mentioned in this argument but could not be found, this function automatically tries to filter on "Contaminant".
`filter_symbol`	A character indicating the symbol in the columns corresponding to the `filter` argument that is used to indicate rows that should be removed from the data. Defaults to "+".
`minIdentified`	A numeric value indicating the minimal number of times a peptide sequence should be identified in the dataset in order not to be removed. Defaults to 2.
`remove_only_site`	A logical indicating whether proteins that are only identified by peptides carrying one or more modification sites should be removed from the data. This requires the extra input of a proteinGroups.txt file in the `file_proteinGroups` argument. Defaults to `FALSE`.
`file_proteinGroups`	The name of the proteinGroups.txt file, which is used to remove proteins that are only identified by peptides carrying one or more modification sites. Only used when `remove_only_site` is set to `TRUE`.
`colClasses`	character. Only used when the `exp_annotation` argument is a file path. A vector of classes to be assumed for the columns of the experimental annotation data frame. Recycled if necessary. If named and shorter than required, names are matched to the column names, with unspecified values taken to be `NA`. Possible values are `"keep"` (the default, when the colClasses are unchanged for data frames and `type.convert` is used for files), `NA` (when `type.convert` is always used), `NULL` (when the column is skipped), one of the atomic vector classes (`"logical"`, `"integer"`, `"numeric"`, `"complex"`, `"character"`, `"raw"`), or `"factor"`, `"Date"` or `"POSIXct"`. Otherwise there needs to be an as method (from package `methods`) for conversion from `"character"` to the specified formal class.
`droplevels`	A logical indicating if levels of factors that disappeared during preprocessing should be removed from the data. Defaults to `TRUE`.
`printProgress`	A logical indicating whether R should print a message before performing each preprocessing step. Defaults to `FALSE`.
`shiny`	A logical indicating whether this function is being used by a Shiny app. Setting this to `TRUE` only works when using this function in a Shiny app and allows for dynamic progress bars. Defaults to `FALSE`.
`message`	Only used when `shiny=TRUE`. A single-element character vector: the message to be displayed to the user, or `NULL` to hide the current message (if any).
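
The row filters controlled by `filter`, `filter_symbol` and `minIdentified` can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: a plain data frame stands in for the `featureData` slot, a plain matrix stands in for the intensities, and `minIdentified` is interpreted as the minimal number of non-missing intensities per peptide. The function `filter_features_sketch` and all values are hypothetical.

```r
# Hypothetical stand-in for featureData (made-up values)
features <- data.frame(
  Sequence = c("AAK", "CCR", "DDK", "EEK"),
  Proteins = c("P1", "P2", "CON__P3", "REV__P4"),
  Contaminant = c("", "", "+", ""),
  Reverse = c("", "", "", "+"),
  stringsAsFactors = FALSE
)

# Hypothetical log intensities; NA means the peptide was not identified
intensities <- matrix(
  c(20.1, NA, 19.5,
    18.2, NA, NA,
    21.0, 20.7, 20.9,
    17.3, 17.9, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(features$Sequence, c("run1", "run2", "run3"))
)

filter_features_sketch <- function(features, intensities,
                                   filter = c("Potential.contaminant", "Reverse"),
                                   filter_symbol = "+", minIdentified = 2) {
  # Fall back to "Contaminant" when "Potential.contaminant" is absent
  if ("Potential.contaminant" %in% filter &&
      !("Potential.contaminant" %in% colnames(features))) {
    filter[filter == "Potential.contaminant"] <- "Contaminant"
  }
  filter <- intersect(filter, colnames(features))

  # Flag rows that carry the filter symbol in any filter column
  flagged <- rep(FALSE, nrow(features))
  for (column in filter) {
    values <- features[[column]]
    flagged <- flagged | (!is.na(values) & values == filter_symbol)
  }

  # Assumed reading of minIdentified: count non-missing intensities per row
  identified <- rowSums(!is.na(intensities)) >= minIdentified

  keep <- !flagged & identified
  list(features = features[keep, , drop = FALSE],
       intensities = intensities[keep, , drop = FALSE])
}

filtered <- filter_features_sketch(features, intensities)
print(filtered$features$Sequence)
```

In this toy data only "AAK" survives: "CCR" has a single identification, "DDK" is flagged as a contaminant through the fallback column, and "EEK" is flagged as a reverse sequence.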

Value

A preprocessed MSnSet object that is ready to be converted into a protdata object.

References

Gatto L, Lilley KS. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics. 2012 Jan 15;28(2):288-9. https://doi.org/10.1093/bioinformatics/btr645. PubMed PMID:22113085.

Cox, J. and Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol, 2008, 26, pp 1367-72. http://www.nature.com/nbt/journal/v26/n12/full/nbt.1511.html.

statOmics/MSqRob documentation built on Dec. 8, 2022, 6 a.m.