View source: R/preprocess_MaxQuant.R
preprocess_generic | R Documentation |
This function performs a standard preprocessing pipeline on MSnSet objects (Gatto et al., 2012).
By default, intensity values are log2-transformed and then quantile-normalized. Next, the smallestUniqueGroups function is applied, which removes protein groups for which any member protein is present in a smaller protein group. Then, peptides that need to be filtered out (see the filter argument) are removed. Next, irrelevant columns are dropped. Then, peptide sequences that are identified only once in a single mass spec run are removed, because with only one identification the model would be perfectly confounded. Finally, any experimental annotations are added to the data frame.
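The first two default steps (log2 transformation followed by quantile normalisation) can be illustrated on a toy matrix in base R. This is only a sketch under stated assumptions: the intensity values are made up, `quantile_normalise` is a hypothetical helper written for this example, and it ignores ties and missing values. It is not the code this function runs internally.

```r
# Toy intensity matrix: rows are peptides, columns are mass spec runs.
# All values are made up for illustration.
intensities <- matrix(
  c(1024, 2048,  512,
    4096, 8192, 1024,
     256,  512, 2048,
    2048, 1024, 4096),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("pep", 1:4), paste0("run", 1:3))
)

# Step 1: log2 transformation (cf. logtransform = TRUE, base = 2).
log_int <- log(intensities, base = 2)

# Step 2: simplified quantile normalisation (hypothetical helper;
# assumes no ties and no missing values).
# Sort each column, average across columns at each rank, then give every
# value the mean that belongs to its within-column rank.
quantile_normalise <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  rank_means <- rowMeans(apply(x, 2, sort))
  normalised <- apply(ranks, 2, function(r) rank_means[r])
  dimnames(normalised) <- dimnames(x)
  normalised
}

norm_int <- quantile_normalise(log_int)
```

After this step every run has the same set of values, only assigned to different peptides, which is the defining property of quantile normalisation.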
Note that this function applies certain default preprocessing steps, depending on the MSnSetType input. For more control over the preprocessing, use the preprocess_MSnSet function.
preprocess_generic(MSnSet, MSnSetType, exp_annotation = NULL,
  type_annot = NULL, aggr_by = NULL, aggr_function = "sum",
  logtransform = TRUE, base = 2, normalisation = "quantiles",
  weights = NULL, smallestUniqueGroups = TRUE, split = NULL,
  useful_properties = NULL, filter = NULL, filter_symbol = NULL,
  minIdentified = 2, external_filter_file = NULL,
  external_filter_accession = NULL, external_filter_column = NULL,
  colClasses = "keep", droplevels = TRUE, printProgress = FALSE,
  shiny = FALSE, message = NULL)
MSnSet |
An |
MSnSetType |
One of the following: |
exp_annotation |
Either the path to the file which contains the experiment annotation or a data frame containing the experiment annotation. Exactly one column in the experiment annotation should contain the mass spec run names. Annotation in a file can be either a tab-delimited text document or an Excel file. For more details, see |
type_annot |
If |
aggr_by |
A character indicating the column by which the data should be aggregated. We advise aggregating the data by peptide sequence (thus aggregating over different charge states and modification statuses of the same peptide). If you only want to aggregate over charge states, set |
aggr_function |
The function used to aggregate intensity data. Defaults to |
logtransform |
A logical value indicating whether the intensities should be log-transformed. Defaults to |
base |
A positive or complex number: the base with respect to which logarithms are computed. Defaults to 2. |
normalisation |
A character vector of length one that describes how to normalise the |
weights |
Only used when |
smallestUniqueGroups |
A logical indicating whether protein groups for which any member protein is present in a smaller protein group should be removed from the dataset. Defaults to |
useful_properties |
The columns of the |
filter |
A vector of names corresponding to the columns in the |
filter_symbol |
Only used when |
minIdentified |
A numeric value indicating the minimal number of times a peptide sequence should be identified in the dataset in order not to be removed. Defaults to 2. |
external_filter_file |
The name of an external protein filtering file. Sometimes, users want to filter out proteins based on a separate protein file. This file should contain at least a column with name equal to the value in |
external_filter_accession |
Only used when |
external_filter_column |
Only used when |
colClasses |
character. Only used when the |
droplevels |
A logical indicating if levels of factors that disappeared during preprocessing should be removed from the data. Defaults to |
printProgress |
A logical indicating whether R should print a message before performing each preprocessing step. Defaults to |
shiny |
A logical indicating whether this function is being used by a Shiny app. Setting this to |
message |
Only used when |
details |
Only used when |
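The effect of the smallestUniqueGroups and minIdentified arguments can be sketched in base R. The helper `keep_smallest_unique_groups`, the protein and peptide names, and the intensities below are hypothetical and exist only for this illustration. The sketch follows a literal reading of the argument descriptions, assumes that protein group members are separated by ";" and that an identification corresponds to a non-missing intensity, and may differ from the package's own implementation in edge cases.

```r
# Hypothetical helper: keep only protein groups that share no member
# with any strictly smaller protein group.
keep_smallest_unique_groups <- function(groups, sep = ";") {
  groups <- unique(groups)
  members <- strsplit(groups, sep, fixed = TRUE)
  sizes <- lengths(members)
  keep <- vapply(seq_along(groups), function(i) {
    smaller <- unlist(members[sizes < sizes[i]])
    !any(members[[i]] %in% smaller)
  }, logical(1))
  groups[keep]
}

# Made-up protein groups. "A;B" shares A with the smaller group "A",
# and "C;D;E" shares C with the smaller group "B;C".
groups <- c("A", "A;B", "B;C", "C;D;E", "F;G")
kept <- keep_smallest_unique_groups(groups)

# minIdentified: drop peptide sequences identified fewer than
# min_identified times (assumption: NA means "not identified").
peptide_int <- matrix(
  c(20.1,   NA, 19.8,
      NA,   NA, 21.3,
    18.4, 18.9,   NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PEPTIDEA", "PEPTIDEB", "PEPTIDEC"),
                  paste0("run", 1:3))
)
min_identified <- 2
n_identified <- rowSums(!is.na(peptide_int))
filtered <- peptide_int[n_identified >= min_identified, , drop = FALSE]
```

In this toy data, "A", "B;C" and "F;G" are retained as protein groups, and PEPTIDEB is dropped because it has only one identification.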
A preprocessed MSnSet object that is ready to be converted into a protdata object.
Gatto L, Lilley KS. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics. 2012 Jan 15;28(2):288-9. https://doi.org/10.1093/bioinformatics/btr645. PubMed PMID:22113085.
Argentini A, Goeminne LJE, Verheggen K, Hulstaert N, Staes A, Clement L & Martens L. moFF: a robust and automated approach to extract peptide ion intensities. Nature Methods. 2016 13:964–966. http://www.nature.com/nmeth/journal/v13/n12/full/nmeth.4075.html.