MaxQtoMSstatsPTMFormat: Convert output of label-free or TMT MaxQuant experiments into...

View source: R/converters.R

MaxQtoMSstatsPTMFormatR Documentation

Convert output of label-free or TMT MaxQuant experiments into MSstatsPTM format

Description

Takes as input LF/TMT experiments from MaxQ and converts the data into the format needed for MSstatsPTM. Requires modified evidence.txt file from MaxQ and an annotation file for PTM data. To adjust modified peptides for changes in global protein level, unmodified TMT experimental data must also be returned. Optionally can use ⁠Phospho(STY)Sites.txt⁠ (or other PTM specific files) from MaxQuant, but this is not recommended. If PTM specific file provided, the raw intensities must be provided, not a ratio.

Usage

MaxQtoMSstatsPTMFormat(
  evidence = NULL,
  annotation = NULL,
  fasta_path,
  fasta_protein_name = "uniprot_ac",
  mod_id = "\\(Phospho \\(STY\\)\\)",
  sites_data = NULL,
  evidence_prot = NULL,
  proteinGroups = NULL,
  annotation_protein = NULL,
  use_unmod_peptides = FALSE,
  labeling_type = "LF",
  mod_num = "Single",
  TMT_keyword = "TMT",
  ptm_keyword = "phos",
  which_proteinid_ptm = "Proteins",
  which_proteinid_protein = "Proteins",
  remove_other_mods = TRUE,
  removeMpeptides = FALSE,
  removeOxidationMpeptides = FALSE,
  removeProtein_with1Peptide = FALSE,
  use_log_file = TRUE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL
)

Arguments

evidence

name of 'evidence.txt' data, which includes feature-level data for enriched (PTM) data.

annotation

data frame annotation file for the ptm level data. Contains column Run, Fraction, TechRepMixture, Mixture, Channel, BioReplicate, Condition.

fasta_path

A string of path to a FASTA file, used to match PTM peptides.

fasta_protein_name

Name of fasta column that matches with protein name in evidence file. Default is uniprot_ac.

mod_id

Character that indicates the modification of interest. Default is ⁠\\(Phospho\\)⁠. Note ⁠\\⁠ must be included before special characters.

sites_data

(Not recommended. Only used if evidence file not provided. Only works for TMT labeled data) Modified peptide output from MaxQuant. For example, a phosphorylation experiment would require the Phospho(STY)Sites.txt file

evidence_prot

name of 'evidence.txt' data, which includes feature-level data for global profiling (unmodified) data.

proteinGroups

name of 'proteinGroups.txt' data. It needs to matching protein group ID in evidence_prot.

annotation_protein

data frame annotation file for the protein level data. Contains column Run, Fraction, TechRepMixture, Mixture, Channel, BioReplicate, Condition.

use_unmod_peptides

Boolean if the unmodified peptides in the input file should be used to construct the unmodified protein output. Only used if input_protein is not provided. Default is FALSE.

labeling_type

Either TMT or LF (Label-Free) depending on experimental design. Default is LF.

mod_num

(Only if sites.data is used) For modified peptide dataset. The number modifications per peptide to be used. If "Single", only peptides with one modification will be used. Otherwise "Total" can be selected which does not cap the number of modifications per peptide. "Single" is the default. Selecting "Total" may confound the effect of different modifications.

TMT_keyword

(Only if sites.data is used) the sub-name of columns in sites.data file. Default is TMT. This corresponds to the columns in the format Reporter.intensity.corrected.1.TMT1phos___1. Specifically, this parameter indicates the first section of the string TMT1phos (Before the mixture number). If TMT is present in the string, set this value to TMT. Else if TMT is not there (ie string is in the format ⁠1phos⁠) leave this parameter as an empty string (”).

ptm_keyword

(Only if sites.data is used) the sub-name of columns in the sites.data file. Default is phos. This corresponds to the columns in the format Reporter.intensity.corrected.1.TMT1phos___1. Specifically, this parameter indicates the second section of the string TMT1phos (After the mixture number). If the string is present, set this parameter. Else if this part of the string is empty (ie string is in the format TMT1) leave this parameter as an empty string (”).

which_proteinid_ptm

For PTM dataset, which column to use for protein name. Use 'Proteins'(default) column for protein name. 'Leading.proteins' or 'Leading.razor.protein' or 'Gene.names' can be used instead to get the protein ID with single protein. However, those can potentially have the shared peptides.

which_proteinid_protein

For Protein dataset, which column to use for protein name. Same options as above.

remove_other_mods

Remove peptides which include modfications other than the one listed in mod_id. Default is TRUE. For example, in an experiment targeting Phosphorylation, setting this parameter to TRUE would remove peptides like (Acetyl (Protein N-term))AAAAPDSRVS(Phospho (STY))EEENLK. Set this parameter to FALSE to keep peptides with extraneous modifications.

removeMpeptides

If Oxidation (M) modifications should be removed. Default is TRUE.

removeOxidationMpeptides

TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default.

removeProtein_with1Peptide

TRUE will remove the proteins which have only 1 peptide and charge. FALSE is default.

use_log_file

logical. If TRUE, information about data processing will be saved to a file.

append

logical. If TRUE, information about data processing will be added to an existing log file.

verbose

logical. If TRUE, information about data processing wil be printed to the console.

log_file_path

character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file.

Value

a list of two data.tables named 'PTM' and 'PROTEIN' in the format required by MSstatsPTM.

Examples

# TMT experiment
head(maxq_tmt_evidence)
head(maxq_tmt_annotation)

msstats_format_tmt = MaxQtoMSstatsPTMFormat(evidence=maxq_tmt_evidence,
                        annotation=maxq_tmt_annotation,
                        fasta=system.file("extdata", "maxq_tmt_fasta.fasta", package="MSstatsPTM"),
                        fasta_protein_name="uniprot_ac",
                        mod_id="\\(Phospho \\(STY\\)\\)",
                        use_unmod_peptides=TRUE,
                        labeling_type = "TMT",
                        which_proteinid_ptm = "Proteins")

head(msstats_format_tmt$PTM)
head(msstats_format_tmt$PROTEIN)

# LF experiment
head(maxq_lf_evidence)
head(maxq_lf_annotation)

msstats_format_lf = MaxQtoMSstatsPTMFormat(evidence=maxq_lf_evidence,
                        annotation=maxq_lf_annotation,
                        fasta=system.file("extdata", "maxq_lf_fasta.fasta", package="MSstatsPTM"),
                        fasta_protein_name="uniprot_ac",
                        mod_id="\\(Phospho \\(STY\\)\\)",
                        use_unmod_peptides=TRUE,
                        labeling_type = "LF",
                        which_proteinid_ptm = "Proteins")
head(msstats_format_lf$PTM)
head(msstats_format_lf$PROTEIN)

Vitek-Lab/MSstatsPTM documentation built on Sept. 26, 2024, 9:28 p.m.