PDtoMSstatsPTMFormat: Convert Proteome Discoverer output into MSstatsPTM format

View source: R/converters.R

PDtoMSstatsPTMFormatR Documentation

Convert Proteome Discoverer output into MSstatsPTM format

Description

Import Proteome Discoverer files, identify modification site location.

Usage

PDtoMSstatsPTMFormat(
  input,
  annotation,
  fasta_path,
  protein_input = NULL,
  annotation_protein = NULL,
  labeling_type = "LF",
  mod_id = "\\(Phospho\\)",
  use_localization_cutoff = FALSE,
  keep_all_mods = FALSE,
  use_unmod_peptides = FALSE,
  fasta_protein_name = "uniprot_iso",
  localization_cutoff = 75,
  remove_unlocalized_peptides = TRUE,
  useNumProteinsColumn = FALSE,
  useUniquePeptide = TRUE,
  summaryforMultipleRows = max,
  removeFewMeasurements = TRUE,
  removeOxidationMpeptides = FALSE,
  removeProtein_with1Peptide = FALSE,
  which_quantification = "Precursor.Area",
  which_proteinid = "Protein.Group.Accessions",
  use_log_file = TRUE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL
)

Arguments

input

PD report corresponding with enriched experimental data.

annotation

name of 'annotation.txt' or 'annotation.csv' data which includes Condition, BioReplicate, Run information. 'Run' will be matched with 'Spectrum.File'

fasta_path

string containing path to the corresponding fasta file for the modified peptide dataset.

protein_input

PD report corresponding with unmodified experimental data.

annotation_protein

Same format as annotation corresponding to unmodified data.

labeling_type

type of experimental design, must be one of LF for label free or TMT for tandem mass tag.

mod_id

Character that indicates the modification of interest. Default is ⁠\\(Phospho\\)⁠. Note ⁠\\⁠ must be included before special characters.

use_localization_cutoff

Boolean indicating whether to use a custom localization cutoff or rely on PD's modifications column. TRUE is default and apply custom cutoff localization_cutoff.

keep_all_mods

Boolean indicating whether to keep or remove peptides not in mod_id. Default is FALSE.

use_unmod_peptides

If protein_input is not provided, unmodified peptides can be extracted from input to be used in place of a global profiling run. Default is FALSE.

fasta_protein_name

Name of fasta column that matches with protein name in evidence file. Default is uniprot_iso.

localization_cutoff

Minimum localization score required to keep modification. Default is .75.

remove_unlocalized_peptides

Boolean indicating if peptides without all sites localized should be kept. Default is TRUE (non-localized sites will be removed).

useNumProteinsColumn

TRUE removes peptides which have more than 1 in Proteins column of PD output.

useUniquePeptide

TRUE (default) removes peptides that are assigned for more than one proteins. We assume to use unique peptide for each protein.

summaryforMultipleRows

max(default) or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities.

removeFewMeasurements

TRUE (default) will remove the features that have 1 or 2 measurements across runs.

removeOxidationMpeptides

TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default.

removeProtein_with1Peptide

TRUE will remove the proteins which have only 1 peptide and charge. FALSE is default.

which_quantification

Use 'Precursor.Area'(default) column for quantified intensities. 'Intensity' or 'Area' can be used instead.

which_proteinid

Use 'Protein.Accessions'(default) column for protein name. 'Master.Protein.Accessions' can be used instead.

use_log_file

logical. If TRUE, information about data processing will be saved to a file.

append

logical. If TRUE, information about data processing will be added to an existing log file.

verbose

logical. If TRUE, information about data processing will be printed to the console

log_file_path

character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file.

Value

list of data.table

Examples

# Global profiling example
input = system.file("tinytest/raw_data/PD/pd-ptm-input.csv", 
package = "MSstatsPTM")
input = data.table::fread(input)
annot = system.file("tinytest/raw_data/PD/pd-ptm-annot.csv",
                    package = "MSstatsPTM")
annot = data.table::fread(annot)
input_protein = system.file("tinytest/raw_data/PD/protein-input.csv",
                            package = "MSstatsPTM")
input_protein = data.table::fread(input_protein)
annot_protein = system.file("tinytest/raw_data/PD/protein-annot.csv",
                            package = "MSstatsPTM")
annot_protein = data.table::fread(annot_protein)
fasta_path=system.file("extdata", "pd_with_proteome.fasta", 
                       package="MSstatsPTM")
pd_imported = PDtoMSstatsPTMFormat(
    input,
    annotation = annot,
    protein_input = input_protein,
    annotation_protein = annot_protein,
    fasta_path = fasta_path,
    mod_id = "\\(GG\\)",
    labeling_type = "TMT",
    use_localization_cutoff = FALSE,
    which_proteinid = "Master.Protein.Accessions")
   
head(pd_imported$PTM)
head(pd_imported$PROTEIN)

# No global profiling example
head(pd_psm_input)
head(pd_annotation)

msstats_format = PDtoMSstatsPTMFormat(pd_psm_input, 
                                      pd_annotation, 
                                      system.file("extdata", "pd_fasta.fasta", package="MSstatsPTM"),
                                      use_unmod_peptides=TRUE, 
                                      which_proteinid = "Master.Protein.Accessions")

head(msstats_format$PTM)
head(msstats_format$PROTEIN)

Vitek-Lab/MSstatsPTM documentation built on Dec. 19, 2024, 6:02 a.m.