splitPSM_mz: Splits PSM tables from matchMS.

splitPSM_mz R Documentation

Description

Splits PSM tables from matchMS.

Usage

splitPSM_mz(
  group_psm_by = "pep_seq",
  group_pep_by = "prot_acc",
  fasta = NULL,
  entrez = NULL,
  pep_unique_by = "group",
  scale_rptr_int = FALSE,
  rm_craps = FALSE,
  rm_krts = FALSE,
  rm_allna = FALSE,
  purge_phosphodata = TRUE,
  annot_kinases = FALSE,
  plot_rptr_int = TRUE,
  rptr_intco = 0,
  rptr_intrange = c(0, 100),
  use_lowercase_aa = TRUE,
  use_spec_counts = FALSE,
  lfq_mbr = TRUE,
  mbr_ret_tol = 25L,
  parallel = TRUE,
  ...
)

Arguments

group_psm_by

A character string specifying the method in PSM grouping. At the pep_seq default, descriptive statistics will be calculated based on the same pep_seq groups. At the pep_seq_mod alternative, peptides with different variable modifications will be treated as different species and descriptive statistics will be calculated based on the same pep_seq_mod groups.

group_pep_by

A character string specifying the method in peptide grouping. At the prot_acc default, descriptive statistics will be calculated based on the same prot_acc groups. At the gene alternative, proteins with the same gene name but different accession numbers will be treated as one group.

fasta

Character string(s) giving the name(s) of the fasta file(s), with the directory path prepended. The fasta database(s) need to match those used in the MS/MS ion search. There is no default; users need to provide the correct file path(s) and name(s).

entrez

Character string(s) giving the name(s) of the entrez file(s), with the directory path prepended. At the NULL default, a convenience lookup is available for species among c("human", "mouse", "rat"). For other species, users need to provide the file path(s) and name(s) for the lookup table(s). See also Uni2Entrez and Ref2Entrez for preparing custom entrez files.

pep_unique_by

A character string for annotating the uniqueness of peptides. At the group default, uniqueness is assessed by protein groups, with same-set or sub-set proteins collapsed into one group. At the more stringent setting of protein, uniqueness is assessed by protein entries without grouping. At the other extreme, none, all peptides are treated as unique. A new logical column, pep_isunique (TRUE or FALSE), will be added to the PSM reports. Note that the choice of none is only for convenience, as the same may be achieved by setting use_unique_pep = FALSE in Pep2Prn.

scale_rptr_int

Logical; if TRUE, scales (up) MS2 reporter-ion intensities by MS1 precursor intensity: I_{MS1} \times (I_{x} / \sum I_{MS2}), where I_{MS1} is the MS1 precursor intensity, I_{MS2} is the MS2 reporter-ion intensity, and I_{x} is the MS2 reporter-ion intensity under TMT channel x. Note that the scaling does not affect log2FC, since every channel of a PSM is multiplied by the same factor.
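The scaling formula can be illustrated with a minimal base-R sketch. The intensity values and the object names (i_ms1, rptr, scaled) are hypothetical and are not taken from any proteoQ output; the code only mirrors the formula as stated and is not the package's internal implementation.

```r
# Hypothetical values for a single PSM (illustration only).
i_ms1 <- 2e7                                        # MS1 precursor intensity
rptr  <- c(I126 = 1500, I127N = 3000, I127C = 750, I128N = 6000)

# I_MS1 * (I_x / sum(I_MS2)): each channel keeps its share of the MS2 total.
scaled <- i_ms1 * rptr / sum(rptr)

# log2 fold changes relative to channel 126, before and after scaling.
log2fc_raw    <- log2(rptr / rptr[["I126"]])
log2fc_scaled <- log2(scaled / scaled[["I126"]])

# The common factor cancels in the ratios, so both sets agree.
isTRUE(all.equal(log2fc_raw, log2fc_scaled))
```

The scaled intensities sum to the MS1 precursor intensity, while the channel-to-channel ratios are unchanged.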

rm_craps

Logical; if TRUE, cRAP proteins will be removed. The default is FALSE.

rm_krts

Logical; if TRUE, keratin entries will be removed. The default is FALSE.

rm_allna

Logical; if TRUE, removes data rows that are exclusively NA across ratio columns of log2_R126 etc. The setting also applies to log2_R000 in LFQ.
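As a rough illustration of the row filter described for rm_allna, the base-R sketch below drops rows whose ratio columns are all NA. The toy data frame and the names psm, ratio_cols and psm_kept are assumptions made for the example; how proteoQ selects the columns internally is not stated in this page.

```r
# Toy PSM table with two hypothetical ratio columns.
psm <- data.frame(
  pep_seq    = c("AAK", "CCR", "DDK"),
  log2_R126  = c(0.1, NA, NA),
  log2_R127N = c(NA, NA, -0.4),
  stringsAsFactors = FALSE
)

# Columns named log2_R... are treated as the ratio columns (assumption).
ratio_cols <- grep("^log2_R", names(psm))

# A row is "all NA" when none of its ratio values is non-missing.
all_na <- rowSums(!is.na(psm[, ratio_cols, drop = FALSE])) == 0

psm_kept <- psm[!all_na, , drop = FALSE]
```

Only the second row, which is NA in every ratio column, is removed; rows with at least one ratio value are retained.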

purge_phosphodata

Logical; if TRUE and phosphorylation present as variable modification(s), entries without phosphorylation will be removed. The default is TRUE.

annot_kinases

Logical; if TRUE, proteins of human or mouse origins will be annotated with their kinase attributes. The default is FALSE.

plot_rptr_int

Logical; if TRUE, the distributions of reporter-ion intensities will be plotted. The default is TRUE. The argument is also applicable to the precursor intensity with MaxQuant LFQ.

rptr_intco

Numeric; the threshold above which reporter-ion intensities (TMT: I126 etc.; LFQ: I000) are considered non-trivial. The default is 0, i.e., no cut-off. The data nullification is not applied synchronously to the precursor intensity (pep_tot_int) under the same PSM query. To guard against oddities such as MS2 reporter-ion intensities being higher than their contributing MS1 precursor intensity, use for example filter_... = rlang::exprs(pep_tot_int >= my_ms1_cutoff) during PSM2Pep. The rule of thumb is that pep_tot_int is a single column, so filtering against it can be readily achieved without introducing new arguments. By contrast, rptr_intco applies to a set of columns (I126 etc.), which would be more laborious to express through statements of filter_ varargs.
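A minimal base-R sketch of an absolute cut-off applied across intensity columns is given below. It assumes that "nullification" means replacing values with NA and that values strictly below the threshold are affected; neither detail is confirmed by this page. The toy table and the names psm and int_cols are hypothetical.

```r
# Toy PSM table: one precursor column and two hypothetical reporter-ion columns.
psm <- data.frame(
  pep_tot_int = c(5e6, 8e6),
  I126        = c(500, 12000),
  I127N       = c(2500, 300)
)

rptr_intco <- 1000

# Reporter-ion columns are assumed to be named I followed by three digits.
int_cols <- grep("^I[0-9]{3}", names(psm))

# Set reporter-ion intensities below the cut-off to NA (assumed behaviour);
# pep_tot_int is deliberately left untouched.
psm[int_cols] <- lapply(psm[int_cols], function(x) replace(x, x < rptr_intco, NA))
```

The precursor column keeps both of its values, which is why a separate filter on pep_tot_int is needed when it should be thresholded as well.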

rptr_intrange

Numeric vector of length two. The argument specifies the range of reporter-ion intensities (TMT: I126 etc.; LFQ: I000) being considered non-trivial. The default is between the 0th and 100th percentiles, i.e., no cut-offs. While argument rptr_intco employs a universal cut-off across samples by absolute values, rptr_intrange provides an alternative means of sample-specific thresholding of intensities by percentiles. The data nullification is not applied synchronously to the precursor intensity under the same PSM query.
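The percentile-based, per-sample thresholding can be sketched in base R as follows. The sketch assumes that values outside the percentile window are set to NA and that percentiles are computed per intensity column with the default method of stats::quantile; the toy data and the non-default range c(20, 80) are chosen only for illustration.

```r
# Toy intensities for two hypothetical TMT channels (one column per sample).
psm <- data.frame(
  I126  = c(100, 200, 300, 400, 500),
  I127N = c(10, 20, 30, 40, 1000)
)

rptr_intrange <- c(20, 80)        # percentiles, lower and upper
probs <- rptr_intrange / 100

# Compute the limits separately for each column, then blank out values
# outside the window (assumed behaviour).
psm[] <- lapply(psm, function(x) {
  lims <- quantile(x, probs = probs, na.rm = TRUE)
  replace(x, x < lims[1] | x > lims[2], NA)
})
```

Because the limits are derived per column, the very different intensity scales of the two channels lead to different absolute cut-offs, in contrast to the single absolute value used by rptr_intco.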

use_lowercase_aa

Logical; if TRUE, modifications in amino acid residues will be abbreviated with lower-case and/or ^_~. See the table below for details. The default is TRUE.

use_spec_counts

Logical; if TRUE, uses spectrum counts for quantitation with Mascot or Mzion outputs.

lfq_mbr

Logical; if TRUE, performs match-between-run (MBR) with Mzion LFQ data. This also requires the ms1full_[rawfile].rds files to be at the same folder level as psmQ[...].txt.

mbr_ret_tol

Retention time tolerance (in seconds) for LFQ-MBR.

...

Not currently used.


qzhang503/proteoQ documentation built on Dec. 14, 2024, 12:27 p.m.