mergePep: Merge peptide table(s) into one

mergePep    R Documentation

Merge peptide table(s) into one

Description

mergePep merges individual peptide table(s), TMTset1_LCMSinj1_Peptide_N.txt, TMTset1_LCMSinj2_Peptide_N.txt etc., into one interim Peptide.txt. The log2FC values in the interim result are median-centered, i.e., shifted so that the median under each sample is zero. The utility is typically applied after the conversion of PSMs to peptides via PSM2Pep and is required even for an experiment with a single multiplex TMT set and a single LC/MS series.
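As a minimal illustration of what median centering means, the sketch below subtracts each column's median from a toy set of log2FC values. The data, the column names and the helper `median_center` are hypothetical; this is not the code used inside mergePep.

```r
# Toy log2FC values: rows are peptides, columns are samples (hypothetical)
log2fc <- data.frame(
  log2_R126 = c(0.4, 1.0, -0.2, NA),
  log2_R127 = c(-1.0, 0.5, 0.1, 0.3)
)

# Hypothetical helper: subtract each column's median, ignoring NA values
median_center <- function(df) {
  as.data.frame(lapply(df, function(x) x - median(x, na.rm = TRUE)))
}

centered <- median_center(log2fc)

# After centering, every column median is (numerically) zero
col_medians <- sapply(centered, median, na.rm = TRUE)
print(col_medians)
```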

Usage

mergePep(
  use_duppeps = TRUE,
  mbr_ret_tol = NULL,
  max_mbr_fold = 20L,
  duppeps_repair = c("majority", "denovo"),
  plot_log2FC_cv = TRUE,
  cut_points = Inf,
  rm_allna = FALSE,
  omit_single_lfq = FALSE,
  ret_sd_tol = Inf,
  rm_ret_outliers = FALSE,
  ...
)

Arguments

use_duppeps

Logical; if TRUE, re-assigns double/multiple dipping peptide sequences to the most likely proteins by majority votes.

mbr_ret_tol

The tolerance in MBR retention time in seconds. The default is to match the setting in normPSM.

max_mbr_fold

The maximum absolute fold change in MBR.

duppeps_repair

Not currently used (or only with majority). Character string; the method of repairing double-dipping peptide sequences upon data pooling.

For instance, the same sequence of PEPTIDE may be assigned to protein accession PROT_ACC1 in data set 1 and PROT_ACC2 in data set 2. With majority, the first choice listed under Usage, a majority rule is applied to the re-assignments. With denovo, the peptide-to-protein association is re-established from scratch.
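The majority rule can be pictured with a small base R sketch. The pooled table, its column names and the helper `majority_acc` are assumptions made for illustration only; how mergePep resolves ties or weighs evidence is not described here, and this sketch simply takes the first accession among equally frequent ones.

```r
# Hypothetical pooled peptide-to-protein assignments from several data sets
assignments <- data.frame(
  pep_seq  = c("PEPTIDE", "PEPTIDE", "PEPTIDE", "SEQWENCE", "SEQWENCE"),
  prot_acc = c("PROT_ACC1", "PROT_ACC2", "PROT_ACC1", "PROT_ACC3", "PROT_ACC3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequently assigned accession
# (ties go to the first accession in alphabetical order)
majority_acc <- function(acc) names(which.max(table(acc)))

# One winning accession per peptide sequence
winners <- vapply(
  split(assignments$prot_acc, assignments$pep_seq),
  majority_acc,
  character(1)
)

# Re-assign every row to the winning accession of its sequence
assignments$prot_acc_repaired <- unname(winners[assignments$pep_seq])
print(assignments)
```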

plot_log2FC_cv

Logical; if TRUE, the distributions of the CV of peptide log2FC will be plotted. The default is TRUE.

cut_points

A named, numeric vector that defines the cut points (knots) for the median-centering of log2FC by sections. For example, at cut_points = c(mean_lint = seq(4, 7, .5)), log2FC will be binned according to the intervals of -Inf, 4, 4.5, ..., 7, Inf under column mean_lint (mean log10 intensity) in the input data. The default is cut_points = Inf, or equivalently -Inf, where the log2FC under each sample will be median-centered as one piece. See also prnHist for data binning in histogram visualization.
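Sectional median centering can be sketched in base R as follows. The toy data are hypothetical, and the choice of right-closed intervals (the default of `cut`) is an assumption, since which side of each interval is closed is not stated here. This is not the code used inside mergePep.

```r
# Toy data: one log2FC column plus the binning variable `mean_lint`
df <- data.frame(
  mean_lint = c(3.5, 3.9, 4.2, 4.4, 6.8, 7.5, 8.0),
  log2_R126 = c(1.0, 2.0, -0.5, 0.5, 0.3, 4.0, 6.0)
)

cut_points <- c(mean_lint = seq(4, 7, .5))

# Intervals of -Inf, 4, 4.5, ..., 7, Inf; empty bins are dropped
breaks <- c(-Inf, unname(cut_points), Inf)
bins <- droplevels(cut(df$mean_lint, breaks = breaks))

# Subtract the bin-wise median from each value
bin_median <- ave(df$log2_R126, bins,
                  FUN = function(x) median(x, na.rm = TRUE))
df$N_log2_R126 <- df$log2_R126 - bin_median

# Each occupied bin now has a median of zero
medians_by_bin <- tapply(df$N_log2_R126, bins, median)
print(medians_by_bin)
```

With breaks = c(-Inf, Inf), all rows fall into one bin and the sketch reduces to ordinary median centering of the whole column.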

rm_allna

Logical; if TRUE, removes data rows that are exclusively NA across ratio columns of log2_R126 etc. The setting also applies to log2_R000 in LFQ.
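The effect of rm_allna = TRUE can be approximated with the base R sketch below. The toy table and the pattern used to select ratio columns (names starting with `log2_R`) are assumptions for illustration; the exact column selection inside mergePep may differ.

```r
# Toy peptide table with two ratio columns and one intensity column
df <- data.frame(
  pep_seq   = c("AAK", "CCR", "DDK"),
  log2_R126 = c(0.1, NA, NA),
  log2_R127 = c(NA, NA, 0.4),
  I126      = c(1e5, 2e5, 3e5),
  stringsAsFactors = FALSE
)

# Assumed selection rule: ratio columns are those named `log2_R...`
ratio_cols <- grep("^log2_R", names(df), value = TRUE)

# A row is dropped only when every ratio value is NA
all_na <- rowSums(!is.na(df[, ratio_cols, drop = FALSE])) == 0
df_kept <- df[!all_na, , drop = FALSE]
print(df_kept)
```

Row CCR is removed even though its intensity is present, because only the ratio columns are inspected.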

omit_single_lfq

Deprecated. Logical; if TRUE, omits LFQ entries with single measured values across all samples. The default is FALSE.

ret_sd_tol

Deprecated. Numeric; the tolerance in the variance of retention time (measured in seconds). The thresholding applies to TMT data. The default is Inf. Depending on the LCMS gradient settings, a value of, e.g., 150 might be suitable.

rm_ret_outliers

Deprecated. Logical; if TRUE, removes peptide entries with outlying retention times across samples and/or LCMS series.

...

filter_: Variable argument statements for the row filtration of data against the column keys in individual peptide tables of TMTset1_LCMSinj1_Peptide_N.txt, TMTset1_LCMSinj2_Peptide_N.txt, etc.

The variable argument statements should be in the following format: each statement contains a list of logical expression(s). The lhs needs to start with filter_. The logical condition(s) at the rhs need to be enclosed in exprs with round parentheses. For example, pep_len is a column key present in Mascot peptide tables of TMTset1_LCMSinj1_Peptide_N.txt, TMTset1_LCMSinj2_Peptide_N.txt etc. The statement filter_peps_at = exprs(pep_len <= 50) will remove peptide entries with pep_len > 50. See also normPSM.
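To make the filter_ convention more concrete, here is a base R sketch of how such statements could be applied to a table. Both `filter_rows` and `exprs_base` are hypothetical helpers written for this illustration; `exprs_base` is only a stand-in for the `exprs` used in the documented examples, and none of this is the code used inside mergePep.

```r
# Hypothetical stand-in for `exprs`: capture conditions without evaluating them
exprs_base <- function(...) as.list(substitute(list(...)))[-1]

# Hypothetical helper: apply every argument named `filter_*` as a row filter
filter_rows <- function(df, ...) {
  dots <- list(...)
  dots <- dots[grepl("^filter_", names(dots))]
  for (conds in dots) {
    for (cond in conds) {
      # Evaluate the condition against the columns of the table
      keep <- eval(cond, envir = df, enclos = parent.frame())
      # Rows where the condition is NA are treated as not passing
      df <- df[!is.na(keep) & keep, , drop = FALSE]
    }
  }
  df
}

# Toy peptide table (hypothetical values)
peps <- data.frame(
  pep_seq = c("AAK", "CCR", "DDK"),
  pep_len = c(12L, 60L, 50L),
  species = c("human", "human", "mouse"),
  stringsAsFactors = FALSE
)

kept_len <- filter_rows(peps, filter_peps_at = exprs_base(pep_len <= 50))
kept_both <- filter_rows(
  peps,
  filter_peps_by_sp = exprs_base(species == "human", pep_len <= 50)
)
print(kept_len$pep_seq)
print(kept_both$pep_seq)
```

In this sketch, multiple conditions inside one statement are combined with a logical AND, which is an assumption about the intended semantics.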

Details

In the interim output file, "Peptide.txt", values under columns log2_R... are logarithmic ratios at base 2 relative to the reference(s) within each multiplex TMT set, or to the row means within each plex if no reference(s) are present. Values under columns N_log2_R... are median-centered log2_R... without scaling normalization. Values under columns Z_log2_R... are N_log2_R... with additional scaling normalization. Values under columns I... are reporter-ion or LFQ intensity before normalization. Values under columns N_I... are normalized I.... Values under columns sd_log2_R... are the standard deviation of the log2FC of proteins from ascribing peptides.
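The two ways of forming log2_R... values can be sketched in base R as below. The toy intensities, the choice of I126 as the reference channel, and the use of the arithmetic mean of intensities for the row means are all assumptions for illustration. The scaling step behind Z_log2_R... is left out because its method is not specified in this section.

```r
# Toy reporter-ion intensities for one plex (hypothetical values)
intens <- data.frame(
  I126 = c(100, 400, 50),   # assumed to be the reference channel
  I127 = c(200, 400, 25),
  I128 = c(400, 100, 100)
)

# With a reference: log2 ratio of each channel to the reference channel
log2_ref <- log2(intens / intens$I126)
names(log2_ref) <- sub("^I", "log2_R", names(log2_ref))

# Without a reference: log2 ratio of each channel to the row mean
log2_noref <- log2(intens / rowMeans(intens, na.rm = TRUE))
names(log2_noref) <- sub("^I", "log2_R", names(log2_noref))

print(log2_ref)
print(log2_noref)
```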

Description of the column keys in the output:
system.file("extdata", "peptide_keys.txt", package = "proteoQ")

The peptide counts in individual peptide tables, TMTset1_LCMSinj1_Peptide_N.txt etc., may be fewer than the entries indicated under the prot_n_pep column after the peptide removals/cleanups using purgePSM.

Value

The primary output is in .../Peptide/Peptide.txt.

See Also

Metadata
load_expts for metadata preparation and a reduced working example in data normalization

Data normalization
normPSM for extended examples in PSM data normalization
PSM2Pep for extended examples in PSM to peptide summarization
mergePep for extended examples in peptide data merging
standPep for extended examples in peptide data normalization
Pep2Prn for extended examples in peptide to protein summarization
standPrn for extended examples in protein data normalization
purgePSM and purgePep for extended examples in data purging
pepHist and prnHist for extended examples in histogram visualization
extract_raws and extract_psm_raws for extracting MS file names

Variable arguments of 'filter_...'
contain_str, contain_chars_in, not_contain_str, not_contain_chars_in, start_with_str, end_with_str, start_with_chars_in and ends_with_chars_in for data subsetting by character strings

Missing values
pepImp and prnImp for missing value imputation

Informatics
pepSig and prnSig for significance tests
pepVol and prnVol for volcano plot visualization
prnGSPA for gene set enrichment analysis by protein significance pVals
gspaMap for mapping GSPA to volcano plot visualization
prnGSPAHM for heat map and network visualization of GSPA results
prnGSVA for gene set variance analysis
prnGSEA for data preparation for online GSEA
pepMDS and prnMDS for MDS visualization
pepPCA and prnPCA for PCA visualization
pepLDA and prnLDA for LDA visualization
pepHM and prnHM for heat map visualization
pepCorr_logFC, prnCorr_logFC, pepCorr_logInt and prnCorr_logInt for correlation plots
anal_prnTrend and plot_prnTrend for trend analysis and visualization
anal_pepNMF, anal_prnNMF, plot_pepNMFCon, plot_prnNMFCon, plot_pepNMFCoef, plot_prnNMFCoef and plot_metaNMF for NMF analysis and visualization

Custom databases
Uni2Entrez for lookups between UniProt accessions and Entrez IDs
Ref2Entrez for lookups among RefSeq accessions, gene names and Entrez IDs
prepGO for gene ontology
prepMSig for molecular signatures
prepString and anal_prnString for STRING-DB

Column keys in PSM, peptide and protein outputs
system.file("extdata", "psm_keys.txt", package = "proteoQ")
system.file("extdata", "peptide_keys.txt", package = "proteoQ")
system.file("extdata", "protein_keys.txt", package = "proteoQ")

Examples


# ===================================
# Merge peptide data
# ===================================

## !!!require the brief working example in `?load_expts`

# everything included
mergePep()

# row filtrations against column keys in `TMTset1_LCMSinj1_Peptide_N.txt`...
mergePep(
  filter_peps_by_sp = exprs(species == "human", pep_len <= 50)
)

# alignment of data by segments
mergePep(cut_points = c(mean_lint = seq(4, 7, .5)))

# alignment of data by empirical protein abundance
# `10^prot_icover - 1` comparable to emPAI
mergePep(cut_points = c(prot_icover = seq(0, 1, .25)))


qzhang503/proteoQ documentation built on Dec. 14, 2024, 12:27 p.m.