mergePep | R Documentation |
mergePep merges individual peptide table(s), TMTset1_LCMSinj1_Peptide_N.txt, TMTset1_LCMSinj2_Peptide_N.txt etc., into one interim Peptide.txt. The log2FC values in the interim result are centered with the medians at zero (median centering). The utility is typically applied after the conversion of PSMs to peptides via PSM2Pep and is required even for an experiment with one multiplex TMT set and one LC/MS series.
mergePep(
  use_duppeps = TRUE,
  mbr_ret_tol = NULL,
  max_mbr_fold = 20L,
  duppeps_repair = c("majority", "denovo"),
  plot_log2FC_cv = TRUE,
  cut_points = Inf,
  rm_allna = FALSE,
  omit_single_lfq = FALSE,
  ret_sd_tol = Inf,
  rm_ret_outliers = FALSE,
  ...
)
use_duppeps |
Logical; if TRUE, re-assigns double/multiple dipping peptide sequences to the most likely proteins by majority votes. |
mbr_ret_tol |
The tolerance in MBR retention time in seconds. The default is to match the setting in normPSM. |
max_mbr_fold |
The maximum absolute fold change in MBR. |
duppeps_repair |
Not currently used (or only with For instance, the same sequence of PEPTIDE may be assigned to protein
accession PROT_ACC1 in data set 1 and PROT_ACC2 in data set 2. At the
|
plot_log2FC_cv |
Logical; if TRUE, the distributions of the CV of
peptide |
cut_points |
A named, numeric vector defines the cut points (knots) for
the median-centering of |
rm_allna |
Logical; if TRUE, removes data rows that are exclusively NA
across ratio columns of |
omit_single_lfq |
Deprecated. Logical; if TRUE, omits LFQ entries with single measured values across all samples. The default is FALSE. |
ret_sd_tol |
Deprecated. Numeric; the tolerance in the variance of
retention time (w.r.t. measures in seconds). The thresholding applies to
TMT data. The default is |
rm_ret_outliers |
Deprecated. Logical; if TRUE, removes peptide entries with outlying retention times across samples and/or LCMS series. |
... |
|
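The majority vote described under use_duppeps can be pictured with a small base-R sketch. It is an illustration only and not proteoQ's implementation; the data frame, its values and the helper majority_acc are hypothetical.

```r
# Toy peptide-to-protein assignments pooled from several data sets
# (hypothetical values, for illustration only).
df <- data.frame(
  pep_seq  = c("PEPTIDE", "PEPTIDE", "PEPTIDE", "ANOTHERK"),
  prot_acc = c("PROT_ACC1", "PROT_ACC1", "PROT_ACC2", "PROT_ACC3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the accession seen most often for one sequence.
# Ties resolve to the first accession in alphabetical order.
majority_acc <- function(acc) {
  counts <- table(acc)
  names(counts)[which.max(counts)]
}

# One winning accession per peptide sequence.
winners <- vapply(split(df$prot_acc, df$pep_seq), majority_acc, character(1))

# Re-assign every row of a sequence to its winning accession.
df$prot_acc <- unname(winners[df$pep_seq])

print(df)
```

After the re-assignment, every PEPTIDE row carries the same accession, so the sequence no longer contributes to two proteins at once.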
In the interim output file, "Peptide.txt", values under columns log2_R... are logarithmic ratios at base 2 relative to the reference(s) within each multiplex TMT set, or to the row means within each plex if no reference(s) are present. Values under columns N_log2_R... are median-centered log2_R... without scaling normalization. Values under columns Z_log2_R... are N_log2_R... with additional scaling normalization. Values under columns I... are reporter-ion or LFQ intensity before normalization. Values under columns N_I... are normalized I... . Values under columns sd_log2_R... are the standard deviation of the log2FC of proteins from ascribing peptides.
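The relationship among the log2_R..., N_log2_R... and Z_log2_R... columns can be sketched in base R as follows. The intensity matrix and channel names are hypothetical, and dividing by the column standard deviation is only an assumed stand-in for the scaling normalization, which the package may perform differently.

```r
# Hypothetical reporter-ion intensities: 5 peptides x 3 channels,
# with "ref" as the reference channel of one multiplex TMT set.
I <- matrix(
  c(100,  200,  400,
    100,  100,  100,
    100,   50,  200,
    100,  400,  800,
    100,  800, 1600),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("ref", "s1", "s2"))
)

# log2_R: log2 ratios of the sample channels relative to the reference.
# Without a reference, the row means would serve as the denominator:
#   log2(I / rowMeans(I))
log2_R <- log2(I[, c("s1", "s2")] / I[, "ref"])

# N_log2_R: median centering, so that each column median is zero.
col_medians <- apply(log2_R, 2, median, na.rm = TRUE)
N_log2_R <- sweep(log2_R, 2, col_medians, "-")

# Z_log2_R: additional scaling. ASSUMPTION: scale by the column SD.
col_sds <- apply(N_log2_R, 2, sd, na.rm = TRUE)
Z_log2_R <- sweep(N_log2_R, 2, col_sds, "/")

print(apply(N_log2_R, 2, median))
```

Centering changes only the location of each sample's distribution, while the scaling step also equalizes the spread across samples.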
Description of the column keys in the output:
system.file("extdata", "peptide_keys.txt", package = "proteoQ")
The peptide counts in individual peptide tables, TMTset1_LCMSinj1_Peptide_N.txt etc., may be fewer than the entries indicated under the prot_n_pep column after the peptide removals/cleanups using purgePSM.
The primary output is in .../Peptide/Peptide.txt.
Metadata
load_expts for metadata preparation and a reduced working example in data normalization
Data normalization
normPSM for extended examples in PSM data normalization
PSM2Pep for extended examples in PSM to peptide summarization
mergePep for extended examples in peptide data merging
standPep for extended examples in peptide data normalization
Pep2Prn for extended examples in peptide to protein summarization
standPrn for extended examples in protein data normalization
purgePSM and purgePep for extended examples in data purging
pepHist and prnHist for extended examples in histogram visualization
extract_raws and extract_psm_raws for extracting MS file names
Variable arguments of 'filter_...'
contain_str, contain_chars_in, not_contain_str, not_contain_chars_in, start_with_str, end_with_str, start_with_chars_in and ends_with_chars_in for data subsetting by character strings
Missing values
pepImp and prnImp for missing value imputation
Informatics
pepSig and prnSig for significance tests
pepVol and prnVol for volcano plot visualization
prnGSPA for gene set enrichment analysis by protein significance pVals
gspaMap for mapping GSPA to volcano plot visualization
prnGSPAHM for heat map and network visualization of GSPA results
prnGSVA for gene set variance analysis
prnGSEA for data preparation for online GSEA
pepMDS and prnMDS for MDS visualization
pepPCA and prnPCA for PCA visualization
pepLDA and prnLDA for LDA visualization
pepHM and prnHM for heat map visualization
pepCorr_logFC, prnCorr_logFC, pepCorr_logInt and prnCorr_logInt for correlation plots
anal_prnTrend and plot_prnTrend for trend analysis and visualization
anal_pepNMF, anal_prnNMF, plot_pepNMFCon, plot_prnNMFCon, plot_pepNMFCoef, plot_prnNMFCoef and plot_metaNMF for NMF analysis and visualization
Custom databases
Uni2Entrez for lookups between UniProt accessions and Entrez IDs
Ref2Entrez for lookups among RefSeq accessions, gene names and Entrez IDs
prepGO for gene ontology
prepMSig for molecular signatures
prepString and anal_prnString for STRING-DB
Column keys in PSM, peptide and protein outputs
system.file("extdata", "psm_keys.txt", package = "proteoQ")
system.file("extdata", "peptide_keys.txt", package = "proteoQ")
system.file("extdata", "protein_keys.txt", package = "proteoQ")
# ===================================
# Merge peptide data
# ===================================
## !!!require the brief working example in `?load_expts`
# everything included
mergePep()
# row filtrations against column keys in `TMTset1_LCMSinj1_Peptide_N.txt`...
mergePep(
  filter_peps_by_sp = exprs(species == "human", pep_len <= 50)
)
# alignment of data by segments
mergePep(cut_points = c(mean_lint = seq(4, 7, .5)))
# alignment of data by empirical protein abundance
# `10^prot_icover - 1` comparable to emPAI
mergePep(cut_points = c(prot_icover = seq(0, 1, .25)))
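One way to picture the alignment by segments is to bin rows at the cut points and median-center within each bin. The base-R sketch below illustrates that idea only; it is not the package's code, and the simulated data and the names bin and breaks are hypothetical.

```r
set.seed(1)
n <- 200

# Hypothetical per-peptide mean log10 intensity and one sample's log2 ratios
# that carry an intensity-dependent bias.
mean_lint <- runif(n, 3.5, 7.5)
log2_R <- rnorm(n, mean = 0.3 * (mean_lint - 5), sd = 0.5)

# Cut points as in mergePep(cut_points = c(mean_lint = seq(4, 7, .5))),
# padded with -Inf and Inf so that every row falls into a bin.
breaks <- c(-Inf, seq(4, 7, .5), Inf)
bin <- droplevels(cut(mean_lint, breaks = breaks))

# Median centering within each bin rather than over the whole column.
N_log2_R <- log2_R - ave(log2_R, bin, FUN = function(x) median(x, na.rm = TRUE))

# The within-bin medians are now zero.
print(round(tapply(N_log2_R, bin, median), 10))
```

With the default cut_points = Inf there is a single segment, which in this picture reduces to ordinary median centering over all rows.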