| normPSM | R Documentation |
normPSM standardizes PSM results from database search engines.
normPSM(
  dat_dir = NULL,
  expt_smry = "expt_smry.xlsx",
  frac_smry = "frac_smry.xlsx",
  fasta = NULL,
  entrez = NULL,
  group_psm_by = c("pep_seq_mod", "pep_seq"),
  group_pep_by = c("gene", "prot_acc"),
  pep_unique_by = c("group", "protein", "none"),
  mc_psm_by = c("peptide", "protein", "psm"),
  scale_rptr_int = FALSE,
  rptr_intco = 0,
  rptr_intrange = c(0, 100),
  rm_craps = FALSE,
  rm_krts = FALSE,
  rm_outliers = FALSE,
  rm_allna = FALSE,
  type_sd = c("log2_R", "N_log2_R", "Z_log2_R"),
  lfq_mbr = TRUE,
  mbr_ret_tol = 30,
  purge_phosphodata = TRUE,
  annot_kinases = FALSE,
  plot_rptr_int = TRUE,
  plot_log2FC_cv = TRUE,
  use_lowercase_aa = FALSE,
  use_spec_counts = FALSE,
  use_corrected_mqint = TRUE,
  rm_reverses = TRUE,
  ...
)
dat_dir |
A character string to the working directory. The default is to match the value under the global environment. |
expt_smry |
A character string to a |
frac_smry |
A character string to a |
fasta |
Character string(s) to the name(s) of fasta file(s) with
prepended directory path. The |
entrez |
Character string(s) to the name(s) of entrez file(s) with
prepended directory path. At the |
group_psm_by |
A character string specifying the method in PSM grouping.
At the |
group_pep_by |
A character string specifying the method in peptide
grouping. At the |
pep_unique_by |
A character string for annotating the uniqueness of
peptides. At the |
mc_psm_by |
A character string specifying the method in the median
centering of PSM |
scale_rptr_int |
Logical; if TRUE, scales (up) MS2 reporter-ion
intensities by MS1 precursor intensity: |
rptr_intco |
Numeric; the threshold of reporter-ion intensity (TMT:
|
rptr_intrange |
Numeric vector at length two. The argument specifies the
range of reporter-ion intensities (TMT: |
rm_craps |
Logical; if TRUE, cRAP proteins will be removed. The default is FALSE. |
rm_krts |
Logical; if TRUE, keratin entries will be removed. The default is FALSE. |
rm_outliers |
Logical; if TRUE, PSM outlier removals will be performed
for peptides with more than two identifying PSMs. Dixon's method will be
used when |
rm_allna |
Logical; if TRUE, removes data rows that are exclusively NA
across ratio columns of |
type_sd |
Character string; the type of log2Ratios for SD calculations.
The value is one |
lfq_mbr |
Logical; if TRUE, performs match-between-run (MBR) with Mzion
LFQ data. Also requires |
mbr_ret_tol |
Retention time tolerance (in seconds) for LFQ-MBR. |
purge_phosphodata |
Logical; if TRUE and phosphorylation present as variable modification(s), entries without phosphorylation will be removed. The default is TRUE. |
annot_kinases |
Logical; if TRUE, proteins of human or mouse origins will be annotated with their kinase attributes. The default is FALSE. |
plot_rptr_int |
Logical; if TRUE, the distributions of reporter-ion intensities will be plotted. The default is TRUE. The argument is also applicable to the precursor intensity with MaxQuant LFQ. |
plot_log2FC_cv |
Logical; if TRUE, the distributions of the CV of
peptide |
use_lowercase_aa |
Logical; if TRUE, modifications in amino acid residues
will be abbreviated with lower-case and/or |
use_spec_counts |
Logical; if TRUE, uses spectrum counts for quantitation with Mascot or Mzion outputs. |
use_corrected_mqint |
A logical argument for uses with |
rm_reverses |
A logical argument for uses with |
... |
|
In each primary output file, "...PSM_N.txt", values under columns
log2_R... are base-2 logarithmic ratios relative to the average
intensity of reference(s) within each multiplex TMT set, or to the
row-mean intensity within each plex if no reference(s) are present.
Values under columns N_log2_R... are log2_R... with
median-centering alignment. Values under columns I... are raw
reporter-ion intensities from database searches. Values under columns
N_I... are normalized reporter-ion intensities. Values under
columns sd_log2_R... are the standard deviations of the log2FC
of peptides, calculated from their contributing PSMs. Character strings
under pep_seq_mod denote peptide sequences with applicable variable
modifications.
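The definitions of log2_R and N_log2_R above can be illustrated with a small base-R sketch. The intensity matrix, the channel names and the choice of reference channels are made up for illustration, and the median centering shown here is a plain per-channel version; it does not reproduce the options of mc_psm_by or the internal code of normPSM.

```r
# Toy reporter-ion intensities for one TMT plex: 3 PSMs x 4 channels.
# Channel names and values are assumptions for illustration only.
I <- matrix(
  c(100, 200, 400, 800,
     50,  50, 100, 200,
     10,  40,  40, 160),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("psm", 1:3), c("ref1", "ref2", "s1", "s2"))
)

ref_cols <- c("ref1", "ref2")  # hypothetical reference channels

# Denominator: mean reference intensity per row, or the row mean
# across all channels when no reference channel is designated.
denom <- if (length(ref_cols) > 0) {
  rowMeans(I[, ref_cols, drop = FALSE], na.rm = TRUE)
} else {
  rowMeans(I, na.rm = TRUE)
}

# Row i of I is divided by denom[i] (column-wise recycling).
log2_R <- log2(I / denom)

# Median centering: subtract each channel's median log2 ratio.
col_medians <- apply(log2_R, 2, median, na.rm = TRUE)
N_log2_R <- sweep(log2_R, 2, col_medians, FUN = "-")
```

After centering, the median of every N_log2_R column is zero, which is the alignment that the N_ prefix refers to in this sketch.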
Nomenclature of pep_seq_mod:
| Variable modification | Abbreviation |
| Non-terminal | A letter from upper to lower case, e.g., mtFPEADILLK |
| N-term | A hat to the left of a peptide sequence, e.g., ^QDGTHVVEAVDATHIGK |
| C-term | A hat to the right of a peptide sequence, e.g., DAYYNLCLPQRPnMI^ |
| Acetyl (Protein N-term) | An underscore to the left of a peptide sequence, e.g., _mAsGVAVSDGVIK |
| Amidated (Protein C-term) | An underscore to the right of a peptide sequence, e.g., DAYYNLCLPQRPnMI_ |
| Other (Protein N-term) | A tilde to the left of a peptide sequence, e.g., ~mAsGVAVSDGVIK |
| Other (Protein C-term) | A tilde to the right of a peptide sequence, e.g., DAYYNLCLPQRPnMI~ |
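As a reading aid for the nomenclature table, the following base-R sketch decodes a pep_seq_mod string into its bare sequence, terminal markers and modified residue positions. The helper describe_pep_seq_mod is hypothetical and not part of proteoQ; it only restates the table and assumes the markers appear exactly as listed there.

```r
# Hypothetical helper (not a proteoQ function): summarise the markers
# in a pep_seq_mod string according to the nomenclature table.
describe_pep_seq_mod <- function(x) {
  n_marks <- c("^" = "N-term",
               "_" = "Acetyl (Protein N-term)",
               "~" = "Other (Protein N-term)")
  c_marks <- c("^" = "C-term",
               "_" = "Amidated (Protein C-term)",
               "~" = "Other (Protein C-term)")

  first <- substr(x, 1, 1)
  last  <- substr(x, nchar(x), nchar(x))

  # Strip terminal markers, then locate lower-case (modified) residues.
  bare  <- gsub("[_~^]", "", x)
  chars <- strsplit(bare, "")[[1]]

  list(
    sequence = toupper(bare),
    n_term = if (first %in% names(n_marks)) unname(n_marks[first]) else NA_character_,
    c_term = if (last %in% names(c_marks)) unname(c_marks[last]) else NA_character_,
    modified_positions = which(chars != toupper(chars))
  )
}

res_n <- describe_pep_seq_mod("_mAsGVAVSDGVIK")
res_c <- describe_pep_seq_mod("DAYYNLCLPQRPnMI^")
```

Here res_n reports a protein N-terminal acetylation with lower-case residues at positions 1 and 3, and res_c reports a C-terminal modification with a lower-case residue at position 13.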
Outputs are interim and final PSM tables under the PSM sub-directory
of dat_dir. Primary results are the standardized PSM tables
TMTset1_LCMSinj1_PSM_N.txt, TMTset2_LCMSinj1_PSM_N.txt, etc. The
indexes of the TMT experiment and LC/MS injection are indicated in the
file names.
Mascot
Users will export PSM data from Mascot in .csv format and store the
files under the folder indicated by dat_dir. The header information
should be included during the .csv export. The file name(s) should
start with the letter 'F' and end with a '.csv' extension (e.g.,
F004452.csv, F004453_this.csv etc.).
MaxQuant
Users will copy over msms.txt file(s) from MaxQuant to the dat_dir
directory. The file name(s) should start with 'msms' and end with a
'.txt' extension (e.g., msms.txt, msms_this.txt etc.).
MSFragger
Users will copy over psm.tsv file(s) from MSFragger to the dat_dir
directory. The file name(s) should start with 'psm' and end with a
'.tsv' extension (e.g., psm.tsv, psm_this.tsv etc.).
Spectrum Mill
Users will copy over PSMexport.1.ssv file(s) from Spectrum Mill to the
dat_dir directory. The file name(s) should start with 'PSMexport' and
end with a '.ssv' extension (e.g., PSMexport.ssv, PSMexport_this.ssv
etc.).
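Before calling normPSM, it can help to check which search-engine outputs in dat_dir follow the naming rules above. The sketch below is an assumption-laden illustration: the regular expressions in psm_patterns are derived from the stated rules rather than taken from proteoQ, and a scratch folder under tempdir() stands in for a real dat_dir.

```r
# Hypothetical patterns derived from the naming rules above;
# proteoQ's own file matching may differ.
psm_patterns <- c(
  Mascot       = "^F.*\\.csv$",
  MaxQuant     = "^msms.*\\.txt$",
  MSFragger    = "^psm.*\\.tsv$",
  SpectrumMill = "^PSMexport.*\\.ssv$"
)

# A scratch directory standing in for dat_dir, with a few empty files.
dat_dir <- file.path(tempdir(), "normPSM_naming_demo")
dir.create(dat_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  dat_dir,
  c("F004452.csv", "F004453_this.csv", "msms.txt", "notes.txt")
)))

# List matching files per search engine and keep non-empty hits.
found <- lapply(psm_patterns, function(p) list.files(dat_dir, pattern = p))
found <- found[lengths(found) > 0]
```

In this toy folder, found holds the two Mascot exports and the single MaxQuant file, while notes.txt is ignored because it matches none of the patterns.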
Variable arguments and data files
Variable argument (vararg) statements of filter_ and arrange_ are
available in proteoQ for flexible filtration and ordering of data rows
through its user-facing functions. To take advantage of the feature,
users need to be aware of the column keys in input files. As indicated
by their names, filter_ and filter2_ perform row filtration against
column keys from a primary data file, df, and secondary data file(s),
df2, respectively. The same correspondence applies to the arrange_ and
arrange2_ varargs.
Users will typically employ either primary or secondary vararg
statements, but not both. In the more extreme case of gspaMap(...),
it links prnGSPA findings in df2 to the significance pVals and
abundance fold changes in df for volcano plot visualizations by gene
sets. The table below summarizes the df and the df2 for varargs in
proteoQ.
| Utility | Vararg_ | df | Vararg2_ | df2 |
| normPSM | filter_ | Mascot, F[...].csv; MaxQuant, msms[...].txt; SM, PSMexport[...].ssv | NA | NA |
| PSM2Pep | NA | NA | NA | NA |
| mergePep | filter_ | TMTset1_LCMSinj1_Peptide_N.txt | NA | NA |
| standPep | slice_ | Peptide.txt | NA | NA |
| Pep2Prn | filter_ | Peptide.txt | NA | NA |
| standPrn | slice_ | Protein.txt | NA | NA |
| pepHist | filter_ | Peptide.txt | NA | NA |
| prnHist | filter_ | Protein.txt | NA | NA |
| pepSig | filter_ | Peptide[_impNA].txt | NA | NA |
| prnSig | filter_ | Protein[_impNA].txt | NA | NA |
| pepMDS | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnMDS | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepPCA | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnPCA | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepLDA | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnLDA | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepEucDist | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnEucDist | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepCorr_logFC | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnCorr_logFC | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepHM | filter_, arrange_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnHM | filter_, arrange_ | Protein[_impNA][_pVal].txt | NA | NA |
| anal_prnTrend | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| plot_prnTrend | NA | NA | filter2_ | [...]Protein_Trend_{NZ}[_impNA][...].txt |
| anal_pepNMF | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| anal_prnNMF | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| plot_pepNMFCon | NA | NA | filter2_ | [...]Peptide_NMF[...]_consensus.txt |
| plot_prnNMFCon | NA | NA | filter2_ | [...]Protein_NMF[...]_consensus.txt |
| plot_pepNMFCoef | NA | NA | filter2_ | [...]Peptide_NMF[...]_coef.txt |
| plot_prnNMFCoef | NA | NA | filter2_ | [...]Protein_NMF[...]_coef.txt |
| plot_metaNMF | filter_, arrange_ | Protein[_impNA][_pVal].txt | NA | NA |
| prnGSPA | filter_ | Protein[_impNA]_pVals.txt | NA | NA |
| prnGSPAHM | NA | NA | filter2_ | [...]Protein_GSPA_{NZ}[_impNA]_essmap.txt |
| gspaMap | filter_ | Protein[_impNA]_pVal.txt | filter2_ | [...]Protein_GSPA_{NZ}[_impNA].txt |
| anal_prnString | filter_ | Protein[_impNA][_pVals].txt | NA | NA |
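The effect of filter_ vararg statements on a primary data file can be mimicked with base R. This is only a sketch of the idea: the toy table, the helper apply_filters and the use of alist() in place of exprs() are assumptions for illustration, and the way proteoQ evaluates vararg statements internally is not shown here.

```r
# Toy PSM table; the column keys mimic the Mascot-style names used in
# the examples of this page, and the values are made up.
df <- data.frame(
  pep_seq    = c("AAK", "CCR", "DDK", "EEK"),
  pep_expect = c(0.01, 0.50, 0.05, 0.08),
  pep_rank   = c(1, 1, 2, 1),
  pep_exp_z  = c(2, 3, 2, 1)
)

# Each statement is a list of unevaluated conditions on column keys,
# analogous to filter_psms_at = exprs(pep_expect <= .1).
filter_psms_at   <- alist(pep_expect <= .1)
filter_psms_more <- alist(pep_rank == 1, pep_exp_z > 1)

# Hypothetical helper: evaluate every condition against the columns
# of df and keep the rows that satisfy all of them.
apply_filters <- function(df, ...) {
  conds <- do.call(c, list(...))
  hits  <- lapply(conds, function(cond) eval(cond, df))
  keep  <- Reduce(`&`, hits, rep(TRUE, nrow(df)))
  df[keep, , drop = FALSE]
}

kept <- apply_filters(df, filter_psms_at, filter_psms_more)
```

Only the first row passes all three conditions in this toy table. A statement that names a column key absent from the data fails at evaluation time, which is the situation flagged as a bad vararg statement in the examples.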
Metadata
load_expts for metadata preparation and a reduced working example in data normalization
Data normalization
normPSM for extended examples in PSM data normalization
PSM2Pep for extended examples in PSM to peptide summarization
mergePep for extended examples in peptide data merging
standPep for extended examples in peptide data normalization
Pep2Prn for extended examples in peptide to protein summarization
standPrn for extended examples in protein data normalization
purgePSM and purgePep for extended examples in data purging
pepHist and prnHist for extended examples in histogram visualization
extract_raws and extract_psm_raws for extracting MS file names
Variable arguments of filter_...
contain_str, contain_chars_in, not_contain_str, not_contain_chars_in, start_with_str, end_with_str, start_with_chars_in and ends_with_chars_in for data subsetting by character strings
Missing values
pepImp and prnImp for missing value imputation
Informatics
pepSig and prnSig for significance tests
pepVol and prnVol for volcano plot visualization
prnGSPA for gene set enrichment analysis by protein significance pVals
gspaMap for mapping GSPA to volcano plot visualization
prnGSPAHM for heat map and network visualization of GSPA results
prnGSVA for gene set variance analysis
prnGSEA for data preparation for online GSEA
pepMDS and prnMDS for MDS visualization
pepPCA and prnPCA for PCA visualization
pepLDA and prnLDA for LDA visualization
pepHM and prnHM for heat map visualization
pepCorr_logFC, prnCorr_logFC, pepCorr_logInt and prnCorr_logInt for correlation plots
anal_prnTrend and plot_prnTrend for trend analysis and visualization
anal_pepNMF, anal_prnNMF, plot_pepNMFCon, plot_prnNMFCon, plot_pepNMFCoef, plot_prnNMFCoef and plot_metaNMF for NMF analysis and visualization
Custom databases
Uni2Entrez for lookups between UniProt accessions and Entrez IDs
Ref2Entrez for lookups among RefSeq accessions, gene names and Entrez IDs
prepGO for gene ontology
prepMSig for molecular signatures
prepString and anal_prnString for STRING-DB
Column keys in PSM, peptide and protein outputs
system.file("extdata", "psm_keys.txt", package = "proteoQ")
system.file("extdata", "peptide_keys.txt", package = "proteoQ")
system.file("extdata", "protein_keys.txt", package = "proteoQ")
# ===================================
# PSM normalization
# ===================================
## !!!require the brief working example in `?load_expts`
## additional examples
# Mascot
normPSM(
  group_psm_by = pep_seq_mod,
  group_pep_by = prot_acc,
  fasta = c("~/proteoQ/dbs/fasta/refseq/refseq_hs_2013_07.fasta",
            "~/proteoQ/dbs/fasta/refseq/refseq_mm_2013_07.fasta"),
  # variable argument statement(s)
  filter_psms_at = exprs(pep_expect <= .1),
  filter_psms_more = exprs(pep_rank == 1, pep_exp_z > 1),
)
# MaxQuant
normPSM(
  group_psm_by = pep_seq_mod,
  group_pep_by = prot_acc,
  fasta = c("~/proteoQ/dbs/fasta/refseq/refseq_hs_2013_07.fasta",
            "~/proteoQ/dbs/fasta/refseq/refseq_mm_2013_07.fasta"),
  # argument name as given in the usage signature above
  use_corrected_mqint = TRUE,
  rm_reverses = TRUE,
  # vararg statement(s)
  filter_psms_at = exprs(PEP <= 0.1),
)
# MSFragger
normPSM(
  group_psm_by = pep_seq_mod,
  group_pep_by = prot_acc,
  fasta = c("~/proteoQ/dbs/fasta/refseq/refseq_hs_2013_07.fasta",
            "~/proteoQ/dbs/fasta/refseq/refseq_mm_2013_07.fasta"),
  # vararg statement(s)
  filter_psms_at = exprs(Hyperscore >= 10),
)
# Spectrum Mill
normPSM(
  group_psm_by = pep_seq_mod,
  group_pep_by = prot_acc,
  fasta = c("~/proteoQ/dbs/fasta/refseq/refseq_hs_2013_07.fasta",
            "~/proteoQ/dbs/fasta/refseq/refseq_mm_2013_07.fasta"),
  # vararg statement(s)
  filter_psms_at = exprs(score >= 10),
)
###############################################
## Custom entrez lookups
# (1) can overwrite the `proteoQ` default for
# species in "human", "mouse" and "rat"
# (2) and are required for `other` species
###############################################
# see also `?Uni2Entrez` or `?Ref2Entrez` for more examples
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("org.Hs.eg.db")
BiocManager::install("org.Mm.eg.db")
library(org.Hs.eg.db)
library(org.Mm.eg.db)
library(proteoQ)
Ref2Entrez(species = human)
Ref2Entrez(species = mouse)
# see also Uni2Entrez(...) for Uniprot to Entrez lookups
normPSM(
  group_psm_by = pep_seq_mod,
  group_pep_by = gene,
  fasta = c("~/proteoQ/dbs/fasta/refseq/refseq_hs_2013_07.fasta",
            "~/proteoQ/dbs/fasta/refseq/refseq_mm_2013_07.fasta"),
  entrez = c("~/proteoQ/dbs/entrez/refseq_entrez_hs.rds",
             "~/proteoQ/dbs/entrez/refseq_entrez_mm.rds"),
)
## Not run:
# wrong fasta
normPSM(
  fasta = "~/proteoQ/dbs/fasta/wrong.fasta",
)
# no mouse entry annotation
normPSM(
  fasta = "~/proteoQ/dbs/fasta/refseq/refseq_hs_2013_07.fasta",
)
# bad vararg statement
normPSM(
  fasta = c("~/proteoQ/dbs/fasta/refseq/refseq_hs_2013_07.fasta",
            "~/proteoQ/dbs/fasta/refseq/refseq_mm_2013_07.fasta"),
  filter_psms_at = exprs(column_key_not_in_psm_tables <= .1),
)
## End(Not run)