| load_expts | R Documentation |
load_expts processes .xlsx or .csv files containing the
metadata of TMT or LFQ experiments. For simplicity, .xlsx will be
assumed in the document.
load_expts(
  dat_dir = NULL,
  expt_smry = "expt_smry.xlsx",
  frac_smry = "frac_smry.xlsx"
)
| dat_dir | A character string to the working directory. The default is to match the value under the global environment. |
| expt_smry | A character string to a |
| frac_smry | A character string to a |
expt_smry.xlsx
The expt_smry.xlsx files should be
placed immediately under the file folder defined by dat_dir. The tab
containing the metadata of TMT or LFQ experiments should be named
Setup. The Excel spreadsheet therein comprises three
tiers of fields: (1) essential, (2) optional default and (3) optional open.
The essential columns contain the mandatory information of the
experiments. The optional default columns serve as the fields for
default lookups in sample selection, grouping, ordering, aesthetics, etc.
The optional open fields allow users to define their own analysis,
aesthetics, etc.
| Essential column | Description |
| Sample_ID | Unique sample IDs |
| TMT_Channel | TMT channel names: 126, 127N, 127C etc. (left void for LFQ) |
| TMT_Set | TMT experiment indexes 1, 2, 3, ... (auto-filled for LFQ) |
| LCMS_Injection | LC/MS injection indexes 1, 2, 3, ... under a TMT_Set |
| RAW_File | MS data file names originated by MS software(s) |
| Reference | Labels indicating reference samples in TMT or LFQ experiments |
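As a rough illustration of the essential columns, the sketch below builds a two-sample table in base R and writes it as a .csv file. The sample names, the temporary folder standing in for dat_dir and the file name expt_smry.csv are all made up for the example; the column names are the only part taken from the table above.

```r
# Hypothetical two-channel, single-set TMT layout; every value is made up.
expt_smry <- data.frame(
  Sample_ID      = c("Ctrl1", "Trt1"),
  TMT_Channel    = c("126", "127N"),
  TMT_Set        = c(1L, 1L),
  LCMS_Injection = c(1L, 1L),
  RAW_File       = c(NA_character_, NA_character_),  # left void in this sketch
  Reference      = c("Ref", NA_character_),          # non-void marks a reference
  stringsAsFactors = FALSE
)

# Confirm that all essential columns are present
essential <- c("Sample_ID", "TMT_Channel", "TMT_Set",
               "LCMS_Injection", "RAW_File", "Reference")
missing_cols <- setdiff(essential, names(expt_smry))

# A temporary folder stands in for dat_dir
dat_dir <- tempfile("proteoQ_demo_")
dir.create(dat_dir)
out_file <- file.path(dat_dir, "expt_smry.csv")
write.csv(expt_smry, out_file, row.names = FALSE, na = "")
```

Writing void entries as empty cells (na = "") keeps the file close to what a hand-edited spreadsheet would contain.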
Sample_ID: values should be unique for entries at a unique
combination of TMT_Channel and TMT_Set, or voided for unused
entries. Samples with the same indexes of TMT_Channel and
TMT_Set but different indexes of LCMS_Injection should have
the same value in Sample_ID. No white space or special characters are
allowed. See also posts for sample exclusion.
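The Sample_ID rules above can be checked before loading the metadata. The following is a minimal base R sketch, not part of proteoQ; the helper name check_sample_ids and the set of accepted characters (letters, digits, dots and underscores) are assumptions made for illustration.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
check_sample_ids <- function(meta) {
  # Voided (NA or empty) Sample_ID entries mark unused channels; skip them
  used <- meta[!is.na(meta$Sample_ID) & nzchar(meta$Sample_ID), , drop = FALSE]
  problems <- character(0)

  # Assumption: only letters, digits, dots and underscores are acceptable
  bad <- unique(used$Sample_ID[grepl("[^A-Za-z0-9._]", used$Sample_ID)])
  if (length(bad) > 0) {
    problems <- c(problems,
                  paste("bad characters:", paste(bad, collapse = ", ")))
  }

  key <- paste(used$TMT_Set, used$TMT_Channel, sep = "|")

  # The same TMT_Set and TMT_Channel must carry one Sample_ID,
  # regardless of LCMS_Injection
  ids_per_key <- tapply(used$Sample_ID, key, function(x) length(unique(x)))
  multi_id <- names(ids_per_key)[ids_per_key > 1]
  if (length(multi_id) > 0) {
    problems <- c(problems,
                  paste("several IDs at:", paste(multi_id, collapse = ", ")))
  }

  # A Sample_ID must not be reused at a different TMT_Set and TMT_Channel
  keys_per_id <- tapply(key, used$Sample_ID, function(x) length(unique(x)))
  reused <- names(keys_per_id)[keys_per_id > 1]
  if (length(reused) > 0) {
    problems <- c(problems,
                  paste("reused IDs:", paste(reused, collapse = ", ")))
  }

  problems
}

# Two channels, one TMT set, two LC/MS injections
good <- data.frame(
  Sample_ID      = c("S1", "S2", "S1", "S2"),
  TMT_Channel    = c("126", "127N", "126", "127N"),
  TMT_Set        = c(1, 1, 1, 1),
  LCMS_Injection = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)
bad <- good
bad$Sample_ID[3] <- "S 3"   # white space, and a second ID at channel 126

good_result <- check_sample_ids(good)
bad_result  <- check_sample_ids(bad)
```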
RAW_File: (a) for analysis with off-line fractionation of peptides
before LC/MS, values under the RAW_File column should be left void.
Instead, the correspondence between the fraction numbers and RAW_File
names should be specified in a separate file, for example,
frac_smry.xlsx. (b) For analysis without off-line fractionation, it
is recommended as well to leave the field under the RAW_File column
blank and instead enter the MS file names in frac_smry.xlsx.
The set of RAW_File names in metadata needs to be identifiable in PSM
data. Subtle mismatches might occur when OS file names were
altered by MS users and thus differ from those recorded internally in MS
data for parsing by search engine(s). In that case, machine-generated MS file
names should be used. In addition, MS files may occasionally make no
contributions to PSM findings. In that case, users will be prompted to remove
these MS file names.
Utilities extract_raws and extract_psm_raws may
aid matching MS file names between metadata and PSM data. Utility
extract_raws extracts the names of MS files under a file
folder. Utility extract_psm_raws extracts the names of MS
files that are available in PSM data.
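The comparison that these utilities support can be sketched in base R. The example below does not call extract_raws or extract_psm_raws, since their arguments and return values are not shown here; the two character vectors are made-up stand-ins for their outputs, and the helper names and the handled extensions (.raw and .d) are assumptions.

```r
# Hypothetical helper: drop folder paths and common MS file extensions
strip_ext <- function(x) {
  sub("\\.(raw|d)$", "", basename(x), ignore.case = TRUE)
}

# Hypothetical helper: names found on one side only
compare_raw_names <- function(meta_raws, psm_raws) {
  meta_raws <- unique(strip_ext(meta_raws))
  psm_raws  <- unique(strip_ext(psm_raws))
  list(
    only_in_metadata = setdiff(meta_raws, psm_raws),
    only_in_psm      = setdiff(psm_raws, meta_raws)
  )
}

# Made-up names: the third file was renamed on disk after acquisition
meta_raws <- c("ms_1.raw", "ms_2.raw", "renamed_3.raw")
psm_raws  <- c("ms_1", "ms_2", "ms_3.raw")

mismatch <- compare_raw_names(meta_raws, psm_raws)
```

Entries under only_in_metadata are candidates for renamed files or for MS files without PSM findings; entries under only_in_psm are file names missing from the metadata.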
Reference: reference entry(entries) are indicated with non-void
string(s).
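One way to read "non-void" is sketched below in base R. Treating NA, empty strings and strings of white space only as void is an assumption of this example, and is_reference is a hypothetical helper rather than a proteoQ function.

```r
# Hypothetical helper: TRUE where the Reference field holds a non-void string
is_reference <- function(x) {
  x <- trimws(as.character(x))
  !is.na(x) & nzchar(x)
}

# Made-up Reference column: two labelled references, two void entries
reference <- c("Ref", NA, "", "W2")
ref_flags <- is_reference(reference)
```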
| Optional default column | Description |
| Select | Samples to be selected for indicated analysis |
| Group | Aesthetic labels annotating the prior knowledge of sample groups, e.g., Ctrl_T1, Ctrl_T2, Disease_T1, Disease_T2, ... |
| Order | Numeric labels specifying the order of sample groups |
| Fill | Aesthetic labels for sample annotation by filled color |
| Color | Aesthetic labels for sample annotation by edge color |
| Shape | Aesthetic labels for sample annotation by shape |
| Size | Aesthetic labels for sample annotation by size |
| Alpha | Aesthetic labels for sample annotation by transparency |
| Exemplary optional open column | Description |
| Term | Categorical terms for statistical modeling. |
| Peptide_Yield | Yields of peptides in sample handling |
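The intent of the Select, Group and Order columns can be pictured with a small base R sketch. It only mimics the idea of lookups by column and does not reproduce how proteoQ reads these fields internally; the table contents, and the reading of any non-void Select entry as "selected", are assumptions.

```r
# Made-up metadata with three of the optional default columns
meta <- data.frame(
  Sample_ID = c("S1", "S2", "S3", "S4"),
  Select    = c("Y", NA, "Y", "Y"),
  Group     = c("Disease_T1", "Ctrl_T1", "Ctrl_T1", "Disease_T1"),
  Order     = c(2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Keep samples with a non-void entry under Select
selected <- meta[!is.na(meta$Select) & nzchar(meta$Select), , drop = FALSE]

# Arrange samples by the numeric Order labels, then by Sample_ID
selected <- selected[order(selected$Order, selected$Sample_ID), , drop = FALSE]

# Group labels in their display order
group_levels <- unique(selected$Group)
```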
frac_smry.xlsx
| Column | Description |
| Sample_ID | Unique sample IDs (only required with LFQ) |
| TMT_Set | TMT experiment indexes (auto-filled for LFQ) |
| LCMS_Injection | LC/MS injection indexes |
| Fraction | Fraction indexes under a TMT_Set |
| RAW_File | MS data file names |
| PSM_File | Names of PSM files. Required only when one RAW_File can be linked to multiple PSM files (e.g. F012345.csv and F012346.csv both from ms_1.raw). |
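To make the PSM_File case concrete, the base R sketch below lays out a TMT fraction table in which ms_1.raw is linked to two PSM files, as in the example above, and then tests whether the PSM_File column is needed. The second MS file, ms_2.raw, and its PSM file F012347.csv are made up for the illustration.

```r
# Hypothetical fraction table; rows one and two share the same MS file
frac_smry <- data.frame(
  TMT_Set        = c(1L, 1L, 1L),
  LCMS_Injection = c(1L, 1L, 1L),
  Fraction       = c(1L, 1L, 2L),
  RAW_File       = c("ms_1.raw", "ms_1.raw", "ms_2.raw"),
  PSM_File       = c("F012345.csv", "F012346.csv", "F012347.csv"),
  stringsAsFactors = FALSE
)

# Count distinct PSM files per MS file
psm_per_raw <- tapply(frac_smry$PSM_File, frac_smry$RAW_File,
                      function(x) length(unique(x)))

# PSM_File is only required when some MS file maps to several PSM files
needs_psm_file <- any(psm_per_raw > 1)
```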
Data normalization
normPSM for extended examples in PSM data normalization
PSM2Pep for extended examples in PSM to peptide summarization
mergePep for extended examples in peptide data merging
standPep for extended examples in peptide data normalization
Pep2Prn for extended examples in peptide to protein summarization
standPrn for extended examples in protein data normalization
purgePSM and purgePep for extended examples in data purging
pepHist and prnHist for extended examples in histogram visualization
extract_raws and extract_psm_raws for extracting MS file names
User-friendly utilities for variable arguments of 'filter_...'
contain_str, contain_chars_in, not_contain_str, not_contain_chars_in, start_with_str, end_with_str, start_with_chars_in and ends_with_chars_in for data subsetting by character strings
Missing values
pepImp and prnImp for missing value imputation
Informatics
pepSig and prnSig for significance tests
pepVol and prnVol for volcano plot visualization
prnGSPA for gene set enrichment analysis by protein significance pVals
gspaMap for mapping GSPA to volcano plot visualization
prnGSPAHM for heat map and network visualization of GSPA results
prnGSVA for gene set variance analysis
prnGSEA for data preparation for online GSEA
pepMDS and prnMDS for MDS visualization
pepPCA and prnPCA for PCA visualization
pepLDA and prnLDA for LDA visualization
pepHM and prnHM for heat map visualization
pepCorr_logFC, prnCorr_logFC, pepCorr_logInt and prnCorr_logInt for correlation plots
anal_prnTrend and plot_prnTrend for trend analysis and visualization
anal_pepNMF, anal_prnNMF, plot_pepNMFCon, plot_prnNMFCon, plot_pepNMFCoef, plot_prnNMFCoef and plot_metaNMF for NMF analysis and visualization
Custom databases
Uni2Entrez for lookups between UniProt accessions and Entrez IDs
Ref2Entrez for lookups among RefSeq accessions, gene names and Entrez IDs
prepGO for gene ontology
prepMSig for molecular signatures
prepString and anal_prnString for STRING-DB
Workflow scripts
# TMT
system.file("extdata", "workflow_tmt_base.R", package = "proteoQ")
system.file("extdata", "workflow_tmt_ext.R", package = "proteoQ")
# LFQ
system.file("extdata", "workflow_lfq_base.R", package = "proteoQ")
Metadata files
# TMT, no fractionation; OK without 'frac_smry.xlsx'
# (a. no references)
system.file("extdata", "expt_smry_no_prefrac.xlsx", package = "proteoQDA")
# (b. W2 and W16 references)
system.file("extdata", "expt_smry_no_prefrac_ref_w2_w16.xlsx", package = "proteoQDA")
# TMT, prefractionation
# (a. no references)
system.file("extdata", "expt_smry_gtmt.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_gtmt.xlsx", package = "proteoQDA")
# (b. W2 references)
system.file("extdata", "expt_smry_ref_w2.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_gtmt.xlsx", package = "proteoQDA")
# (c. W2 and W16 references)
system.file("extdata", "expt_smry_ref_w2_w16.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_gtmt.xlsx", package = "proteoQDA")
# TMT, prefractionation (global + phospho)
system.file("extdata", "expt_smry_tmt_cmbn.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_tmt_cmbn.xlsx", package = "proteoQDA")
# TMT, prefractionation, one MS to multiple PSM files
system.file("extdata", "expt_smry_psmfiles.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_psmfiles.xlsx", package = "proteoQDA")
# TMT, prefractionation, mixed-plexes
# (column PSM_File needed; as with this example,
# mixed-plexes results are actually from the same MS files
# but searched separately at 6- and 10-plex settings!)
system.file("extdata", "expt_smry_mixplexes.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_mixplexes.xlsx", package = "proteoQDA")
# LFQ, prefractionation
system.file("extdata", "expt_smry_plfq.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_plfq.xlsx", package = "proteoQDA")
Column keys in PSM, peptide and protein outputs
system.file("extdata", "psm_keys.txt", package = "proteoQ")
system.file("extdata", "peptide_keys.txt", package = "proteoQ")
system.file("extdata", "protein_keys.txt", package = "proteoQ")
MS1 peptide masses
calc_pepmasses for mono-isotopic masses of peptides from fasta databases
calc_monopeptide for mono-isotopic masses of peptides from individual sequences
parse_unimod for parsing Unimod fixed modifications, variable modifications and neutral losses
find_unimod for finding a Unimod
# ***********************************
# ************ TMT ************
# ***********************************
# ===================================
# Fasta and PSM files
# ===================================
# fasta (all platforms)
library(proteoQDA)
fasta_dir <- "~/proteoQ/dbs/fasta/refseq"
copy_refseq_hs(fasta_dir)
copy_refseq_mm(fasta_dir)
# working directory (all platforms)
dat_dir <- "~/proteoQ/examples"
# metadata (all platforms)
copy_exptsmry_gtmt(dat_dir)
copy_fracsmry_gtmt(dat_dir)
# PSM (choose one of the platforms)
choose_one <- TRUE
if (!choose_one) {
  ## Mascot
  copy_mascot_gtmt(dat_dir)
  ## or MaxQuant
  # copy_maxquant_gtmt(dat_dir)
  ## or MSFragger
  # copy_msfragger_gtmt(dat_dir)
  ## or proteoM
  # copy_proteom_gtmt(dat_dir)
  ## or Spectrum Mill
  # (temporarily unavailable)
}
# ===================================
# PSM, peptide and protein processing
# ===================================
library(proteoQ)
load_expts("~/proteoQ/examples")
# PSM data standardization
normPSM(
  group_psm_by = pep_seq_mod,
  group_pep_by = gene,
  annot_kinases = TRUE,
  # no default and required
  fasta = c("~/proteoQ/dbs/fasta/refseq/refseq_hs_2013_07.fasta",
            "~/proteoQ/dbs/fasta/refseq/refseq_mm_2013_07.fasta"),
)
# optional PSM purging
purgePSM()
# PSMs to peptides
PSM2Pep()
# peptide data merging
mergePep()
# peptide data standardization
standPep()
# peptide data histograms
pepHist()
# optional peptide purging
purgePep()
# peptides to proteins
Pep2Prn(use_unique_pep = TRUE)
# protein data standardization
standPrn()
# protein data histograms
prnHist()
# ===================================
# Optional significance tests
# (no NA imputation)
# ===================================
pepSig(
  W2_bat = ~ Term["W2.BI.TMT2-W2.BI.TMT1",
                  "W2.JHU.TMT2-W2.JHU.TMT1",
                  "W2.PNNL.TMT2-W2.PNNL.TMT1"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU",
                    "W2.BI-W2.PNNL",
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"],
)
prnSig()
# ===================================
# optional NA imputation
# ===================================
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)
# ===================================
# Optional significance tests
# (with NA imputation)
# ===================================
pepSig(
  impute_na = TRUE,
  W2_bat = ~ Term["W2.BI.TMT2-W2.BI.TMT1",
                  "W2.JHU.TMT2-W2.JHU.TMT1",
                  "W2.PNNL.TMT2-W2.PNNL.TMT1"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU",
                    "W2.BI-W2.PNNL",
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"],
)
prnSig(impute_na = TRUE)
# ***********************************
# ************ LFQ ************
# ***********************************
# ===================================
# Fasta and PSM files
# ===================================
# fasta (all platforms)
library(proteoQDA)
fasta_dir <- "~/proteoQ/dbs/fasta/uniprot"
copy_uniprot_hsmm(fasta_dir)
# working directory (all platforms)
dat_dir <- "~/proteoQ/examples"
# metadata (all platforms)
copy_exptsmry_plfq(dat_dir)
copy_fracsmry_plfq(dat_dir)
# PSM (choose one of the platforms)
choose_one <- TRUE
if (!choose_one) {
  ## Mascot
  copy_mascot_plfq(dat_dir)
  ## or MaxQuant
  # copy_maxquant_plfq(dat_dir)
  ## or MSFragger
  # copy_msfragger_plfq(dat_dir)
  ## or proteoM
  # copy_proteom_plfq(dat_dir)
  ## or Spectrum Mill
  # (temporarily unavailable)
}
# ===================================
# PSM, peptide and protein processing
# ===================================
library(proteoQ)
load_expts("~/proteoQ/examples")
# PSM data standardization
normPSM(
  group_psm_by = pep_seq_mod,
  group_pep_by = gene,
  annot_kinases = TRUE,
  fasta = c("~/proteoQ/dbs/fasta/uniprot/uniprot_hsmm_2020_03.fasta"),
)
# PSM purging not applicable with LFQ
# purgePSM()
# PSMs to peptides
PSM2Pep()
# peptide data merging
mergePep()
# peptide data standardization
standPep()
# peptide data histograms
pepHist()
# optional peptide purging
purgePep()
# peptides to proteins
Pep2Prn(use_unique_pep = TRUE)
# protein data standardization
standPrn()
# protein data histograms
prnHist()
# ===================================
# Optional significance tests
# (no NA imputation)
# ===================================
pepSig(
  fml_1 = ~ Term["BI-JHU",
                 "JHU-PNNL",
                 "(BI+JHU)/2-PNNL"],
)
prnSig()
# ===================================
# optional NA imputation
# ===================================
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)
# ===================================
# Optional significance tests
# (with NA imputation)
# ===================================
pepSig(
  impute_na = TRUE,
  fml_1 = ~ Term["BI-JHU",
                 "JHU-PNNL",
                 "(BI+JHU)/2-PNNL"],
)
prnSig(impute_na = TRUE)
# ***********************************
# *********** SILAC ***********
# ***********************************
# Database searches
library(proteoM)
matchMS(
  silac_mix = list(base = NULL, heavy = c("K8 (K)", "R10 (R)")),
  ...
)
# The remainder is the same as for LFQ
# ...
## Not run:
load_expts(dat_dir = "~/proteoQ/examples", expt_smry = "expt_smry.xlsx")
# not working; `expt_smry = my_expt` is an expression
my_expt <- "expt_smry.xlsx"
load_expts(dat_dir = "~/proteoQ/examples", expt_smry = my_expt)
# need unquoting;
# see also: https://dplyr.tidyverse.org/articles/programming.html
load_expts(dat_dir = "~/proteoQ/examples", expt_smry = !!my_expt)
## End(Not run)
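Why a variable passed to expt_smry behaves differently from a literal string can be pictured with a base R sketch. The internals of load_expts are not shown here, so show_captured below is a hypothetical stand-in that captures its argument unevaluated with substitute(). The last call uses bquote() as a base R analogue of unquoting with !!, splicing the value into the call before evaluation.

```r
# Hypothetical stand-in: reports the expression supplied to expt_smry
show_captured <- function(expt_smry = "expt_smry.xlsx") {
  deparse(substitute(expt_smry))
}

my_expt <- "expt_smry.xlsx"

# A literal string is captured as the string itself
from_literal <- show_captured(expt_smry = "expt_smry.xlsx")

# A variable is captured as the symbol my_expt, not as its value
from_symbol <- show_captured(expt_smry = my_expt)

# Splicing the value into the call first restores the literal case
from_spliced <- eval(bquote(show_captured(expt_smry = .(my_expt))))
```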