load_expts | R Documentation |
load_expts processes .xlsx or .csv files containing the metadata of TMT or LFQ experiments. For simplicity, .xlsx will be assumed in the document.
load_expts(
dat_dir = NULL,
expt_smry = "expt_smry.xlsx",
frac_smry = "frac_smry.xlsx"
)
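dat_dir | A character string giving the path to the working directory. The default is to match the value under the global environment. |
expt_smry | A character string giving the name of a .xlsx file with the metadata of TMT or LFQ experiments. The default is expt_smry.xlsx. |
frac_smry | A character string giving the name of a .xlsx file with the fractionation metadata. The default is frac_smry.xlsx. |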
expt_smry.xlsx
The expt_smry.xlsx file should be placed immediately under the file folder defined by dat_dir. The tab containing the metadata of TMT or LFQ experiments should be named Setup. The Excel spreadsheet therein comprises three tiers of fields: (1) essential, (2) optional default and (3) optional open. The essential columns contain the mandatory information of the experiments. The optional default columns serve as the fields for default lookups in sample selection, grouping, ordering, aesthetics, etc. The optional open fields allow users to define their own analysis, aesthetics, etc.
Essential column | Description |
Sample_ID | Unique sample IDs |
TMT_Channel | TMT channel names: 126, 127N, 127C etc. (left void for LFQ) |
TMT_Set | TMT experiment indexes 1, 2, 3, ... (auto-filled for LFQ) |
LCMS_Injection | LC/MS injection indexes 1, 2, 3, ... under a TMT_Set |
RAW_File | MS data file names originated by MS software(s) |
Reference | Labels indicating reference samples in TMT or LFQ experiments |
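As a minimal sketch of the essential tier listed above, the following base R code builds a small metadata table and checks that every essential column is present. The sample names and the object names expt_smry and essential_cols are hypothetical and only serve the illustration; this is not how load_expts itself validates its input.

```r
# Hypothetical metadata table mirroring the essential columns.
expt_smry <- data.frame(
  Sample_ID      = c("W2_BI_1", "W2_JHU_1", "W16_BI_1"),
  TMT_Channel    = c("126", "127N", "127C"),
  TMT_Set        = c(1L, 1L, 1L),
  LCMS_Injection = c(1L, 1L, 1L),
  RAW_File       = NA_character_,     # left void; file names go to frac_smry
  Reference      = c("Ref", NA, NA),  # a non-void string marks a reference
  stringsAsFactors = FALSE
)

essential_cols <- c("Sample_ID", "TMT_Channel", "TMT_Set",
                    "LCMS_Injection", "RAW_File", "Reference")

# Report any essential column that is absent from the table.
missing_cols <- setdiff(essential_cols, names(expt_smry))
if (length(missing_cols) > 0) {
  stop("Missing essential columns: ", paste(missing_cols, collapse = ", "))
}
```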
Sample_ID: values should be unique for entries at a unique combination of TMT_Channel and TMT_Set, or left void for unused entries. Samples with the same TMT_Channel and TMT_Set indexes but different LCMS_Injection indexes should share the same Sample_ID value. No white space or special characters are allowed. See also the posts on sample exclusion.
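The Sample_ID rules above can be sketched as three checks in base R. The table meta and its values are hypothetical, and the set of permitted characters (letters, digits, dot and underscore) is an assumption made for this example; the source does not enumerate which characters count as special.

```r
# Hypothetical metadata: one sample injected twice, plus one faulty ID.
meta <- data.frame(
  Sample_ID      = c("W2_BI_1", "W2_BI_1", "W2_JHU_1", "W2 PNNL"),
  TMT_Channel    = c("126", "126", "127N", "127C"),
  TMT_Set        = c(1L, 1L, 1L, 1L),
  LCMS_Injection = c(1L, 2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Key for each combination of TMT_Set and TMT_Channel.
key <- paste(meta$TMT_Set, meta$TMT_Channel, sep = "_")

# (1) One combination should carry exactly one Sample_ID,
#     regardless of LCMS_Injection.
n_ids <- tapply(meta$Sample_ID, key, function(x) length(unique(x)))
inconsistent <- names(n_ids)[n_ids > 1]

# (2) One Sample_ID should not be reused across combinations.
n_keys <- tapply(key, meta$Sample_ID, function(x) length(unique(x)))
reused <- names(n_keys)[n_keys > 1]

# (3) Assumption: only letters, digits, dot and underscore are permitted.
bad_ids <- unique(meta$Sample_ID[!grepl("^[A-Za-z0-9._]+$", meta$Sample_ID)])
```

Here bad_ids flags "W2 PNNL" because of the white space, while the two injections of "W2_BI_1" pass the first two checks.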
RAW_File: (a) for analysis with off-line fractionation of peptides before LC/MS, values under the RAW_File column should be left void. Instead, the correspondence between the fraction numbers and RAW_File names should be specified in a separate file, for example, frac_smry.xlsx. (b) For analysis without off-line fractionation, it is likewise recommended to leave the RAW_File column blank and enter the MS file names in frac_smry.xlsx instead.
The set of RAW_File names in metadata needs to be identifiable in PSM data. Subtle mismatches might occur when the OS file names were altered by MS users and thus differ from those recorded internally in the MS data for parsing by search engine(s). In that case, the machine-generated MS file names should be used. In addition, MS files may occasionally make no contribution to PSM findings. In that case, users will be prompted to remove these MS file names.
Utilities extract_raws and extract_psm_raws may aid matching MS file names between metadata and PSM data. Utility extract_raws extracts the names of MS files under a file folder. Utility extract_psm_raws extracts the names of MS files that are available in PSM data.
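The kind of comparison these utilities support can be sketched in base R alone. The example below does not call extract_raws or extract_psm_raws; it lists stand-in files with list.files and uses a hand-written vector psm_raws in place of names parsed from PSM data. All file names are hypothetical.

```r
# Create a throwaway folder with empty stand-in MS files.
raw_dir <- file.path(tempdir(), "raw_demo")
dir.create(raw_dir, showWarnings = FALSE)
invisible(file.create(file.path(raw_dir,
                                c("ms_1.raw", "ms_2.raw", "ms_3.raw"))))

# Names of MS files found under the folder, taken here as the metadata entries.
meta_raws <- list.files(raw_dir, pattern = "\\.raw$")

# Hypothetical names of MS files that actually appear in the PSM data.
psm_raws <- c("ms_1.raw", "ms_2.raw")

# Metadata entries without any PSM, i.e. candidates for removal.
not_in_psm <- setdiff(meta_raws, psm_raws)

# PSM file names that the metadata does not account for.
not_in_meta <- setdiff(psm_raws, meta_raws)
```

In this sketch "ms_3.raw" has no counterpart in the PSM data, which corresponds to the situation where users would be prompted to remove the name from the metadata.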
Reference: reference entry(entries) are indicated with non-void string(s).
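A short base R sketch of the non-void rule follows. The vector reference is hypothetical, and treating an empty string the same as a missing value is an assumption about how void cells arrive after reading the spreadsheet.

```r
# Hypothetical Reference column as it might look after reading the metadata.
reference <- c("Ref", NA, "", "Pool", NA)

# A row counts as a reference when its entry is neither NA nor empty.
is_ref <- !is.na(reference) & nzchar(reference)

# Row indexes of the reference samples.
ref_rows <- which(is_ref)
```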
Optional default column | Description |
Select | Samples to be selected for indicated analysis |
Group | Aesthetic labels annotating the prior knowledge of sample groups, e.g., Ctrl_T1, Ctrl_T2, Disease_T1, Disease_T2, ... |
Order | Numeric labels specifying the order of sample groups |
Fill | Aesthetic labels for sample annotation by filled color |
Color | Aesthetic labels for sample annotation by edge color |
Shape | Aesthetic labels for sample annotation by shape |
Size | Aesthetic labels for sample annotation by size |
Alpha | Aesthetic labels for sample annotation by transparency |
Exemplary optional open column | Description |
Term | Categorical terms for statistical modeling |
Peptide_Yield | Yields of peptides in sample handling |
frac_smry.xlsx
Column | Description |
Sample_ID | Unique sample IDs (only required with LFQ) |
TMT_Set | TMT experiment indexes (auto-filled for LFQ) |
LCMS_Injection | LC/MS injection indexes |
Fraction | Fraction indexes under a TMT_Set |
RAW_File | MS data file names |
PSM_File | Names of PSM files. Required only when one RAW_File can be linked to multiple PSM files (e.g. F012345.csv and F012346.csv both from ms_1.raw). |
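As an illustration of the layout above, the base R sketch below assembles a fractionation table for one TMT set with three fractions and writes it to a .csv file. The file names and the object name frac_smry are hypothetical, and the exact .csv layout that load_expts expects is an assumption here; only the column names are taken from the table.

```r
# Hypothetical fractionation metadata: one TMT set, one injection, 3 fractions.
frac_smry <- data.frame(
  TMT_Set        = 1L,
  LCMS_Injection = 1L,
  Fraction       = 1:3,
  RAW_File       = sprintf("ms_%d.raw", 1:3),
  stringsAsFactors = FALSE
)

# Each fraction index should occur once under a TMT_Set and LCMS_Injection.
n_dup <- anyDuplicated(frac_smry[c("TMT_Set", "LCMS_Injection", "Fraction")])

# Write the table to a temporary file and read it back for inspection.
out <- file.path(tempdir(), "frac_smry.csv")
write.csv(frac_smry, out, row.names = FALSE)
frac_back <- read.csv(out, stringsAsFactors = FALSE)
```

A PSM_File column would only be added when one RAW_File maps to several PSM files.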
Data normalization
normPSM for extended examples in PSM data normalization
PSM2Pep for extended examples in PSM to peptide summarization
mergePep for extended examples in peptide data merging
standPep for extended examples in peptide data normalization
Pep2Prn for extended examples in peptide to protein summarization
standPrn for extended examples in protein data normalization
purgePSM and purgePep for extended examples in data purging
pepHist and prnHist for extended examples in histogram visualization
extract_raws and extract_psm_raws for extracting MS file names
User-friendly utilities for variable arguments of 'filter_...'
contain_str, contain_chars_in, not_contain_str, not_contain_chars_in, start_with_str, end_with_str, start_with_chars_in and ends_with_chars_in for data subsetting by character strings
Missing values
pepImp and prnImp for missing value imputation
Informatics
pepSig and prnSig for significance tests
pepVol and prnVol for volcano plot visualization
prnGSPA for gene set enrichment analysis by protein significance pVals
gspaMap for mapping GSPA to volcano plot visualization
prnGSPAHM for heat map and network visualization of GSPA results
prnGSVA for gene set variance analysis
prnGSEA for data preparation for online GSEA
pepMDS and prnMDS for MDS visualization
pepPCA and prnPCA for PCA visualization
pepLDA and prnLDA for LDA visualization
pepHM and prnHM for heat map visualization
pepCorr_logFC, prnCorr_logFC, pepCorr_logInt and prnCorr_logInt for correlation plots
anal_prnTrend and plot_prnTrend for trend analysis and visualization
anal_pepNMF, anal_prnNMF, plot_pepNMFCon, plot_prnNMFCon, plot_pepNMFCoef, plot_prnNMFCoef and plot_metaNMF for NMF analysis and visualization
Custom databases
Uni2Entrez for lookups between UniProt accessions and Entrez IDs
Ref2Entrez for lookups among RefSeq accessions, gene names and Entrez IDs
prepGO for gene ontology
prepMSig for molecular signatures
prepString and anal_prnString for STRING-DB
Workflow scripts
# TMT
system.file("extdata", "workflow_tmt_base.R", package = "proteoQ")
system.file("extdata", "workflow_tmt_ext.R", package = "proteoQ")
# LFQ
system.file("extdata", "workflow_lfq_base.R", package = "proteoQ")
Metadata files
# TMT, no fractionation (OK without 'frac_smry.xlsx')
# (a. no references)
system.file("extdata", "expt_smry_no_prefrac.xlsx", package = "proteoQDA")
# (b. W2 and W16 references)
system.file("extdata", "expt_smry_no_prefrac_ref_w2_w16.xlsx", package = "proteoQDA")
# TMT, prefractionation
# (a. no references)
system.file("extdata", "expt_smry_gtmt.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_gtmt.xlsx", package = "proteoQDA")
# (b. W2 references)
system.file("extdata", "expt_smry_ref_w2.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_gtmt.xlsx", package = "proteoQDA")
# (c. W2 and W16 references)
system.file("extdata", "expt_smry_ref_w2_w16.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_gtmt.xlsx", package = "proteoQDA")
# TMT, prefractionation (global + phospho)
system.file("extdata", "expt_smry_tmt_cmbn.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_tmt_cmbn.xlsx", package = "proteoQDA")
# TMT, prefractionation, one MS to multiple PSM files
system.file("extdata", "expt_smry_psmfiles.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_psmfiles.xlsx", package = "proteoQDA")
# TMT, prefractionation, mixed-plexes
# (column PSM_File needed; as with this example,
# mixed-plexes results are actually from the same MS files
# but searched separately at 6- and 10-plex settings!)
system.file("extdata", "expt_smry_mixplexes.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_mixplexes.xlsx", package = "proteoQDA")
# LFQ, prefractionation
system.file("extdata", "expt_smry_plfq.xlsx", package = "proteoQDA")
system.file("extdata", "frac_smry_plfq.xlsx", package = "proteoQDA")
Column keys in PSM, peptide and protein outputs
system.file("extdata", "psm_keys.txt", package = "proteoQ")
system.file("extdata", "peptide_keys.txt", package = "proteoQ")
system.file("extdata", "protein_keys.txt", package = "proteoQ")
MS1 peptide masses
calc_pepmasses for mono-isotopic masses of peptides from fasta databases
calc_monopeptide for mono-isotopic masses of peptides from individual sequences
parse_unimod for parsing Unimod fixed modifications, variable modifications and neutral losses
find_unimod for finding a Unimod
# ***********************************
# ************ TMT ************
# ***********************************
# ===================================
# Fasta and PSM files
# ===================================
# fasta (all platforms)
library(proteoQDA)
fasta_dir <- "~/proteoQ/dbs/fasta/refseq"
copy_refseq_hs(fasta_dir)
copy_refseq_mm(fasta_dir)
# working directory (all platforms)
dat_dir <- "~/proteoQ/examples"
# metadata (all platforms)
copy_exptsmry_gtmt(dat_dir)
copy_fracsmry_gtmt(dat_dir)
# PSM (choose one of the platforms)
choose_one <- TRUE
if (!choose_one) {
  ## Mascot
  copy_mascot_gtmt(dat_dir)
  ## or MaxQuant
  # copy_maxquant_gtmt(dat_dir)
  ## or MSFragger
  # copy_msfragger_gtmt(dat_dir)
  ## or proteoM
  # copy_proteom_gtmt(dat_dir)
  ## or Spectrum Mill
  # (temporarily unavailable)
}
# ===================================
# PSM, peptide and protein processing
# ===================================
library(proteoQ)
load_expts("~/proteoQ/examples")
# PSM data standardization
normPSM(
  group_psm_by = pep_seq_mod,
  group_pep_by = gene,
  annot_kinases = TRUE,
  # no default and required
  fasta = c("~/proteoQ/dbs/fasta/refseq/refseq_hs_2013_07.fasta",
            "~/proteoQ/dbs/fasta/refseq/refseq_mm_2013_07.fasta"),
)
# optional PSM purging
purgePSM()
# PSMs to peptides
PSM2Pep()
# peptide data merging
mergePep()
# peptide data standardization
standPep()
# peptide data histograms
pepHist()
# optional peptide purging
purgePep()
# peptides to proteins
Pep2Prn(use_unique_pep = TRUE)
# protein data standardization
standPrn()
# protein data histograms
prnHist()
# ===================================
# Optional significance tests
# (no NA imputation)
# ===================================
pepSig(
  W2_bat = ~ Term["W2.BI.TMT2-W2.BI.TMT1",
                  "W2.JHU.TMT2-W2.JHU.TMT1",
                  "W2.PNNL.TMT2-W2.PNNL.TMT1"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU",
                    "W2.BI-W2.PNNL",
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"],
)
prnSig()
# ===================================
# optional NA imputation
# ===================================
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)
# ===================================
# Optional significance tests
# (with NA imputation)
# ===================================
pepSig(
  impute_na = TRUE,
  W2_bat = ~ Term["W2.BI.TMT2-W2.BI.TMT1",
                  "W2.JHU.TMT2-W2.JHU.TMT1",
                  "W2.PNNL.TMT2-W2.PNNL.TMT1"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU",
                    "W2.BI-W2.PNNL",
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"],
)
prnSig(impute_na = TRUE)
# ***********************************
# ************ LFQ ************
# ***********************************
# ===================================
# Fasta and PSM files
# ===================================
# fasta (all platforms)
library(proteoQDA)
fasta_dir <- "~/proteoQ/dbs/fasta/uniprot"
copy_uniprot_hsmm(fasta_dir)
# working directory (all platforms)
dat_dir <- "~/proteoQ/examples"
# metadata (all platforms)
copy_exptsmry_plfq(dat_dir)
copy_fracsmry_plfq(dat_dir)
# PSM (choose one of the platforms)
choose_one <- TRUE
if (!choose_one) {
  ## Mascot
  copy_mascot_plfq(dat_dir)
  ## or MaxQuant
  # copy_maxquant_plfq(dat_dir)
  ## or MSFragger
  # copy_msfragger_plfq(dat_dir)
  ## or proteoM
  # copy_proteom_plfq(dat_dir)
  ## or Spectrum Mill
  # (temporarily unavailable)
}
# ===================================
# PSM, peptide and protein processing
# ===================================
library(proteoQ)
load_expts("~/proteoQ/examples")
# PSM data standardization
normPSM(
  group_psm_by = pep_seq_mod,
  group_pep_by = gene,
  annot_kinases = TRUE,
  fasta = c("~/proteoQ/dbs/fasta/uniprot/uniprot_hsmm_2020_03.fasta"),
)
# PSM purging not applicable with LFQ
# purgePSM()
# PSMs to peptides
PSM2Pep()
# peptide data merging
mergePep()
# peptide data standardization
standPep()
# peptide data histograms
pepHist()
# optional peptide purging
purgePep()
# peptides to proteins
Pep2Prn(use_unique_pep = TRUE)
# protein data standardization
standPrn()
# protein data histograms
prnHist()
# ===================================
# Optional significance tests
# (no NA imputation)
# ===================================
pepSig(
  fml_1 = ~ Term["BI-JHU",
                 "JHU-PNNL",
                 "(BI+JHU)/2-PNNL"],
)
prnSig()
# ===================================
# optional NA imputation
# ===================================
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)
# ===================================
# Optional significance tests
# (with NA imputation)
# ===================================
pepSig(
  impute_na = TRUE,
  fml_1 = ~ Term["BI-JHU",
                 "JHU-PNNL",
                 "(BI+JHU)/2-PNNL"],
)
prnSig(impute_na = TRUE)
# ***********************************
# *********** SILAC ***********
# ***********************************
# Database searches
library(proteoM)
matchMS(
  silac_mix = list(base = NULL, heavy = c("K8 (K)", "R10 (R)")),
  ...
)
# The remaining is the same as LFQ
# ...
## Not run:
load_expts(dat_dir = "~/proteoQ/examples", expt_smry = "expt_smry.xlsx")
# not working; `expt_smry = my_expt` is an expression
my_expt <- "expt_smry.xlsx"
load_expts(dat_dir = "~/proteoQ/examples", expt_smry = my_expt)
# need unquoting;
# see also: https://dplyr.tidyverse.org/articles/programming.html
load_expts(dat_dir = "~/proteoQ/examples", expt_smry = !!my_expt)
## End(Not run)
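The behavior in the last example can be illustrated with a base R sketch, under the assumption that the argument is captured as an unevaluated expression. The function show_captured is a hypothetical stand-in and not the proteoQ implementation; it uses substitute, whereas the `!!` operator in the example above is the unquoting syntax described in the linked dplyr article.

```r
# Hypothetical stand-in: return the argument as typed, without evaluating it.
show_captured <- function(expt_smry) {
  deparse(substitute(expt_smry))
}

my_expt <- "expt_smry.xlsx"

# Passing the variable captures its name, not the file name it holds.
as_symbol <- show_captured(my_expt)

# Passing the evaluated value (a base R analogue of unquoting) captures
# the string itself; deparse keeps the surrounding quotes.
as_value <- do.call(show_captured, list(my_expt))
```

Here as_symbol is the text my_expt, whereas as_value is the quoted file name, which is why the variable has to be unquoted before it reaches a function that captures expressions.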