pepGSPA | R Documentation |
pepGSPA
performs the analysis of Gene Set Probability Asymmetricity
(GSPA) against peptide log2FC
data.
prnGSPA
performs the analysis of Gene Set Probability Asymmetricity
(GSPA) against protein log2FC
data.
pepGSPA(
gset_nms = c("go_sets", "c2_msig", "kinsub"),
method = "mean",
scale_log2r = TRUE,
complete_cases = FALSE,
impute_na = FALSE,
pval_cutoff = 0.05,
logFC_cutoff = log2(1.2),
gspval_cutoff = 0.05,
gslogFC_cutoff = log2(1.2),
min_size = 10,
max_size = Inf,
min_delta = 4,
min_greedy_size = 1,
use_adjP = FALSE,
fml_nms = NULL,
df = NULL,
filepath = NULL,
filename = NULL,
...
)
prnGSPA(
gset_nms = c("go_sets", "c2_msig", "kinsub"),
method = "mean",
scale_log2r = TRUE,
complete_cases = FALSE,
impute_na = FALSE,
pval_cutoff = 0.05,
logFC_cutoff = log2(1.2),
gspval_cutoff = 0.05,
gslogFC_cutoff = log2(1.2),
min_size = 10,
max_size = Inf,
min_delta = 4,
min_greedy_size = 1,
use_adjP = FALSE,
fml_nms = NULL,
df = NULL,
filepath = NULL,
filename = NULL,
...
)
gset_nms |
Character string or vector containing the shorthanded name(s),
full file path(s), or both, to gene sets for enrichment analysis. For
species among |
method |
Character string or vector; the method to assess the p-values of
GSPA. The default is |
scale_log2r |
Logical; if TRUE, adjusts |
complete_cases |
Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE. |
impute_na |
Logical; if TRUE, data with the imputation of missing values will be used. The default is FALSE. |
pval_cutoff |
Numeric value or vector; the cut-off in protein
significance |
logFC_cutoff |
Numeric value or vector; the cut-off in protein
|
gspval_cutoff |
Numeric value or vector; the cut-off in gene-set
significance |
gslogFC_cutoff |
Numeric value or vector; the cut-off in gene-set
enrichment fold change. Only enrichment terms with absolute fold change
greater than the threshold will be reported. The default magnitude is
|
min_size |
Numeric value or vector; minimum number of protein entries for
consideration in gene set tests. The number is after data filtration by
|
max_size |
Numeric value or vector; maximum number of protein entries for
consideration in gene set tests. The number is after data filtration by
|
min_delta |
Numeric value or vector; the minimum count difference between
the up- and the down-expressed group of proteins for consideration in gene
set tests. For example at |
min_greedy_size |
Numeric value or vector; minimum number of unique
protein entries for a gene set to be considered essential. The default in
|
use_adjP |
Logical; if TRUE, use Benjamini-Hochberg pVals. The default is FALSE. |
fml_nms |
Character string or vector; the formula name(s). By default,
the formula(s) will match those used in |
df |
The name of a primary data file. By default, it will be determined
automatically after matching the types of data and analysis with an
|
filepath |
Use system default. |
filename |
A representative file name to outputs. By default, it will be determined automatically by the name of the current call. |
... |
|
The significance pVals
of individual proteins are first obtained from
prnSig
, followed by log10 transformation and separation into up-
or down-expressed groups for each gene set. At the default of method =
mean
, the geometric mean of pVals
, P
, are each calculated for
the groups of up or down regulated proteins, with a penalty-like term
-log10(P)=(\sum_{i=1}^{n}-log10(p)+m)/(n+m)
where n
and m
are the numbers of entries with p
values
\le
or >
a significance cut-off, respectively. The quotient of the
two P
values is then used to represent the significance of gene set
enrichment. The arguments pval_cutoff
and logFC_cutoff
are used
to discriminate low influence genes. Additional subsetting of data via the
vararg
approach of filter_
is feasible. At method =
limma
, moderated t-tests are performed against -log10(pVals)
between
the up and the down groups via eBayes
.
Metadata
load_expts
for metadata preparation and a reduced working example in data normalization
Data normalization
normPSM
for extended examples in PSM data normalization
PSM2Pep
for extended examples in PSM to peptide summarization
mergePep
for extended examples in peptide data merging
standPep
for extended examples in peptide data normalization
Pep2Prn
for extended examples in peptide to protein summarization
standPrn
for extended examples in protein data normalization.
purgePSM
and purgePep
for extended examples in data purging
pepHist
and prnHist
for extended examples in histogram visualization.
extract_raws
and extract_psm_raws
for extracting MS file names
Variable arguments of 'filter_...'
contain_str
, contain_chars_in
, not_contain_str
,
not_contain_chars_in
, start_with_str
,
end_with_str
, start_with_chars_in
and
ends_with_chars_in
for data subsetting by character strings
Missing values
pepImp
and prnImp
for missing value imputation
Informatics
pepSig
and prnSig
for significance tests
pepVol
and prnVol
for volcano plot visualization
prnGSPA
for gene set enrichment analysis by protein significance pVals
gspaMap
for mapping GSPA to volcano plot visualization
prnGSPAHM
for heat map and network visualization of GSPA results
prnGSVA
for gene set variance analysis
prnGSEA
for data preparation for online GSEA.
pepMDS
and prnMDS
for MDS visualization
pepPCA
and prnPCA
for PCA visualization
pepLDA
and prnLDA
for LDA visualization
pepHM
and prnHM
for heat map visualization
pepCorr_logFC
, prnCorr_logFC
, pepCorr_logInt
and
prnCorr_logInt
for correlation plots
anal_prnTrend
and plot_prnTrend
for trend analysis and visualization
anal_pepNMF
, anal_prnNMF
, plot_pepNMFCon
,
plot_prnNMFCon
, plot_pepNMFCoef
, plot_prnNMFCoef
and
plot_metaNMF
for NMF analysis and visualization
Custom databases
Uni2Entrez
for lookups between UniProt accessions and Entrez IDs
Ref2Entrez
for lookups among RefSeq accessions, gene names and Entrez IDs
prepGO
for gene
ontology
prepMSig
for molecular
signatures
prepString
and anal_prnString
for STRING-DB
Column keys in PSM, peptide and protein outputs
system.file("extdata", "psm_keys.txt", package = "proteoQ")
system.file("extdata", "peptide_keys.txt", package = "proteoQ")
system.file("extdata", "protein_keys.txt", package = "proteoQ")
# ===================================
# GSPA tests
# ===================================
## !!!require the brief working example in `?load_expts`
## global option
scale_log2r <- TRUE
## base
# prerequisites in significance tests (impute_na = FALSE)
# as `pVals` are required
pepSig(
impute_na = FALSE,
W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)",
"(W2.JHU.TMT2-W2.JHU.TMT1)",
"(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
W2_loc = ~ Term_2["W2.BI-W2.JHU",
"W2.BI-W2.PNNL",
"W2.JHU-W2.PNNL"],
W16_vs_W2 = ~ Term_3["W16-W2"],
)
prnSig(impute_na = FALSE)
# GSPA analysis
prnGSPA(
impute_na = FALSE,
pval_cutoff = 5E-2, # protein pVal threshold
logFC_cutoff = log2(1.2), # protein log2FC threshold
gspval_cutoff = 5E-2, # gene-set pVal threshold
gslogFC_cutoff = log2(1.2), # gene-set log2FC threshold
gset_nms = c("go_sets", "c2_msig"),
)
# GSPA visualization under volcano plots
# forced lines of `pval_cutoff` and `logFC_cutoff`
# from the corresponding `prnGSPA`;
# more examples of aesthetics in `?gspaMap`
gspaMap(
impute_na = FALSE,
topn_gsets = 50,
show_sig = pVal,
xco = 1.2, # optional x-axis cut-off lines
yco = 0.05, # a optional y-axis cut-off line
)
## row filtration
prnGSPA(
impute_na = FALSE,
gset_nms = c("go_sets", "c2_msig"),
filter_prots_by_npep = exprs(prot_n_pep >= 2),
filter_prots_by_species = exprs(species == "human"),
filename = row_filters.txt,
)
# by default, volcano-plot visualization for all GSPA result files,
# which are "Protein_GSPA_Z.txt" and "row_filters_Protein_GSPA_Z.txt"
# up to this point
# will match the primary `fliter_` varargs to
# those in the corresponding `prnGSPA(...)`
gspaMap(
impute_na = FALSE,
topn_labels = 0,
topn_gsets = 50,
show_sig = pVal,
)
# or manually supply specific secondary file(s) with `df2`;
gspaMap(
df2 = "row_filters_Protein_GSPA_Z.txt",
impute_na = FALSE,
topn_gsets = 10,
show_sig = pVal,
)
# may apply additionally secondary `fliter2_` varargs against `df2`
gspaMap(
df2 = "row_filters_Protein_GSPA_Z.txt",
filter2_ess_size = exprs(ess_size >= 1),
impute_na = FALSE,
topn_gsets = 10,
topn_labels = 0,
show_sig = pVal,
)
# can also visualize results for specific formula(e) and/or
# specific `df2`
gspaMap(
fml_nms = "W16_vs_W2",
df2 = "row_filters_Protein_GSPA_Z.txt",
filter2_ess_size = exprs(ess_size >= 1),
impute_na = FALSE,
topn_gsets = 20,
show_sig = pVal,
)
## customized thresholds
# (formula names defined a priori in `pepSig(...)`)
# (higher thresholds for "W16_vs_W2"; also set `max_size` to
# avoid trapping with large terms, such as "cell parts"...)
prnGSPA(
impute_na = FALSE,
fml_nms = c("W2_bat", "W2_loc", "W16_vs_W2"),
pval_cutoff = c(5E-2, 5E-2, 1E-6),
logFC_cutoff = c(log2(1.2)),
gslogFC_cutoff = c(log2(1.2)),
gspval_cutoff = c(5E-2, 5E-2, 1E-4),
max_size = c(Inf, Inf, 120),
gset_nms = c("go_sets"),
filter_by_npep = exprs(prot_n_pep >= 2),
filter_prots_by_species = exprs(species == "human"),
filename = thresholds.txt,
)
gspaMap(
df2 = "thresholds_Protein_GSPA_Z.txt",
impute_na = FALSE,
topn_gsets = 10,
topn_labels = 0,
show_sig = pVal,
)
# "limma" for "W16_vs_W2"
prnGSPA(
impute_na = FALSE,
fml_nms = c("W2_bat", "W2_loc", "W16_vs_W2"),
method = c("mean", "mean", "limma"),
max_size = c(Inf, Inf, 120),
pval_cutoff = c(5E-2),
logFC_cutoff = c(log2(1.2)),
gspval_cutoff = c(5E-2),
gset_nms = c("go_sets", "c2_msig"),
filter_by = exprs(prot_n_pep >= 2, species == "human"),
filename = methods.txt,
)
gspaMap(
df2 = "methods_Protein_GSPA_Z.txt",
impute_na = FALSE,
topn_gsets = 25,
topn_labels = 0,
show_sig = pVal,
)
## NA imputation
# if not yet, run prerequisitive NA imputation
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)
# if not yet, run prerequisitive significance tests at `impute_na = TRUE`
pepSig(
impute_na = TRUE,
W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)",
"(W2.JHU.TMT2-W2.JHU.TMT1)",
"(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
W2_loc = ~ Term_2["W2.BI-W2.JHU",
"W2.BI-W2.PNNL",
"W2.JHU-W2.PNNL"],
W16_vs_W2 = ~ Term_3["W16-W2"],
)
prnSig(impute_na = TRUE)
prnGSPA(
impute_na = TRUE,
gset_nms = c("go_sets"),
filter_prots_by_npep = exprs(prot_n_pep >= 2),
)
# ===================================
# Distance heat maps of gene sets
# (also interactive networks)
# ===================================
# a `term` is a subset of an `ess_term` if the distance is zero
prnGSPAHM(
filter2_by = exprs(distance <= .7),
annot_cols = "ess_idx",
annot_colnames = "Eset index",
annot_rows = "ess_size",
filename = show_some_redundancy.png,
)
# human terms only
prnGSPAHM(
filter2_num = exprs(distance <= .95),
filter2_sp = exprs(start_with_str("hs", term)),
annot_cols = "ess_idx",
annot_colnames = "Eset index",
filename = show_more_connectivity.png,
)
# custom color palette
prnGSPAHM(
annot_cols = c("ess_idx", "ess_size"),
annot_colnames = c("Eset index", "Size"),
filter2_by = exprs(distance <= .95),
color = colorRampPalette(c("blue", "white", "red"))(100),
filename = custom_colors.png,
)
# can also visualize results for specific `df2` and formula(e)
prnGSPAHM(
df2 = "thresholds_Protein_GSPA_Z_essmap.txt",
fml_nms = "W16_vs_W2",
annot_cols = c("ess_idx", "ess_size"),
annot_colnames = c("Eset index", "Size"),
filter2_by = exprs(distance <= .95),
filename = w16_vs_w2.png,
)
# ===================================
# custom GO databases
# ===================================
# start over
unlink(file.path(dat_dir, "Protein/GSPA"), recursive = TRUE, force = TRUE)
# prepare GO databases (see also ?prepGO)
prepGO(
species = human,
db_path = "~/proteoQ/dbs/go",
gaf_url = "http://current.geneontology.org/annotations/goa_human.gaf.gz",
obo_url = "http://purl.obolibrary.org/obo/go/go-basic.obo",
filename = go_hs.rds,
)
prepGO(
species = mouse,
db_path = "~/proteoQ/dbs/go",
gaf_url = "http://current.geneontology.org/annotations/mgi.gaf.gz",
obo_url = "http://purl.obolibrary.org/obo/go/go-basic.obo",
filename = go_mm.rds,
)
head(readRDS(file.path("~/proteoQ/dbs/go", "go_hs.rds")))
head(readRDS(file.path("~/proteoQ/dbs/go", "go_mm.rds")))
# analysis
prnGSPA(
impute_na = FALSE,
gset_nms = c("~/proteoQ/dbs/go/go_hs.rds",
"~/proteoQ/dbs/go/go_mm.rds"),
)
# It is possible that the system "go_sets" may be partially different to the
# custom GO. Be explicit on `gset_nms`; otherwise, the default of `gset_nms =
# c("go_sets", "c2_msig")` will be applied.
gspaMap(
gset_nms = c("~/proteoQ/dbs/go/go_hs.rds",
"~/proteoQ/dbs/go/go_mm.rds"),
impute_na = FALSE,
topn_gsets = 20,
show_sig = pVal,
)
# ===================================
# custom MSig databases
# ===================================
# start over
unlink(file.path(dat_dir, "Protein/GSPA"), recursive = TRUE, force = TRUE)
# prepare MSig databases (see also ?prepMSig)
prepMSig(species = human)
prepMSig(species = mouse)
head(readRDS(file.path("~/proteoQ/dbs/msig", "msig_hs.rds")))
head(readRDS(file.path("~/proteoQ/dbs/msig", "msig_mm.rds")))
# analysis
prnGSPA(
impute_na = FALSE,
gset_nms = c("~/proteoQ/dbs/msig/msig_hs.rds",
"~/proteoQ/dbs/msig/msig_mm.rds"),
)
# It is possible that the system "c2_msig" may be partially different to the
# custom databases. Be explicit on `gset_nms`; otherwise, the default of
# `gset_nms = c("go_sets", "c2_msig")` will be applied.
gspaMap(
gset_nms = c("~/proteoQ/dbs/msig/msig_hs.rds",
"~/proteoQ/dbs/msig/msig_mm.rds"),
impute_na = FALSE,
topn_gsets = 20,
show_sig = pVal,
)
## put custom `prepGO`, `prepMSig` preparation into `prnGSPA` analysis
# start over with fresh database preparation
unlink(file.path("~/proteoQ/dbs/go"), recursive = TRUE, force = TRUE)
unlink(file.path("~/proteoQ/dbs/msig"), recursive = TRUE, force = TRUE)
library(magrittr)
prnGSPA(
impute_na = FALSE,
pval_cutoff = 5E-2,
logFC_cutoff = log2(1.2),
gspval_cutoff = 5E-2,
gset_nms = c(prepGO(human), prepGO(mouse), prepMSig(human), prepMSig(mouse)),
filename = cmbn.txt,
) %>%
gspaMap(
# gset_nms = .,
df2 = "cmbn_Protein_GSPA_Z.txt",
impute_na = FALSE,
topn_gsets = 20,
topn_labels = 0,
show_sig = pVal,
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.