prnGSPA: GSPA of peptide data
In qzhang503/proteoQ: Processing and Informatic Analysis of Mass Spectrometrirc Data

pepGSPA

R Documentation

GSPA of peptide data

Description

pepGSPA performs the analysis of Gene Set Probability Asymmetricity (GSPA) against peptide log2FC data.

prnGSPA performs the analysis of Gene Set Probability Asymmetricity (GSPA) against protein log2FC data.

Usage

pepGSPA(
  gset_nms = c("go_sets", "c2_msig", "kinsub"),
  method = "mean",
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  pval_cutoff = 0.05,
  logFC_cutoff = log2(1.2),
  gspval_cutoff = 0.05,
  gslogFC_cutoff = log2(1.2),
  min_size = 10,
  max_size = Inf,
  min_delta = 4,
  min_greedy_size = 1,
  use_adjP = FALSE,
  fml_nms = NULL,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  ...
)

prnGSPA(
  gset_nms = c("go_sets", "c2_msig", "kinsub"),
  method = "mean",
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  pval_cutoff = 0.05,
  logFC_cutoff = log2(1.2),
  gspval_cutoff = 0.05,
  gslogFC_cutoff = log2(1.2),
  min_size = 10,
  max_size = Inf,
  min_delta = 4,
  min_greedy_size = 1,
  use_adjP = FALSE,
  fml_nms = NULL,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  ...
)

Arguments

`gset_nms`	Character string or vector containing the shorthanded name(s), full file path(s), or both, to gene sets for enrichment analysis. For species among `"human", "mouse", "rat"`, the default of `c("go_sets", "c2_msig", "kinsub")` will utilize terms from gene ontology (`GO`), molecular signatures (`MSig`) and kinase-substrate network (`PSP Kinase-Substrate`). Custom `GO`, `MSig` and other data bases at given species are also supported. See also: `prepGO` for the preparation of custom `GO`; `prepMSig` for the preparation of custom `MSig`. For other custom data bases, follow the same format of list as `GO` or `MSig`.
`method`	Character string or vector; the method to assess the p-values of GSPA. The default is `mean` and the alternative is `limma`. See also section `Details` for the calculations.
`scale_log2r`	Logical; if TRUE, adjusts `log2FC` to the same scale of standard deviation across all samples. The default is TRUE. At `scale_log2r = NA`, the raw `log2FC` without normalization will be used.
`complete_cases`	Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE.
`impute_na`	Logical; if TRUE, data with the imputation of missing values will be used. The default is FALSE.
`pval_cutoff`	Numeric value or vector; the cut-off in protein significance `pVal`. Entries with `pVals` less significant than the threshold will be excluded from enrichment analysis. The default is 0.05 for all formulas matched to or specified in argument `fml_nms`. Formula-specific threshold is allowed by supplying a vector of cut-off values.
`logFC_cutoff`	Numeric value or vector; the cut-off in protein `log2FC`. Entries with absolute `log2FC` smaller than the threshold will be excluded from enrichment analysis. The default magnitude is `log2(1.2)` for all formulas matched to or specified in argument `fml_nms`. Formula-specific threshold is allowed by supplying a vector of absolute values in `log2FC`.
`gspval_cutoff`	Numeric value or vector; the cut-off in gene-set significance `pVal`. Only enrichment terms with `pVals` more significant than the threshold will be reported. The default is 0.05 for all formulas matched to or specified in argument `fml_nms`. Formula-specific threshold is allowed by supplying a vector of cut-off values.
`gslogFC_cutoff`	Numeric value or vector; the cut-off in gene-set enrichment fold change. Only enrichment terms with absolute fold change greater than the threshold will be reported. The default magnitude is `log2(1.2)` for all formulas matched to or specified in argument `fml_nms`. Formula-specific threshold is allowed by supplying a vector of absolute values in `log2FC`.
`min_size`	Numeric value or vector; minimum number of protein entries for consideration in gene set tests. The number is after data filtration by `pval_cutoff`, `logFC_cutoff` or varargs expressions under `filter_`. The default is 10 for all formulas matched to or specified in argument `fml_nms`. Formula-specific threshold is allowed by supplying a vector of sizes.
`max_size`	Numeric value or vector; maximum number of protein entries for consideration in gene set tests. The number is after data filtration by `pval_cutoff`, `logFC_cutoff` or varargs expressions under `filter_`. The default in infinite for all formulas matched to or specified in argument `fml_nms`. Formula-specific threshold is allowed by supplying a vector of sizes.
`min_delta`	Numeric value or vector; the minimum count difference between the up- and the down-expressed group of proteins for consideration in gene set tests. For example at `min_delta = 4`, a gene set will 6 upregulated proteins and 2 down-expressed proteins, or vice versa, will be assessed. The number is after data filtration by `pval_cutoff`, `logFC_cutoff` or varargs expressions under `filter_`. The default is 4 for all formulas matched to or specified in argument `fml_nms`. Formula-specific threshold is allowed by supplying a vector of sizes.
`min_greedy_size`	Numeric value or vector; minimum number of unique protein entries for a gene set to be considered essential. The default in `1` for all formulas matched to or specified in argument `fml_nms`. Formula-specific threshold is allowed by supplying a vector of sizes.
`use_adjP`	Logical; if TRUE, use Benjamini-Hochberg pVals. The default is FALSE.
`fml_nms`	Character string or vector; the formula name(s). By default, the formula(s) will match those used in `pepSig` or `prnSig`.
`df`	The name of a primary data file. By default, it will be determined automatically after matching the types of data and analysis with an `id` among `c("pep_seq", "pep_seq_mod", "prot_acc", "gene")`. A primary file contains normalized peptide or protein data and is among `c("Peptide.txt", "Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt")`. For analyses require the fields of significance p-values, the `df` will be one of `c("Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt")`.
`filepath`	Use system default.
`filename`	A representative file name to outputs. By default, it will be determined automatically by the name of the current call.
`...`	`filter_`: Logical expression(s) for the row filtration against data in a primary file of `/Model/Protein[_impNA]_pVals.txt`. See also `normPSM` for the format of `filter_` statements. `arrange_`: Variable argument statements for the row ordering against data in a primary file of `/Model/Protein[_impNA]_pVals.txt`. See also `prnHM` for the format of `arrange_` statements.

Details

The significance pVals of individual proteins are first obtained from prnSig, followed by log10 transformation and separation into up- or down-expressed groups for each gene set. At the default of method = mean, the geometric mean of pVals, P, are each calculated for the groups of up or down regulated proteins, with a penalty-like term

-log10(P)=(\sum_{i=1}^{n}-log10(p)+m)/(n+m)

where n and m are the numbers of entries with p values \le or > a significance cut-off, respectively. The quotient of the two P values is then used to represent the significance of gene set enrichment. The arguments pval_cutoff and logFC_cutoff are used to discriminate low influence genes. Additional subsetting of data via the vararg approach of filter_ is feasible. At method = limma, moderated t-tests are performed against -log10(pVals) between the up and the down groups via eBayes.

Examples


# ===================================
# GSPA tests
# ===================================

## !!!require the brief working example in `?load_expts`

## global option
scale_log2r <- TRUE

## base
# prerequisites in significance tests (impute_na = FALSE)
# as `pVals` are required
pepSig(
  impute_na = FALSE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = FALSE)

# GSPA analysis
prnGSPA(
  impute_na = FALSE,
  pval_cutoff = 5E-2, # protein pVal threshold
  logFC_cutoff = log2(1.2), # protein log2FC threshold
  gspval_cutoff = 5E-2, # gene-set pVal threshold
  gslogFC_cutoff = log2(1.2), # gene-set log2FC threshold
  gset_nms = c("go_sets", "c2_msig"), 
)

# GSPA visualization under volcano plots 
# forced lines of `pval_cutoff` and `logFC_cutoff`  
#   from the corresponding `prnGSPA`;
# more examples of aesthetics in `?gspaMap`
gspaMap(
  impute_na = FALSE,
  topn_gsets = 50, 
  show_sig = pVal, 
  xco = 1.2, # optional x-axis cut-off lines
  yco = 0.05, # a optional y-axis cut-off line 
)

## row filtration
prnGSPA(
  impute_na = FALSE,
  gset_nms = c("go_sets", "c2_msig"),
  filter_prots_by_npep = exprs(prot_n_pep >= 2),
  filter_prots_by_species = exprs(species == "human"), 
  filename = row_filters.txt,
)

# by default, volcano-plot visualization for all GSPA result files, 
#   which are "Protein_GSPA_Z.txt" and "row_filters_Protein_GSPA_Z.txt" 
#   up to this point
# will match the primary `fliter_` varargs to 
#   those in the corresponding `prnGSPA(...)`
gspaMap(
  impute_na = FALSE,
  topn_labels = 0, 
  topn_gsets = 50, 
  show_sig = pVal, 
)

# or manually supply specific secondary file(s) with `df2`; 
gspaMap(
  df2 = "row_filters_Protein_GSPA_Z.txt", 
  impute_na = FALSE,
  topn_gsets = 10, 
  show_sig = pVal, 
)

# may apply additionally secondary `fliter2_` varargs against `df2`
gspaMap(
  df2 = "row_filters_Protein_GSPA_Z.txt", 
  filter2_ess_size = exprs(ess_size >= 1),   
  impute_na = FALSE,
  topn_gsets = 10, 
  topn_labels = 0, 
  show_sig = pVal, 
)

# can also visualize results for specific formula(e) and/or 
#   specific `df2`
gspaMap(
  fml_nms = "W16_vs_W2",
  df2 = "row_filters_Protein_GSPA_Z.txt", 
  filter2_ess_size = exprs(ess_size >= 1),   
  impute_na = FALSE,
  topn_gsets = 20, 
  show_sig = pVal, 
)

## customized thresholds
# (formula names defined a priori in `pepSig(...)`)
# (higher thresholds for "W16_vs_W2"; also set `max_size` to 
#  avoid trapping with large terms, such as "cell parts"...)
prnGSPA(
  impute_na = FALSE, 
  fml_nms = c("W2_bat", "W2_loc", "W16_vs_W2"),
  pval_cutoff = c(5E-2, 5E-2, 1E-6), 
  logFC_cutoff = c(log2(1.2)), 
  gslogFC_cutoff = c(log2(1.2)), 
  gspval_cutoff = c(5E-2, 5E-2, 1E-4), 
  max_size = c(Inf, Inf, 120), 
  gset_nms = c("go_sets"), 
  filter_by_npep = exprs(prot_n_pep >= 2), 
  filter_prots_by_species = exprs(species == "human"), 
  filename = thresholds.txt,
)

gspaMap(
  df2 = "thresholds_Protein_GSPA_Z.txt", 
  impute_na = FALSE,
  topn_gsets = 10, 
  topn_labels = 0, 
  show_sig = pVal, 
)

# "limma" for "W16_vs_W2"
prnGSPA(
  impute_na = FALSE, 
  fml_nms = c("W2_bat", "W2_loc", "W16_vs_W2"),
  method = c("mean", "mean", "limma"),
  max_size = c(Inf, Inf, 120), 
  pval_cutoff = c(5E-2), 
  logFC_cutoff = c(log2(1.2)), 
  gspval_cutoff = c(5E-2), 
  gset_nms = c("go_sets", "c2_msig"), 
  filter_by = exprs(prot_n_pep >= 2, species == "human"), 
  filename = methods.txt,
)

gspaMap(
  df2 = "methods_Protein_GSPA_Z.txt", 
  impute_na = FALSE,
  topn_gsets = 25, 
  topn_labels = 0, 
  show_sig = pVal, 
)

## NA imputation
# if not yet, run prerequisitive NA imputation
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)

# if not yet, run prerequisitive significance tests at `impute_na = TRUE`
pepSig(
  impute_na = TRUE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = TRUE)

prnGSPA(
  impute_na = TRUE,
  gset_nms = c("go_sets"),
  filter_prots_by_npep = exprs(prot_n_pep >= 2),
)

# ===================================
# Distance heat maps of gene sets
# (also interactive networks)
# ===================================
# a `term` is a subset of an `ess_term` if the distance is zero
prnGSPAHM(
  filter2_by = exprs(distance <= .7),
  annot_cols = "ess_idx",
  annot_colnames = "Eset index",
  annot_rows = "ess_size",
  filename = show_some_redundancy.png,
)

# human terms only
prnGSPAHM(
  filter2_num = exprs(distance <= .95),
  filter2_sp = exprs(start_with_str("hs", term)),
  annot_cols = "ess_idx",
  annot_colnames = "Eset index",
  filename = show_more_connectivity.png,
)

# custom color palette
prnGSPAHM(
  annot_cols = c("ess_idx", "ess_size"),
  annot_colnames = c("Eset index", "Size"),
  filter2_by = exprs(distance <= .95),
  color = colorRampPalette(c("blue", "white", "red"))(100),
  filename = custom_colors.png,
)

# can also visualize results for specific `df2` and formula(e)
prnGSPAHM(
  df2 = "thresholds_Protein_GSPA_Z_essmap.txt", 
  fml_nms = "W16_vs_W2", 
  annot_cols = c("ess_idx", "ess_size"),
  annot_colnames = c("Eset index", "Size"),
  filter2_by = exprs(distance <= .95),
  filename = w16_vs_w2.png,
)


# ===================================
# custom GO databases
# ===================================
# start over
unlink(file.path(dat_dir, "Protein/GSPA"), recursive = TRUE, force = TRUE)

# prepare GO databases (see also ?prepGO)
prepGO(
  species = human,
  db_path = "~/proteoQ/dbs/go",
  gaf_url = "http://current.geneontology.org/annotations/goa_human.gaf.gz",
  obo_url = "http://purl.obolibrary.org/obo/go/go-basic.obo",
  filename = go_hs.rds,
)

prepGO(
  species = mouse,
  db_path = "~/proteoQ/dbs/go",
  gaf_url = "http://current.geneontology.org/annotations/mgi.gaf.gz",
  obo_url = "http://purl.obolibrary.org/obo/go/go-basic.obo",
  filename = go_mm.rds,
)

head(readRDS(file.path("~/proteoQ/dbs/go", "go_hs.rds")))
head(readRDS(file.path("~/proteoQ/dbs/go", "go_mm.rds")))

# analysis
prnGSPA(
  impute_na = FALSE,
  gset_nms = c("~/proteoQ/dbs/go/go_hs.rds",
               "~/proteoQ/dbs/go/go_mm.rds"),
)

# It is possible that the system "go_sets" may be partially different to the
# custom GO. Be explicit on `gset_nms`; otherwise, the default of `gset_nms =
# c("go_sets", "c2_msig")` will be applied.
gspaMap(
  gset_nms = c("~/proteoQ/dbs/go/go_hs.rds",
               "~/proteoQ/dbs/go/go_mm.rds"),
  impute_na = FALSE,
  topn_gsets = 20, 
  show_sig = pVal, 
)

# ===================================
# custom MSig databases
# ===================================
# start over
unlink(file.path(dat_dir, "Protein/GSPA"), recursive = TRUE, force = TRUE)

# prepare MSig databases (see also ?prepMSig)
prepMSig(species = human)
prepMSig(species = mouse)

head(readRDS(file.path("~/proteoQ/dbs/msig", "msig_hs.rds")))
head(readRDS(file.path("~/proteoQ/dbs/msig", "msig_mm.rds")))

# analysis
prnGSPA(
  impute_na = FALSE,
  gset_nms = c("~/proteoQ/dbs/msig/msig_hs.rds",
               "~/proteoQ/dbs/msig/msig_mm.rds"),
)

# It is possible that the system "c2_msig" may be partially different to the
# custom databases. Be explicit on `gset_nms`; otherwise, the default of
# `gset_nms = c("go_sets", "c2_msig")` will be applied.
gspaMap(
  gset_nms = c("~/proteoQ/dbs/msig/msig_hs.rds",
               "~/proteoQ/dbs/msig/msig_mm.rds"),
  impute_na = FALSE,
  topn_gsets = 20, 
  show_sig = pVal, 
)

## put custom `prepGO`, `prepMSig` preparation into `prnGSPA` analysis
# start over with fresh database preparation
unlink(file.path("~/proteoQ/dbs/go"), recursive = TRUE, force = TRUE)
unlink(file.path("~/proteoQ/dbs/msig"), recursive = TRUE, force = TRUE)

library(magrittr)

prnGSPA(
  impute_na = FALSE,
  pval_cutoff = 5E-2,
  logFC_cutoff = log2(1.2),
  gspval_cutoff = 5E-2,
  gset_nms = c(prepGO(human), prepGO(mouse), prepMSig(human), prepMSig(mouse)),
  filename = cmbn.txt,
) %>% 
  gspaMap(
    # gset_nms = ., 
    df2 = "cmbn_Protein_GSPA_Z.txt", 
    impute_na = FALSE,
    topn_gsets = 20, 
    topn_labels = 0, 
    show_sig = pVal, 
  )

qzhang503/proteoQ documentation built on April 13, 2025, 8:31 a.m.

qzhang503/proteoQ index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

qzhang503/proteoQ
Processing and Informatic Analysis of Mass Spectrometrirc Data

prnGSPA: GSPA of peptide data
In qzhang503/proteoQ: Processing and Informatic Analysis of Mass Spectrometrirc Data

GSPA of peptide data

Description

Usage

Arguments

Details

See Also

Examples

Related to prnGSPA in qzhang503/proteoQ...

R Package Documentation

Browse R Packages

We want your feedback!

qzhang503/proteoQ Processing and Informatic Analysis of Mass Spectrometrirc Data

prnGSPA: GSPA of peptide data In qzhang503/proteoQ: Processing and Informatic Analysis of Mass Spectrometrirc Data

GSPA of peptide data

Description

Usage

Arguments

Details

See Also

Examples

Related to prnGSPA in qzhang503/proteoQ...

R Package Documentation

Browse R Packages

We want your feedback!

qzhang503/proteoQ
Processing and Informatic Analysis of Mass Spectrometrirc Data

prnGSPA: GSPA of peptide data
In qzhang503/proteoQ: Processing and Informatic Analysis of Mass Spectrometrirc Data