gspaMap: Volcano plots of protein 'log2FC' under gene sets
In qzhang503/proteoQ: Processing and Informatic Analysis of Mass Spectrometric Data

gspaMap

R Documentation

Volcano plots of protein `log2FC` under gene sets

Description

`gspaMap` draws volcano plots of protein subgroups, with the proteins grouped by the gene sets they belong to.
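As a rough illustration of what grouping proteins by gene set means, the base-R sketch below splits a toy protein table by gene-set membership and computes volcano coordinates for each subgroup. The values, the placeholder genes `G3` to `G6`, the gene sets `set_a` and `set_b`, and the column `neg_log10_p` are all made up for illustration; this is not proteoQ's implementation.

```r
# Toy protein table (hypothetical values)
prots <- data.frame(
  gene      = c("ACTB", "GAPDH", "G3", "G4", "G5"),
  log2Ratio = c(0.10, -0.20, 1.40, -1.10, 0.90),
  pVal      = c(0.50, 0.30, 0.001, 0.01, 0.04),
  stringsAsFactors = FALSE
)

# Hypothetical gene sets: a named list of gene symbols
gsets <- list(
  set_a = c("ACTB", "GAPDH", "G3"),
  set_b = c("G3", "G4", "G5", "G6")  # "G6" is not in `prots` and drops out
)

# One protein subgroup per gene set, with the y-axis value of a volcano plot
subgroups <- lapply(gsets, function(genes) {
  d <- prots[prots$gene %in% genes, ]
  d$neg_log10_p <- -log10(d$pVal)
  d
})

# Number of proteins plotted under each gene set
sapply(subgroups, nrow)
```

A protein may appear in more than one subgroup, as `G3` does here, because gene sets can overlap.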

Usage

gspaMap(
  gset_nms = c("go_sets", "c2_msig", "kinsub"),
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  df = NULL,
  df2 = NULL,
  filepath = NULL,
  filename = NULL,
  fml_nms = NULL,
  adjP = FALSE,
  topn_labels = 20,
  show_sig = "none",
  gspval_cutoff = 0.05,
  gslogFC_cutoff = log2(1.2),
  topn_gsets = Inf,
  theme = NULL,
  ...
)

Arguments

`gset_nms`	Character string or vector containing the shorthand name(s), full file path(s), or both, of the gene sets for enrichment analysis. For species among `"human", "mouse", "rat"`, the default of `c("go_sets", "c2_msig", "kinsub")` will use terms from gene ontology (`GO`), molecular signatures (`MSig`) and the kinase-substrate network (`PSP Kinase-Substrate`). Custom `GO`, `MSig` and other databases for a given species are also supported. See also: `prepGO` for the preparation of custom `GO`; `prepMSig` for the preparation of custom `MSig`. For other custom databases, follow the same list format as `GO` or `MSig`.
`scale_log2r`	Logical; if TRUE, adjusts `log2FC` to the same scale of standard deviation across all samples. The default is TRUE. At `scale_log2r = NA`, the raw `log2FC` without normalization will be used.
`complete_cases`	Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE.
`impute_na`	Logical; if TRUE, data with the imputation of missing values will be used. The default is FALSE.
`df`	The name of a primary data file. By default, it will be determined automatically after matching the types of data and analysis with an `id` among `c("pep_seq", "pep_seq_mod", "prot_acc", "gene")`. A primary file contains normalized peptide or protein data and is among `c("Peptide.txt", "Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein.txt", "Protein_pVal.txt", "Protein_impNA_pVal.txt")`. For analyses that require significance p-values, the `df` will be one of `c("Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein_pVal.txt", "Protein_impNA_pVal.txt")`.
`df2`	Character vector or string; the name(s) of secondary data file(s). An informatic task, e.g. `anal_prnTrend(...)`, against a primary `df` generates secondary files such as `Protein_Trend_Z_nclust6.txt`. See also `prnHist` for the description of a primary `df`; `normPSM` for the lists of `df` and `df2`.
`filepath`	Use system default.
`filename`	Use system default for each gene set.
`fml_nms`	Character string or vector; the formula name(s). By default, the formula(s) will match those used in `pepSig` or `prnSig`.
`adjP`	Logical; if TRUE, use Benjamini-Hochberg pVals in volcano plots. The default is FALSE.
`topn_labels`	A non-negative integer; the top-n species for labeling in a plot. At `topn_labels = 0`, no labels of proteins/peptides will be shown. The default is to label the top-20 species with the lowest p-values.
`show_sig`	Character string indicating the type of significance values to be shown with `gspaMap`. The default is `"none"`. Additional choices are from `c("pVal", "qVal")` where `pVal` or `qVal` will be shown, respectively, in the facet grid of the plots.
`gspval_cutoff`	Numeric value or vector for use with `gspaMap`. `Gene sets` with enrichment `pVals` less significant than the threshold will be excluded from volcano plot visualization. The default significance is 0.05 for all formulas matched to or specified in argument `fml_nms`. Formula-specific thresholds are allowed by supplying a vector of cut-off values.
`gslogFC_cutoff`	Numeric value or vector for use with `gspaMap`. `Gene sets` with absolute enrichment `log2FC` less than the threshold will be excluded from volcano plot visualization. The default magnitude is `log2(1.2)` for all formulas matched to or specified in argument `fml_nms`. Formula-specific thresholds are allowed by supplying a vector of absolute values in `log2FC`.
`topn_gsets`	Numeric value or vector; top entries in gene sets ordered by increasing `pVal` for visualization. The default is to use all available entries. Note that it is the user's responsibility to ensure that the custom gene sets contain terms that can be found in one or more preceding analyses of `prnGSPA`. For simplicity, it is generally sufficient to include all of the databases that have been applied to `prnGSPA`; that way, no terms will be missed for visualization. See also `prnGSPA` for examples of custom databases.
`theme`	A ggplot2 theme, e.g., `theme_bw()`, or a custom theme. At the NULL default, a system theme will be applied.
`...`	`filter_`: Variable argument statements for the row filtration against data in a primary file linked to `df`. See also `normPSM` for the format of `filter_` statements and the association between `filter_` and `df`. `filter2_`: Variable argument statements for the row filtration against data in secondary file(s) linked to `df2`. See also `prnGSPAHM` for the format of `filter2_`, `normPSM` for the association between `filter_` and `df`. Additional parameters for plotting: `xco`, the cut-off lines of fold changes at position `x`; the default is at `-1.2` and `+1.2`. `yco`, the cut-off line of `pVal` at position `y`; the default is `0.05`. `width`, the width of plot; `height`, the height of plot. `nrow`, the number of rows in a plot.
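The combined effect of `gspval_cutoff`, `gslogFC_cutoff` and `topn_gsets` can be sketched in base R on a toy enrichment table. The table, its column names and the inclusive comparisons at the thresholds are assumptions made for illustration; the sketch does not reproduce proteoQ's internal code or file layout.

```r
# Toy enrichment results for a single formula (hypothetical values and columns)
gs <- data.frame(
  term   = c("GO:A", "GO:B", "GO:C", "GO:D", "GO:E"),
  pVal   = c(1e-4, 0.03, 0.20, 0.01, 0.002),
  log2FC = c(0.80, -0.10, 1.50, -0.45, 0.60),
  stringsAsFactors = FALSE
)

gspval_cutoff  <- 0.05       # gene-set pVal threshold
gslogFC_cutoff <- log2(1.2)  # gene-set |log2FC| threshold
topn_gsets     <- 2          # number of gene sets to visualize

# Keep gene sets that are significant enough AND large enough in magnitude
pass <- gs$pVal <= gspval_cutoff & abs(gs$log2FC) >= gslogFC_cutoff
kept <- gs[pass, ]

# Order by increasing pVal, then take the top entries
kept <- kept[order(kept$pVal), ]
kept <- head(kept, topn_gsets)

kept$term
```

Here `GO:B` fails the `log2FC` threshold and `GO:C` fails the `pVal` threshold; of the three remaining terms, the two with the lowest `pVal` are retained.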

Examples


# ===================================
# Volcano plots
# ===================================

## !!! requires the brief working example in `?load_expts`

## global option
scale_log2r <- TRUE

## for all peptides or proteins
# all peptides
pepVol()

# all proteins
prnVol(
  xco = 1.2,
  yco = 0.01
)

# hide `xco` and/or `yco` lines
# (xco = 0 -> log2(xco) = -Inf)
prnVol(
  xco = 0,
  yco = Inf,
  filename = "no_xylines.png"
)

# show a vertical center line at log2(1)
# (xco = 1 -> log2(xco) = 0)
prnVol(
  xco = 1,
  yco = Inf,
  filename = "no_xylines.png"
)

# kinases and prot_n_pep >= 2
prnVol(
  xco = 1.2,
  yco = 0.01,
  filter_prots_by_kin = exprs(kin_attr, prot_n_pep >= 2),
  filename = "kin_npep2.png"
)

# selected formula and/or customization
prnVol(
  fml_nms = "W2_bat",
  xmin = -5,
  xmax = 5, 
  ymin = 0, 
  ymax = 30,
  x_label = "Ratio ("*log[2]*")",
  y_label = "pVal ("*-log[10]*")", 
  height = 6,
  width = 6*2.7,
  filename = "custom.png"
)

# custom theme
library(ggplot2)
my_theme <- theme_bw() +
  theme(
    axis.text.x = element_text(angle = 0, vjust = 0.5, size = 24),
    axis.ticks.x = element_blank(),
    axis.text.y = element_text(angle = 0, vjust = 0.5, size = 24),
    axis.title.x = element_text(colour = "black", size = 24),
    axis.title.y = element_text(colour="black", size = 24),
    plot.title = element_text(face = "bold", colour = "black", size = 14, 
                              hjust = .5, vjust = .5),
    
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(),
    
    strip.text.x = element_text(size = 16, colour = "black", angle = 0),
    strip.text.y = element_text(size = 16, colour = "black", angle = 90),
    
    legend.key = element_rect(colour = NA, fill = 'transparent'),
    legend.background = element_rect(colour = NA,  fill = "transparent"),
    legend.position = "none",
    legend.title = element_text(colour="black", size = 18),
    legend.text = element_text(colour="black", size = 18),
    legend.text.align = 0,
    legend.box = NULL
  )

prnVol(theme = my_theme, filename = "my_theme.png")

# custom plot
# ("W2_bat" etc. are contrast names in `pepSig`)
prnVol(fml_nms = c("W2_bat", "W2_loc"), filename = "foo.png")

res <- readRDS(file.path(dat_dir, "Protein/Volcano/W2_bat/foo.rds"))
# names(res)

p <- ggplot() +
  geom_point(data = res$data, mapping = aes(x = log2Ratio, y = -log10(pVal)), 
             size = 3, colour = "#f0f0f0", shape = 20, alpha = .5) +
  geom_point(data = res$greater, mapping = aes(x = log2Ratio, y = -log10(pVal)), 
             size = 3, color = res$palette[2], shape = 20, alpha = .8) +
  geom_point(data = res$less, mapping = aes(x = log2Ratio, y = -log10(pVal)), 
             size = 3, color = res$palette[1], shape = 20, alpha = .8) +
  geom_hline(yintercept = -log10(res$yco), linetype = "longdash", size = .5) +
  geom_vline(xintercept = -log2(res$xco), linetype = "longdash", size = .5) +
  geom_vline(xintercept = log2(res$xco), linetype = "longdash", size = .5) +
  scale_x_continuous(limits = c(res$xmin, res$xmax)) +
  scale_y_continuous(limits = c(res$ymin, res$ymax)) +
  labs(title = res$title, x = res$x_label, y = res$y_label) +
  res$theme

p <- p + geom_text(data = res$topns, 
                   mapping = aes(x = log2Ratio, 
                                 y = -log10(pVal), 
                                 label = Index, 
                                 color = Index),
                   size = 3, 
                   alpha = .5, 
                   hjust = 0, 
                   nudge_x = 0.05, 
                   vjust = 0, 
                   nudge_y = 0.05, 
                   na.rm = TRUE)

p <- p + facet_wrap(~ Contrast, nrow = 1, labeller = label_value)

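# geom_table() is not part of ggplot2; it is provided by the ggpp package
# (formerly part of ggpmisc), which must be installed and attached first
p <- p + geom_table(data = res$topn_labels, aes(table = gene), 
                    x = -res$xmax*.85, y = res$ymax/2)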

# Highlight
prnVol(
  highlights = rlang::exprs(gene %in% c("ACTB", "GAPDH")), 
  filename = "highlights.png"
)


## protein subgroups by gene sets
# prerequisite analysis of GSPA
prnGSPA(
  impute_na = FALSE,
  pval_cutoff = 1E-2, # protein pVal threshold
  logFC_cutoff = log2(1.1), # protein log2FC threshold
  gspval_cutoff = 5E-2, # gene-set pVal threshold
  gslogFC_cutoff = log2(1.2), # gene-set log2FC threshold
  gset_nms = c("go_sets")
)

# mapping gene sets to volcano-plot visualization
# (1) forced lines of `pval_cutoff` and `logFC_cutoff`  
#   according to the corresponding `prnGSPA` in red; 
# (2) optional lines of `xco` and `yco` in grey
gspaMap(
  impute_na = FALSE,
  topn_gsets = 20, 
  show_sig = "pVal"
)

# disable the lines of `xco` and `yco`
gspaMap(
  impute_na = FALSE,
  topn_gsets = 20, 
  show_sig = "pVal", 
  xco = 0, 
  yco = Inf
)

# customized thresholds for visualization
# (one `gspval_cutoff` per formula in `fml_nms`)
gspaMap(
  fml_nms = c("W2_bat", "W2_loc", "W16_vs_W2"),
  gspval_cutoff = c(5E-2, 5E-2, 1E-10),
  gslogFC_cutoff = log2(1.2),
  topn_gsets = 20, 
  topn_labels = 0,
  show_sig = "pVal",
  xco = 0, 
  yco = Inf
)

## gspaMap(...) maps secondary results of `[...]Protein_GSPA_{NZ}[_impNA].txt` 
#  from prnGSPA(...) onto a primary `df` of `Protein[_impNA]_pVal.txt` 
#  
#  see also ?prnGSPA for additional examples involving both `df` and `df2`, 
#  as well as `filter_` and `filter2_`
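
The `xco` and `yco` settings in the examples are given on the fold-change and p-value scales, whereas the plot axes are `log2` and `-log10`. A minimal base-R sketch of that arithmetic shows why `xco = 0` and `yco = Inf` move the cut-off lines out of view. It also computes Benjamini-Hochberg values with `stats::p.adjust`, which is an assumption about how the values plotted under `adjP = TRUE` could be obtained, not a statement about proteoQ's internals. The p-values are made up.

```r
# Positions of the vertical cut-off lines on the log2 axis
xco   <- c(0, 1, 1.2)
x_pos <- log2(xco)    # xco = 0 gives -Inf; xco = 1 gives 0 (center line)
# a mirrored line sits at -log2(xco)

# Positions of the horizontal cut-off line on the -log10 axis
yco   <- c(0.05, 0.01, Inf)
y_pos <- -log10(yco)  # yco = Inf gives -Inf, below any finite axis range

# Raw versus Benjamini-Hochberg adjusted p-values (hypothetical input)
pVal <- c(0.001, 0.01, 0.04, 0.30, 0.50)
adj  <- p.adjust(pVal, method = "BH")

data.frame(pVal, adj, y_raw = -log10(pVal), y_adj = -log10(adj))
```

Adjusted values are never smaller than the raw ones, so points move down the y-axis when adjusted p-values are plotted.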

qzhang503/proteoQ documentation built on April 13, 2025, 8:31 a.m.
