plot_volcano: volcano plot for DEA results, split into 4 panels;...
In ftwkoopmans/msdap: Mass Spectrometry Downstream Analysis Pipeline

plot_volcano

R Documentation

volcano plot for DEA results, split into 4 panels; with/without labels and with/without thresholding foldchange-outliers

Description

If there are multiple contrasts in your DEA results you should either use the 'wrapper function' plot_volcano_allcontrast(), OR, first subset the DEA result table for 1 contrast before calling this plot function (i.e. don't plot multiple contrasts into 1 volcano plot).

Usage

plot_volcano(
  stats_de,
  log2foldchange_threshold = NA,
  qvalue_threshold = NA,
  mtitle = "",
  label_mode = "topn_pvalue",
  label_target = 25,
  label_avoid_overlap = TRUE,
  show_plots = FALSE
)

Arguments

`stats_de`	typically these are the output generated by the MS-DAP dea() function. If you use the entire pipeline, i.e. analysis_quickstart(), the resulting dataset objects holds these in the dataset$de_proteins property. Include a column "gene_symbols_or_id" to plot gene symbols instead of protein_id, see examples below
`log2foldchange_threshold`	threshold for significance of log2 foldchanges. Set to NA to disregard (default) or provide a single numeric value (cutoff will be applied symetrically for both up- and down-regulated)
`qvalue_threshold`	Q-value threshold for significant hits. Set to NA to disregard (default), otherwise it is assumed the input data table contains a boolean column 'signif'
`mtitle`	optionally, a title for the plot
`label_mode`	which class of proteins should be labeled in the plot. Options: topn_pvalue (top N smallest p-value, default), signif (all significant proteins), protein_id (a provided set of protein_id). Defaults to topN for consistency and clarity; if the upstream analysis yielded hundreds of hits the labels will be unreadable
`label_target`	further specification of the label_mode parameter. For instance, if 'topn_pvalue' is set, here you can set the number of proteins that should be labeled. Analogously, if label_mode='protein_id' is set you can here provide an array of protein_id values (that are available in the stats_de data table)
`label_avoid_overlap`	use the ggrepel R package to try and place labels with minimal overlap (only works when the number of labeled proteins is relatively low and sparse, e.g. for topN 25). Options: TRUE, FALSE
`show_plots`	boolean parameter; should each plot be printed/shown immediately? If `FALSE` (default) you'll have to extract the ggplot objects from the resulting list in order to print the plots

Value

returns a named list that contains a list, with properties 'ggplot' and 'ggplot_data', for each unique 'dea_algorithm' in the input stats_de table

Examples

### Exampes. Note that these assume that prior, the MS-DAP pipeline was successfully run
# using `dataset = analysis_quickstart(...)`.
# If your dataset contains multiple contrasts, follow example 5

## example 1: add protein-metadata to the DEA results (dataset$de_proteins), plotting the
# top 10 'best pvalue' hits while hardcoding the cutoffs for foldchange and Q-value
## Not run: 
  plot_list = msdap::plot_volcano(dataset$de_proteins %>% left_join(dataset$proteins),
    log2foldchange_threshold = 1, qvalue_threshold = 0.01,
    mtitle = "volcano, label top 10", label_mode = "topn_pvalue", label_target = 10,
    label_avoid_overlap = TRUE, show_plots = TRUE
  )

## End(Not run)

## example 2: analogous, but now show all significant proteins and disable "repelled labels"
# (instead, print protein labels just below each data point)
## Not run: 
  plot_list = msdap::plot_volcano(dataset$de_proteins %>% left_join(dataset$proteins),
    log2foldchange_threshold = 1, qvalue_threshold = 0.01,
    mtitle = "volcano, label all significant", label_mode = "signif",
    label_avoid_overlap = FALSE, show_plots = TRUE
  )

## End(Not run)

## example 3: show labels for some set of protein IDs. First line selects all proteins where symbol
# starts with GRIA or DLG (arbitrary example, either adapt the regex or use other filters/criteria
# to define a subset of protein_id from your dataset). Second line shows how to specify protein_id
# to be used as a label
## Not run: 
  pid_label = dataset$proteins %>%
    filter(grepl("^(GRIA|DLG)", gene_symbols_or_id, ignore.case=T)) %>% pull(protein_id)
  plot_list = msdap::plot_volcano(dataset$de_proteins %>% left_join(dataset$proteins),
    log2foldchange_threshold = 1, qvalue_threshold = 0.01,
    mtitle = "volcano, label selected proteins", label_mode = "protein_id",
    label_target = pid_label, label_avoid_overlap = FALSE, show_plots = TRUE
  )

## End(Not run)

## example 4: plot all significant labels as before, then add custom labels for some subset of
# proteins (we here regex select some labels in the plot, you should adapt the regex to match
# some proteins in your dataset to make this work, but you can also further filter by other
# properties in the 'ggplot_data' tibble like the y-coordinate aka qvalue)
## Not run: 
  plot_list = msdap::plot_volcano(dataset$de_proteins %>% left_join(dataset$proteins),
    log2foldchange_threshold = 1, qvalue_threshold = 0.01, mtitle = "volcano,
    label all significant + custom labels", label_mode = "signif",
    label_avoid_overlap = FALSE, show_plots = FALSE
  )
  l = plot_list[[1]]
  l$ggplot + ggrepel::geom_text_repel(
    alpha=1, color="green", data = l$ggplot_data %>%
      filter(plottype %in% c("asis_lab", "lim_lab") & grepl("^(GRIA|DLG)", label, ignore.case=T)),
    segment.alpha = 0.3, min.segment.length = unit(0.25, 'lines'),
    vjust = 0.6, show.legend = FALSE, size = 2
  )

## End(Not run)

# example 5: iterate over contrasts before calling plot_volcano().
# this is essentially the same as using helper function plot_volcano_allcontrast()
## Not run: 
  contrasts = unique(dataset$de_proteins$contrast)
  for(contr in contrasts) {
    # subset the DEA results for the current contrast
    tib_volcano = dataset$de_proteins %>% filter(contrast==contr) %>% left_join(dataset$proteins)

    # volcano plot function (compared to above example, now include the contrast in the title)
    plot_list = msdap::plot_volcano(tib_volcano, log2foldchange_threshold = 1,
      qvalue_threshold = 0.01, mtitle = paste(contr, "volcano, label top 10"),
      label_mode = "topn_pvalue", label_target = 10, label_avoid_overlap = TRUE,
      show_plots = TRUE
    )
  }

## End(Not run)

ftwkoopmans/msdap documentation built on March 5, 2025, 12:15 a.m.