plot_volcano | R Documentation |
If your DEA results contain multiple contrasts, either use the wrapper function plot_volcano_allcontrast() or first subset the DEA result table to a single contrast before calling this plot function (i.e. don't plot multiple contrasts into one volcano plot).
plot_volcano(
stats_de,
log2foldchange_threshold = NA,
qvalue_threshold = NA,
mtitle = "",
label_mode = "topn_pvalue",
label_target = 25,
label_avoid_overlap = TRUE,
show_plots = FALSE
)
stats_de |
typically the output generated by the MS-DAP dea() function. If you use the entire pipeline, i.e. analysis_quickstart(), the resulting dataset object holds these in the dataset$de_proteins property. Include a column "gene_symbols_or_id" to plot gene symbols instead of protein_id; see the examples below |
log2foldchange_threshold |
threshold for significance of log2 foldchanges. Set to NA to disregard (default) or provide a single numeric value (the cutoff is applied symmetrically to both up- and down-regulated proteins) |
qvalue_threshold |
Q-value threshold for significant hits. Set to NA to disregard (default), otherwise it is assumed the input data table contains a boolean column 'signif' |
mtitle |
optionally, a title for the plot |
label_mode |
which class of proteins should be labeled in the plot. Options: topn_pvalue (top N smallest p-value, default), signif (all significant proteins), protein_id (a provided set of protein_id). Defaults to topN for consistency and clarity; if the upstream analysis yielded hundreds of hits the labels will be unreadable |
label_target |
further specification of the label_mode parameter. For instance, if label_mode='topn_pvalue' is set, this is the number of proteins that should be labeled. Analogously, if label_mode='protein_id' is set, provide an array of protein_id values here (these must be available in the stats_de data table) |
label_avoid_overlap |
use the ggrepel R package to try and place labels with minimal overlap (only works when the number of labeled proteins is relatively low and sparse, e.g. for topN 25). Options: TRUE, FALSE |
show_plots |
boolean parameter; should each plot be printed/shown immediately? If |
returns a named list that contains a list, with properties 'ggplot' and 'ggplot_data', for each unique 'dea_algorithm' in the input stats_de table
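The threshold and label arguments described above can be illustrated on a toy table. This is a minimal base-R sketch, not MS-DAP's actual implementation: the column names (foldchange.log2, pvalue, qvalue), the helper column signif_toy and the choice of inclusive comparisons (>= and <=) are assumptions made for illustration, so check them against your own stats_de table.

```r
# Toy DEA table; column names and values are assumptions for illustration only
stats_toy = data.frame(
  protein_id      = c("P1", "P2", "P3", "P4", "P5"),
  foldchange.log2 = c(2.1, -1.4, 0.3, -0.2, 1.0),
  pvalue          = c(0.0001, 0.002, 0.04, 0.5, 0.01),
  qvalue          = c(0.001, 0.008, 0.09, 0.6, 0.03),
  stringsAsFactors = FALSE
)

log2foldchange_threshold = 1
qvalue_threshold = 0.01

# symmetric cutoff: one absolute threshold covers both up- and down-regulation
pass_fc = abs(stats_toy$foldchange.log2) >= log2foldchange_threshold
pass_q  = stats_toy$qvalue <= qvalue_threshold
stats_toy$signif_toy = pass_fc & pass_q

# 'topn_pvalue'-style selection: the N proteins with the smallest p-values
n_label = 2
idx_sorted = order(stats_toy$pvalue)
top_ids = stats_toy$protein_id[idx_sorted][seq_len(min(n_label, nrow(stats_toy)))]
```

In this toy table P5 passes the fold-change cutoff but not the Q-value cutoff, so only P1 and P2 are flagged; the same two proteins also have the smallest p-values.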
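### Examples. Note that these assume the MS-DAP pipeline was successfully run beforehand
# using `dataset = analysis_quickstart(...)`.
# If your dataset contains multiple contrasts, follow example 5
# The examples use dplyr verbs (%>%, left_join, filter, pull), so attach dplyr first
library(dplyr)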
## example 1: add protein-metadata to the DEA results (dataset$de_proteins), plotting the
# top 10 'best pvalue' hits while hardcoding the cutoffs for foldchange and Q-value
## Not run:
plot_list = msdap::plot_volcano(dataset$de_proteins %>% left_join(dataset$proteins),
log2foldchange_threshold = 1, qvalue_threshold = 0.01,
mtitle = "volcano, label top 10", label_mode = "topn_pvalue", label_target = 10,
label_avoid_overlap = TRUE, show_plots = TRUE
)
## End(Not run)
## example 2: analogous, but now show all significant proteins and disable "repelled labels"
# (instead, print protein labels just below each data point)
## Not run:
plot_list = msdap::plot_volcano(dataset$de_proteins %>% left_join(dataset$proteins),
log2foldchange_threshold = 1, qvalue_threshold = 0.01,
mtitle = "volcano, label all significant", label_mode = "signif",
label_avoid_overlap = FALSE, show_plots = TRUE
)
## End(Not run)
## example 3: show labels for some set of protein IDs. First line selects all proteins where symbol
# starts with GRIA or DLG (arbitrary example, either adapt the regex or use other filters/criteria
# to define a subset of protein_id from your dataset). Second line shows how to specify protein_id
# to be used as a label
## Not run:
pid_label = dataset$proteins %>%
filter(grepl("^(GRIA|DLG)", gene_symbols_or_id, ignore.case=TRUE)) %>% pull(protein_id)
plot_list = msdap::plot_volcano(dataset$de_proteins %>% left_join(dataset$proteins),
log2foldchange_threshold = 1, qvalue_threshold = 0.01,
mtitle = "volcano, label selected proteins", label_mode = "protein_id",
label_target = pid_label, label_avoid_overlap = FALSE, show_plots = TRUE
)
## End(Not run)
## example 4: plot all significant labels as before, then add custom labels for some subset of
# proteins (we here regex select some labels in the plot, you should adapt the regex to match
# some proteins in your dataset to make this work, but you can also further filter by other
# properties in the 'ggplot_data' tibble like the y-coordinate aka qvalue)
## Not run:
plot_list = msdap::plot_volcano(dataset$de_proteins %>% left_join(dataset$proteins),
log2foldchange_threshold = 1, qvalue_threshold = 0.01,
mtitle = "volcano, label all significant + custom labels", label_mode = "signif",
label_avoid_overlap = FALSE, show_plots = FALSE
)
l = plot_list[[1]]
l$ggplot + ggrepel::geom_text_repel(
alpha=1, color="green", data = l$ggplot_data %>%
filter(plottype %in% c("asis_lab", "lim_lab") & grepl("^(GRIA|DLG)", label, ignore.case=TRUE)),
segment.alpha = 0.3, min.segment.length = ggplot2::unit(0.25, 'lines'),
vjust = 0.6, show.legend = FALSE, size = 2
)
## End(Not run)
## example 5: iterate over contrasts before calling plot_volcano().
# this is essentially the same as using helper function plot_volcano_allcontrast()
## Not run:
contrasts = unique(dataset$de_proteins$contrast)
for(contr in contrasts) {
# subset the DEA results for the current contrast
tib_volcano = dataset$de_proteins %>% filter(contrast==contr) %>% left_join(dataset$proteins)
# volcano plot function (compared to above example, now include the contrast in the title)
plot_list = msdap::plot_volcano(tib_volcano, log2foldchange_threshold = 1,
qvalue_threshold = 0.01, mtitle = paste(contr, "volcano, label top 10"),
label_mode = "topn_pvalue", label_target = 10, label_avoid_overlap = TRUE,
show_plots = TRUE
)
}
## End(Not run)
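Since the function returns one list element per 'dea_algorithm', each holding 'ggplot' and 'ggplot_data', a common follow-up is to write every plot to a file. The sketch below uses a hand-built stand-in for the returned list so that it runs without MS-DAP: the algorithm name algo_a, the toy columns and the output file names are all hypothetical, and the stand-in uses a plain data.frame where the real 'ggplot_data' is a tibble. With real results you would replace the stand-in with the plot_list returned by plot_volcano().

```r
library(ggplot2)

# Stand-in for the list returned by plot_volcano() (hypothetical content)
toy = data.frame(log2fc = c(-2, 0.1, 1.5), minlog10p = c(3, 0.2, 2.5))
plot_list = list(
  algo_a = list(
    ggplot = ggplot(toy, aes(x = log2fc, y = minlog10p)) + geom_point(),
    ggplot_data = toy
  )
)

# Save one PDF per list element, using the element name in the file name
out_dir = tempdir()
saved_files = character(0)
for (algo in names(plot_list)) {
  f = file.path(out_dir, paste0("volcano_", algo, ".pdf"))
  ggsave(f, plot = plot_list[[algo]]$ggplot, width = 5, height = 4)
  saved_files = c(saved_files, f)
}
```

Looping over names(plot_list) rather than indexing with [[1]] keeps the code working when several DEA algorithms were used upstream.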