Compiled date: r Sys.Date()

Last edited: 2022-04-04

License: r packageDescription("GeneTonic")[["License"]]

knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  error    = FALSE,
  warning  = FALSE,
  eval     = TRUE,
  message  = FALSE,
  fig.width = 10
)
options(width = 100)
stopifnot(requireNamespace("htmltools"))
htmltools::tagList(rmarkdown::html_dependency_font_awesome())

Introduction {#introduction}

This vignette describes how to use the r BiocStyle::Biocpkg("GeneTonic") package for analyzing and integrating the results from Differential Expression analysis and functional enrichment analysis.

This package provides a Shiny application that aims to combine at different levels the existing pieces of the transcriptome data and results, in a way that makes it easier to generate insightful observations and hypothesis - combining the benefits of interactivity and reproducibility, e.g. by capturing the features and gene sets of interest highlighted during the live session, and creating an HTML report as an artifact where text, code, and output coexist.

In order to use r BiocStyle::Biocpkg("GeneTonic") in your workflow, the following inputs are required:

This workflow has mainly been tested with expression matrices with Ensembl identifiers, which are consistent and unambiguous across different releases, and results from functional enrichment analysis originating from ORA (OverRepresentation Analysis) methods, on the Gene Ontology signature database. It may need to be slightly adjusted to work with other formats and sources, but the core functionality will remain available.

In the remainder of this vignette, we will illustrate the main features of r BiocStyle::Biocpkg("GeneTonic") on a publicly available dataset from Alasoo, et al. "Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response", published in Nature Genetics, January 2018 [@Alasoo2018] doi:10.1038/s41588-018-0046-7.

The data is made available via the r BiocStyle::Biocpkg("macrophage") Bioconductor package, which contains the files output from the Salmon quantification (version 0.12.0, with Gencode v29 reference), as well as the values summarized at the gene level, which we will use to exemplify.

In the macrophage experimental setting, the samples are available from 6 different donors, in 4 different conditions (naive, treated with Interferon gamma, with SL1344, or with a combination of Interferon gamma and SL1344). We will restrict our attention on the comparison between Interferon gamma treated samples vs naive samples.

Getting started {#gettingstarted}

To install this package, start R and enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("GeneTonic")

Once installed, the package can be loaded and attached to your current workspace as follows:

library("GeneTonic")

If you have all four input elements ready, you can launch the GeneTonic() app by running:

GeneTonic(dds = dds_object,
          res_de = res_de_object,
          res_enrich = res_enrich_object,
          annotation_obj = annotation_object,
          project_id = "myFirstGeneTonic")

In this vignette, we showcase the functionality of r BiocStyle::Biocpkg("GeneTonic") using the gene-level summarized version of the macrophage dataset. If you want to dive in and start playing with the app immediately, you can simply run:

example("GeneTonic", ask = FALSE)

Otherwise, you can follow the next chunks of code to generate the required input objects, step by step. In case some formatting requirements are expected, these will be specified in the text/comments, and they will be checked internally upon launching the GeneTonic app.

For running GeneTonic, you will need the four components mentioned above as parameters.

  1. dds: First you will need a DESeqDataSet, which is the main component in the DESeq2 framework and extends the widely adopted SummarizedExperiment class. This object will store the information related to the expression matrix.
library("macrophage")
library("DESeq2")

data("gse", package = "macrophage")

dds_macrophage <- DESeqDataSet(gse, design = ~line + condition)
# changing the ids to Ensembl instead of the Gencode used in the object
rownames(dds_macrophage) <- substr(rownames(dds_macrophage), 1, 15)
dds_macrophage
  1. res_de: Next you are going to need the results of Differential Expression analysis, computed on the dds_macrophage object you just obtained.

The expected format is a DESeqResults object. Here we are going to contrast two different conditions, IFNg and naive, while controlling for the cell line of origin (which has 6 levels, namely r knitr::combine_words(levels(dds_macrophage$condition))).

We start by filtering lowly expressed features (at least 10 counts in at least 6 samples - 6 being the size of the smallest group). Then, we test against a null hypothesis of a log2FoldChange of 1 (instead of the default value of 0), in order to specify that we want to call DE genes with a consistent and robust expression change. Finally, we add the gene symbol to the output DataFrame.

keep <- rowSums(counts(dds_macrophage) >= 10) >= 6
dds_macrophage <- dds_macrophage[keep, ]
dds_macrophage
dds_macrophage <- DESeq(dds_macrophage)
# vst_macrophage <- vst(dds_macrophage)
res_macrophage_IFNg_vs_naive <- results(dds_macrophage,
                                        contrast = c("condition", "IFNg", "naive"),
                                        lfcThreshold = 1, alpha = 0.05)
res_macrophage_IFNg_vs_naive$SYMBOL <- rowData(dds_macrophage)$SYMBOL
## To speed up the operations in the vignette, we can also load this object directly
data("res_de_macrophage")
head(res_macrophage_IFNg_vs_naive)
  1. res_enrich: With the DE results of the previous step, we are going to extract the vector of DE genes (via mosdef::deresult_to_df), as well as the list of genes to be used as background, and feed these two objects into a function that computes the functional enrichment of the DE genes.

We are going to use the ORA method implemented in r BiocStyle::Biocpkg("topGO"), wrapped in the function run_topGO available in the r BiocStyle::Biocpkg("mosdef") package, which by default uses the BP ontology and the elim method to decorrelate the GO graph structure and deliver less redundant functional categories. You can also use as an alternative the enrichGO function from r BiocStyle::Biocpkg("clusterProfiler").

library("AnnotationDbi")
de_symbols_IFNg_vs_naive <- mosdef::deresult_to_df(res_macrophage_IFNg_vs_naive, FDR = 0.05)$SYMBOL
bg_ids <- rowData(dds_macrophage)$SYMBOL[rowSums(counts(dds_macrophage)) > 0]
library("topGO")
topgoDE_macrophage_IFNg_vs_naive <-
  mosdef::run_topGO(de_genes = de_symbols_IFNg_vs_naive,
                    bg_genes = bg_ids,
                    ontology = "BP",
                    mapping = "org.Hs.eg.db",
                    geneID = "symbol")
## To speed up the operations in the vignette, we also load this object directly
data("res_enrich_macrophage")
head(topgoDE_macrophage_IFNg_vs_naive, 2)

Since in this step the object can be created from different methods, but the GeneTonic main functions require some specific columns to be present, we have created some functions to convert the output of the upstream tools into a format compatible with GeneTonic. The functions shake_enrichResult() and shake_topGOtableResult() serve this purpose - if your favorite tool might be different, you can write your own conversion function and contribute it to GeneTonic.

res_enrich_macrophage <- shake_topGOtableResult(topgoDE_macrophage_IFNg_vs_naive)
colnames(res_enrich_macrophage)
  1. annotation_obj: Last object required is the annotation data frame, i.e. a simple data frame composed at least of two columns, gene_id, with a set of unambiguous identifiers (e.g. ENSEMBL ids) corresponding to the row names of the dds object, and gene_name, containing e.g. HGNC-based gene symbols. This object can be constructed via the org.eg.XX.db packages, e.g. with convenience functions such as pcaExplorer::get_annotation_orgdb(). Here we display the general way of obtaining this object.
library("org.Hs.eg.db")
anno_df <- data.frame(
  gene_id = rownames(dds_macrophage),
  gene_name = mapIds(org.Hs.eg.db, keys = rownames(dds_macrophage), column = "SYMBOL", keytype = "ENSEMBL"),
  stringsAsFactors = FALSE,
  row.names = rownames(dds_macrophage)
)
## alternatively:
# anno_df <- mosdef::get_annotation_orgdb(dds_macrophage, "org.Hs.eg.db", "ENSEMBL")

For some operations in the main app and in the single functions, sometimes it is required to have aggregated scores for the res_enrich data frame.

We can do so by calling get_aggrscores() on the combination of objects we just generated. This adds two columns to the provided res_enrich object, z_score and aggr_score, which summarize at the gene set level the effect (log2FoldChange) of the differentially expressed genes which are its members. In particular, the z score attempts to determine the "direction" of change, computed as $z = \frac{(upgenes - downgenes)}{\sqrt{(upgenes + downgenes)}}$, regardless of the effect size of the single members of the gene set.

res_enrich_macrophage <- get_aggrscores(res_enrich = res_enrich_macrophage,
                                        res_de = res_macrophage_IFNg_vs_naive,
                                        annotation_obj = anno_df,
                                        aggrfun = mean)

All set!

Once you have the four objects specified above, you can set the project_id to be a character string at wish, and you can just launch the GeneTonic() app:

GeneTonic(dds = dds_macrophage,
          res_de = res_macrophage_IFNg_vs_naive,
          res_enrich = res_enrich_macrophage,
          annotation_obj = anno_df,
          project_id = "GT1")

Alternatively (this applies from version >= 1.3.0), it is possible to provide a single list object to GeneTonic() and many of its functions.
Please note that the arguments should be named as specified in the example below.

gtl <- GeneTonicList(
  dds = DESeq2::estimateSizeFactors(dds_macrophage),
  res_de = res_macrophage_IFNg_vs_naive,
  res_enrich = res_enrich_macrophage,
  annotation_obj = anno_df
)

# if nothing is returned, the object is ready to be provided to GeneTonic
checkup_gtl(gtl)

if (interactive()) {
  GeneTonic(gtl = gtl)
}

From version 2.0.0 onwards, GeneTonic offers the possibility to upload the GeneTonicList objects at runtime - these need to be provided as serialized RDS objects (which can be generated when calling saveRDS(), or alternatively exported from within the GeneTonic app).
This means it is possible to use GeneTonic also as a server-like application for internal deployment. An exemplary app.R file to be used on a Shiny Server could be as simple as

max_gtl_size <- 250  ## the max size in Megabytes for the gtl object to be uploaded
library("GeneTonic")
GeneTonic(size_gtl = max_gtl_size)

To upload your GeneTonicList object, click on the "Browse" widget and select the specific RDS file from the explorer window that will open up.
If the upload was successful, GeneTonic will make all the panels available, ready to be interactively explored. If providing the data in a wrong or unexpected format, the app should be able to recognize this and notify the users in the lower right corner.

knitr::include_graphics("upload_gtl.png")

Description of the GeneTonic user interface {#userinterface}

The GeneTonic app is built with r BiocStyle::CRANpkg("shiny"), and its layout is built around the components of r BiocStyle::CRANpkg("bs4Dash"), a modern looking dashboard based on Bootstrap 4.

Header (navbar)

The dashboard navbar (as it is called in bs4Dash) contains two dropdown menus, which can be expanded by clicking on the respective icons ( and ). These provide additional buttons to do one the following:

Additionally, one fundamental button is present in the navbar - labeled "Bookmark", with the heart icon. This serves the purpose of bookmarking genes and gene sets of interest while exploring your data in an interactive session. This functionality will be exploited in the Bookmarks section of the app, where a complete HTML report can be generated as a reproducible artifact.

Sidebar

On the left side of the app, by clicking on the menu bar icon (or if the app is viewed in full screen, by simply moving the mouse over to the left side), the sidebar menu is triggered. This constitutes the main way to access the different tabs of GeneTonic(), which will be explained in more detail in the next section (\@ref(functionality)).

Controlbar

The controlbar is triggered by clicking on the cogs icon on the upper right part of the navbar. There you can find widgets that control the appearance of output (plots, tables, graphs) in multiple tabs of the main application, such as for example the number of gene sets to focus your attention on.

Body

The main body of the GeneTonic application is structured in tabs that are activated by clicking on the respective icons or text in the sidebar.

While the Welcome tab might be self-explanatory, the functionality of each tab can go in depth, and new users can benefit of the question circle button () button to trigger an interactive tour of GeneTonic, which allows users to learn the basic usage mechanisms by doing. During a tour, the highlighted elements respond to the user's actions, while the rest of the UI is kept shaded. Tours can be interrupted anytime by clicking outside of the focused window, and arrows (left, right) can be used as well to navigate between steps. The tour functionality is provided by the beautiful r BiocStyle::CRANpkg("rintrojs") package.

The GeneTonic functionality {#functionality}

The main GeneTonic app is structured in different panels, and each of these is presented in detail in the subsequent sections.

The Welcome panel

This panel in intended to give an overview on the different ingredients you just provided for launching GeneTonic.

Collapsible elements, colored each by the typology of data it refers to, contain the underlying tabular information, and a value box provides a quick summary on the dimensions of the input parameters.

The tables are rendered with DT::datatable, which provides a practical interface for exploring, sorting, and filtering. This framework is also extendable, for example as we do for the log2FoldChange column in the res_de object, where the values are also visually encoded as diverging bars of different color based on the expression change direction.

In the Welcome panel, the GeneSpector box will also enable users to explore a gene expression plot for any of the genes in the dds object, irrespective of them being members or not of any geneset.
To do so, select a gene from the dropdown menu (autocompletion is supported), and a covariate from the "Group/color by" selector in the control menu on the right (opened by clicking on the cogs icon).

If launched without providing input data (as individual components, or as GeneTonicList object), GeneTonic offers the possibility to upload the GeneTonicList objects at runtime.
The upload functionality is showing in this panel, and enables GeneTonic being used also as a server-like application for internal deployment.

The Gene-Geneset panel

This panel has as its main component an interactive bipartite graph illustrating the relationship between genesets and genes that belong to them. One of the relevant parameters for this output is the number of gene sets to be displayed, adjustable from the widget in the control panel on the right.

Such a representation is particularly useful to see overlap between terms, and grasp the way how different genes may play a concerting role in different processes.

The color and shape of the nodes depend on the entity they are related to:

The interactivity in this graph is essential for zooming, panning, and selecting its nodes. The node selection, in particular, triggers a drill-down exploration, which populates a box on the right side, either with a signature heatmap (for a gene set), or with an expression plot, split by the experimental covariates of interest (for a gene) - this behavior can be controlled by the "Group/color by" selectize widget in the control sidebar. In both cases, some HTML content is also generated, with buttons linking to external databases where to retrieve additional information (e.g. ENSEMBL, NCBI, AmiGO, ...).

A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.

Bookmarking from the Gene-Geneset panel: When clicking on the Bookmark button while in this tab, it is possible to add to the shortlisted items of interest both genes and gene sets. To do so, while one of the nodes is selected, click on the button with the mouse - or alternatively, click on the Left Ctrl button of your keyboard. A notification will be triggered, informing the status of the bookmarked elements.

The Enrichment Map panel

This panel, similar to the Gene-Geneset tab, has again an interactive graph of a subset of the gene sets (you can control again the number of included sets from the control sidebar). This is also called an enrichment map - it is also possible to generate them in tools such as Cytoscape. In this case, the genesets are directly connected by an edge, where the thickness encodes the degree of similarity (e.g. in terms of overlap) between them; sets below a similarity threshold are not displayed for the sake of better clarity.

The size of the node encodes the information of the number of genes detected as Differentially Expressed (from the column gs_de_count), while the color is representative of the computed Z score for each set. This value is computed as $z = \frac{(upgenes - downgenes)}{\sqrt{(upgenes + downgenes)}}$ and is indicative of the expression changes of the genes assigned to a particular set. Please, keep in mind that methods based on overrepresentation analysis only do not account for the topology that might be behind the set of interacting features (i.e. an upregulated gene involved in inhibition of a pathway would still be counted as a positive element in the equation above).

The z_score and aggr_score (in the simplest form, a mean of the log2FoldChange values for all the genes of a set) can be easily computed with the get_aggrscores(), which appends these scores for each gene set to the original res_enrich data frame.

The interactive flavor in this tab permits a quick exploration of the signature heatmaps for the genesets of interest by simply clicking on a node; this also triggers the display of some HTML content, if related to a Gene Ontology term.

A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.

In the section below the interactive enrichment map, the functionality for the distillation of gene sets into meta-genesets is provided.
In brief, a community detection algorithm is run on the graph underlying the enrichment map, and the additional information extracted is rendered as a table, linked to a meta-geneset box to display the expression signature for all its member genes.

Bookmarking from the Enrichment Map panel: If the user clicks on the Bookmark button while in this tab, it will add the selected node to the shortlist of genesets, visible later in the Bookmarks panel. The Left Ctrl button of your keyboard can again be used for this purpose, and this will trigger a notification in the lower right part of the screen.

The Overview panel

This panel presents to the user a series of summary plots, based on the outputs of enhance_table() and gs_volcano(), aiming to provide a mean to summarize the most relevant genesets in the res_enrich object. The number of displayed (or labeled, according to each function) genesets can be as usual controlled by the widget in the right sidebar.

We avoid (re)rendering all plots at once by placing them as content of different tabs in a bs4TabCard container - this allows the user to navigate in a more agile way to the desired output elements.

Interactive versions of the plots are returned via calls to ggplotly.

A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.

The GSViz panel

This panel, similarly to the Overview panel, presents a variety of plots visualizing the gene sets - and their mutual relationships, or their behavior across samples. This includes, for example:

As in many other panels, the control over the number of genesets displayed is tweaked with the widget in the right sidebar, "Number of genesets".

A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.

The Bookmarks panel

In this panel, the user can obtain a complete overview on the bookmarked features of interest, both genes (left side) and genesets (right side).

During a live exploration session, it might be tricky to recall the exact aspect of each point of interest.

Below the interactive DT::datatable objects, the content of each table can also be separately downloaded by clicking on the respective button. Moreover, there is a button (with the cocktail icon ) that triggers the generation of a content-rich report which will focus on all the shortlisted elements. This leverages a template report delivered as a part of r BiocStyle::Biocpkg("GeneTonic"), which accesses the input elements and the reactive values for the bookmarks, based on the happy_hour() function.

If a user already knows a set of features and geneset of interest, it is also possible to practically call happy_hour() directly from the command line (see the example in the chunk below). Please keep in mind that the identifiers to be used should match the values used as row names of the dds object and as gs_id column of the original res_enrich input.

happy_hour(dds = dds_macrophage,
           res_de = res_de,
           res_enrich = res_enrich,
           annotation_obj = anno_df,
           project_id = "examplerun",
           mygenesets = res_enrich$gs_id[c(1:5,11,31)],
           mygenes = c("ENSG00000125347",
                       "ENSG00000172399",
                       "ENSG00000137496")
)

This report will serve as a useful means to generate a permanent and reproducible analysis output, that can be further stored or shared.

Ideally, each live session with GeneTonic could terminate with a call to happy_hour() - admittedly, an aperitivo should start with that, and not end - please accept our apologies :) .

A demonstration of how to use this panel effectively is provided by the tour, triggered by clicking on the question circle button () button in the upper right corner of the tab.

In this panel, an export button to a SummarizedExperiment object to be provided as input to the r BiocStyle::Biocpkg("iSEE") package has been added in version >= 1.2.0. This enables further and fine-grained visualization options in the iSEE framework - if the flavors of GeneTonic are not yet what you would want, you might find an excellent venue there.

Using the GeneTonic functions {#functions}

The r BiocStyle::Biocpkg("GeneTonic") package provides, apart from the main app where to perform the analysis in an interactive way, a number of functions which can be used also as standalone components for a workflow.

This sections contains some showcase examples on how to use them.

Overview functions: genes and gene sets

Summarize the top genesets by displaying the logFC of each set's components - and transform the ggplot output into a plotly visualization, where tooltips can deliver additional information:

p <- enhance_table(res_enrich_macrophage,
                   res_macrophage_IFNg_vs_naive,
                   n_gs = 30,
                   annotation_obj = anno_df,
                   chars_limit = 60)
p
library("plotly")
ggplotly(p)

Display the relationship between genesets and genes that belong to them (useful e.g. to see overlap between terms):

gs_alluvial(res_enrich = res_enrich_macrophage,
            res_de = res_macrophage_IFNg_vs_naive,
            annotation_obj = anno_df,
            n_gs = 4)

Display a bipartite graph where genesets and genes are included, and make it interactive:

ggs <- ggs_graph(res_enrich_macrophage,
                 res_de = res_macrophage_IFNg_vs_naive,
                 anno_df,
                 n_gs = 20)
ggs

# could be viewed interactively with
library(visNetwork)
library(magrittr)
ggs %>%
  visIgraph() %>%
  visOptions(highlightNearest = list(enabled = TRUE,
                                     degree = 1,
                                     hover = TRUE),
             nodesIdSelection = TRUE)

Summary representations

Plot an enrichment map of the enrichment results, where you can choose with n_gs the number of top genesets to be displayed, and specify in gs_ids the gene set identifiers - these will be merged together for each function's context:

em <- enrichment_map(res_enrich_macrophage,
                     res_macrophage_IFNg_vs_naive,
                     n_gs = 30,
                     color_by = "z_score",
                     anno_df)
library("igraph")
library("visNetwork")
library("magrittr")

em %>% 
  visIgraph() %>% 
  visOptions(highlightNearest = list(enabled = TRUE,
                                     degree = 1,
                                     hover = TRUE),
             nodesIdSelection = TRUE)

Expanding on the enrichment map, the distilled content of the res_enrich can be represented as enhanced tables and interactive graphs, depicting the membership of the respective meta-genesets

distilled <- distill_enrichment(res_enrich_macrophage,
                                res_macrophage_IFNg_vs_naive,
                                anno_df,
                                n_gs = 60,
                                cluster_fun = "cluster_markov")
DT::datatable(distilled$distilled_table[,1:4])
dim(distilled$distilled_table)
DT::datatable(distilled$res_enrich[,])

dg <- distilled$distilled_em 

library("igraph")
library("visNetwork")
library("magrittr")

# defining a color palette for nicer display
colpal <- colorspace::rainbow_hcl(length(unique(V(dg)$color)))[V(dg)$color]
V(dg)$color.background <- scales::alpha(colpal, alpha = 0.8)
V(dg)$color.highlight <- scales::alpha(colpal, alpha = 1)
V(dg)$color.hover <- scales::alpha(colpal, alpha = 0.5)

V(dg)$color.border <- "black"

visNetwork::visIgraph(dg) %>%
  visOptions(highlightNearest = list(enabled = TRUE,
                                     degree = 1,
                                     hover = TRUE),
             nodesIdSelection = TRUE, 
             selectedBy = "membership")

Another possibility to summarize the result of enrichment is by applying fuzzy clustering on it to detect groups of related genesets, with the peculiarity that a geneset can belong to different clusters - this reflects the original implementation of DAVID [@Huang2009].
The gs_fuzzyclustering() function returns a table with additional columns, related to the cluster id and the status for the geneset in that cluster ("Representative" or "Member", where the Representative geneset is the one with lowest p-value in that cluster).

res_enrich_subset <- res_enrich_macrophage[1:100, ]

fuzzy_subset <- gs_fuzzyclustering(
  res_enrich = res_enrich_subset,
  n_gs = nrow(res_enrich_subset),
  gs_ids = NULL,
  similarity_matrix = NULL,
  similarity_threshold = 0.35,
  fuzzy_seeding_initial_neighbors = 3,
  fuzzy_multilinkage_rule = 0.5)

# show all genesets members of the first cluster
fuzzy_subset[fuzzy_subset$gs_fuzzycluster == "1", ]

# list only the representative clusters
DT::datatable(
  fuzzy_subset[fuzzy_subset$gs_cluster_status == "Representative", ]
)

Display a volcano plot for genesets, plotting significance (y-axis) versus Z score (x-axis), with color and size encoding some geneset features:

gs_volcano(res_enrich_macrophage,
           p_threshold = 0.05,
           color_by = "aggr_score",
           volcano_labels = 10,
           gs_ids = NULL,
           plot_title = "my volcano")
res_enrich_simplified <- gs_simplify(res_enrich_macrophage,
                                     gs_overlap = 0.7)
dim(res_enrich_macrophage)
dim(res_enrich_simplified)
gs_volcano(res_enrich_simplified,
           color_by = "aggr_score")

Plot a dendrogram of the enrichment results (or a subset thereof), using node color, node size, and branch color to encode relevant geneset features (and their groups):

gs_dendro(res_enrich_macrophage,
          n_gs = 50,
          gs_dist_type = "kappa", 
          clust_method = "ward.D2",
          color_leaves_by = "z_score",
          size_leaves_by = "gs_pvalue",
          color_branches_by = "clusters",
          create_plot = TRUE)

Produce a MultiDimensional Scaling plot of the genesets in the enrichment results:

gs_mds(res_enrich_macrophage,
       res_macrophage_IFNg_vs_naive,
       anno_df,
       n_gs = 200,
       gs_ids = NULL,
       similarity_measure = "kappa_matrix",
       mds_k = 2,
       mds_labels = 5,
       mds_colorby = "z_score",
       gs_labels = NULL,
       plot_title = NULL) 

Obtain a simple overview of enrichment results, showing e.g. significance and direction of change (z_score)

gs_summary_overview(res_enrich_macrophage,
                    n_gs = 30,
                    p_value_column = "gs_pvalue",
                    color_by = "z_score")

Plot a summary heatmap of genes vs genesets, encoding the logFC of genes and representing the overlap among genesets:

gs_summary_heat(res_enrich_macrophage,
                res_macrophage_IFNg_vs_naive,
                anno_df,
                n_gs = 15)

Recalling that you can also use a gtl list object as input, a similar call would be:

gs_summary_heat(gtl = gtl,
                n_gs = 10)

Plot a heatmap on the geneset scores, computed sample-wise via gs_scores so that the geneset members compose a Z score that gets summarized:

vst_macrophage <- vst(dds_macrophage)
scores_mat <- gs_scores(
  se = vst_macrophage,
  res_de = res_macrophage_IFNg_vs_naive,
  res_enrich = res_enrich_macrophage,
  annotation_obj = anno_df
)
gs_scoresheat(scores_mat,
              n_gs = 30)

Reporting

Enjoy a fully fledged happy_hour by running offline the analysis contained in the report, included in the r BiocStyle::Biocpkg("GeneTonic") package.

happy_hour(dds = dds_macrophage,
           res_de = res_de,
           res_enrich = res_enrich,
           annotation_obj = anno_df,
           project_id = "examplerun",
           mygenesets = res_enrich$gs_id[c(1:5,11,31)],
           mygenes = c("ENSG00000125347",
                       "ENSG00000172399",
                       "ENSG00000137496")
)

Again, if used with the gtl object, the call would look like this:

happy_hour(gtl = gtl,
           project_id = "examplerun",
           mygenesets = gtl$res_enrich$gs_id[c(1:5,11,31)],
           mygenes = c("ENSG00000125347",
                       "ENSG00000172399",
                       "ENSG00000137496"),
           open_after_creating = TRUE
)

The functionality is also the same activated in the Bookmarks section of the interactive app. Here you can choose a number of genesets (mygenesets) and genes (mygenes), specifying their identifiers, and obtain in batch a number of summary representations focused on these elements.

Advanced users can also provide a custom RMarkdown file, and to guarantee it works in the context of happy_hour, we recommend to inspect the template available here:

template_rmd <- system.file("extdata",
                            "cocktail_recipe.Rmd",
                            package = "GeneTonic")
template_rmd

Comparison between sets

While GeneTonic works currently with a single res_enrich object, there are some functions that allow a visual comparison between two different enrichment results (the functions are displaying a shuffled instance of the single res_enrich):

# generating some shuffled gene sets
res_enrich2 <- res_enrich_macrophage[1:50, ]
set.seed(42)
shuffled_ones <- sample(seq_len(50)) # to generate permuted p-values
res_enrich2$gs_pvalue <- res_enrich2$gs_pvalue[shuffled_ones]
res_enrich2$z_score <- res_enrich2$z_score[shuffled_ones]
res_enrich2$aggr_score <- res_enrich2$aggr_score[shuffled_ones]

gs_summary_overview_pair(res_enrich = res_enrich_macrophage,
                         res_enrich2 = res_enrich2,
                         n_gs = 25)
res_enrich2 <- res_enrich_macrophage[1:42, ]
res_enrich3 <- res_enrich_macrophage[1:42, ]
res_enrich4 <- res_enrich_macrophage[1:42, ]

set.seed(2*42)
shuffled_ones_2 <- sample(seq_len(42)) # to generate permuted p-values
res_enrich2$gs_pvalue <- res_enrich2$gs_pvalue[shuffled_ones_2]
res_enrich2$z_score <- res_enrich2$z_score[shuffled_ones_2]
res_enrich2$aggr_score <- res_enrich2$aggr_score[shuffled_ones_2]

set.seed(3*42)
shuffled_ones_3 <- sample(seq_len(42)) # to generate permuted p-values
res_enrich3$gs_pvalue <- res_enrich3$gs_pvalue[shuffled_ones_3]
res_enrich3$z_score <- res_enrich3$z_score[shuffled_ones_3]
res_enrich3$aggr_score <- res_enrich3$aggr_score[shuffled_ones_3]

set.seed(4*42)
shuffled_ones_4 <- sample(seq_len(42)) # to generate permuted p-values
res_enrich4$gs_pvalue <- res_enrich4$gs_pvalue[shuffled_ones_4]
res_enrich4$z_score <- res_enrich4$z_score[shuffled_ones_4]
res_enrich4$aggr_score <- res_enrich4$aggr_score[shuffled_ones_4]

compa_list <- list(
  scenario2 = res_enrich2,
  scenario3 = res_enrich3,
  scenario4 = res_enrich4
)

gs_horizon(res_enrich_macrophage,
           compared_res_enrich_list = compa_list,
           n_gs = 20,
           sort_by = "clustered")
# with only one set
gs_radar(res_enrich = res_enrich_macrophage)
# with a dataset to compare against
gs_radar(res_enrich = res_enrich_macrophage,
         res_enrich2 = res_enrich2)

In particular, the gs_horizon function tries to group the different genesets based on their relative enrichment across conditions, similar to a representation of [@Dutertre2019] (e.g. Fig. 5G). Thanks to Charlotte Soneson for providing a compact working implementation of this!

Miscellaneous functions

Extract results and sort them (filtering by FDR) with mosdef::deresult_to_df, select one top-ranking gene, extract its expression values and plot them, and display some buttons to link to external databases:

head(mosdef::deresult_to_df(res_macrophage_IFNg_vs_naive))
# to make sure normalized values are available...
dds_macrophage <- estimateSizeFactors(dds_macrophage)   
mosdef::gene_plot(dds_macrophage,
          gene = "ENSG00000125347",
          intgroup = "condition",
          annotation_obj = anno_df,
          plot_type = "auto")
mosdef::gene_plot(dds_macrophage,
          gene = "ENSG00000174944",
          intgroup = "condition",
          assay = "abundance",
          annotation_obj = anno_df,
          plot_type = "auto")

mosdef::geneinfo_to_html("IRF1")

Plot a signature heatmap for a gene set, where a matrix of genes times samples for the set members is extracted, plus generate some HTML code to summarize the information of a Gene Ontology term:

gs_heatmap(se = vst_macrophage,
           res_de = res_macrophage_IFNg_vs_naive,
           res_enrich = res_enrich_macrophage,
           annotation_obj = anno_df,
           geneset_id = "GO:0060337"  ,
           cluster_columns = TRUE,
           anno_col_info = "condition"
)

mosdef::go_to_html("GO:0060337",
                   res_enrich = res_enrich_macrophage)

Plot a signature volcano plot for a gene set, with the genes of the geneset highlighted in color and the remaining genes shown shaded in the background:

signature_volcano(res_de = res_macrophage_IFNg_vs_naive,
                  res_enrich = res_enrich_macrophage,
                  annotation_obj = anno_df,
                  geneset_id = "GO:0060337",
                  FDR = 0.05,
                  color = "#1a81c2"
)

Some functions are just to ensure that the input objects are conform with the format expected by GeneTonic:

res_enrich <- shake_enrichResult(enrichment_results_from_clusterProfiler)
res_enrich <- shake_topGOtableResult(enrichment_results_from_topGOtable)

As an internal check, one can see if all required arguments are fine for running GeneTonic by running checkup_GeneTonic:

checkup_GeneTonic(dds = dds_macrophage,
                  res_de = res_macrophage_IFNg_vs_naive,
                  res_enrich = res_enrich_macrophage,
                  annotation_obj = anno_df)
# if all is fine, it should return an invisible NULL and a simple message

Additional Information {#additionalinfo}

Bug reports can be posted on the Bioconductor support site (https://support.bioconductor.org/) or raised as issues in the GeneTonic GitHub repository (https://github.com/federicomarini/GeneTonic/issues). The GitHub repository also contains the development version of the package, where new functionality is added over time - careful, you might be running bleeding edge versions!

The authors appreciate well-considered suggestions for improvements or new features, or even better, pull requests.

If you use GeneTonic for your analysis, please cite it as shown below:

citation("GeneTonic")

FAQs {#faqs}

Q: Why is this package called GeneTonic?

A: For obvious reasons :) Like in a cocktail, ingredients (expression matrix, results from DE, and results from functional enrichment analysis) do taste better together.
Plus, I like puns with words.
And cocktails.
Sometimes simultaneously.

Q: My configuration on two machines is somewhat different, so I am having difficulty in finding out what packages are different. Is there something to help on this?

A: Yes, you can check out r BiocStyle::Githubpkg("federicomarini/sessionDiffo"), a small utility to compare the outputs of two different sessionInfo outputs. This can help you pinpoint what packages might be causing the issue.

Q: I am using a different service/software for generating the results of functional enrichment analysis. How do I plug this into GeneTonic?

A: You can set up a small conversion function, on the line of shake_topGOtableResult and shake_enrichResult (use the source, Luke to see how you can build your own), or request such a function - make sure to specify the formatting specifications of your tool of choice (e.g. DAVID, g:Profiler, or commercial solutions such as Ingenuity Pathway Analysis).
As of Bioconductor release 3.12 (version >= 1.2.0 for r BiocStyle::Biocpkg("GeneTonic")), shaker functions are available for these tools:

Q: I'd like to try GeneTonic but I could not install it/I just want a quick peek into it. What can I do for this?

A: We set up an instance of GeneTonic running on the macrophage dataset at this address: http://shiny.imbei.uni-mainz.de:3838/GeneTonic.

Session Info {- .smaller}

sessionInfo()

References {-}



federicomarini/GeneTonic documentation built on Oct. 10, 2024, 8:49 p.m.