mem_plot_folio | R Documentation |
Multienrichment folio of summary plots
mem_plot_folio(
mem,
do_which = NULL,
p_cutoff = NULL,
p_floor = 1e-10,
main = "",
use_raster = TRUE,
min_gene_ct = 1,
min_set_ct = 1,
min_set_ct_each = 4,
column_method = "euclidean",
row_method = "euclidean",
exemplar_range = c(1, 2, 3),
pathway_column_split = NULL,
pathway_column_title = LETTERS,
gene_row_split = NULL,
gene_row_title = letters,
edge_color = NULL,
cex.main = 2,
cex.sub = 1.5,
row_cex = 1,
column_cex = 1,
max_labels = 4,
max_nchar_labels = 25,
include_cluster_title = TRUE,
repulse = 4,
use_shadowText = FALSE,
color_by_column = FALSE,
style = "dotplot_inverted",
enrich_im_weight = 0.3,
gene_im_weight = 0.5,
colorize_by_gene = TRUE,
cluster_color_min_fraction = 0.4,
byCols = c("composite_rank", "minp_rank", "gene_count_rank"),
edge_bundling = "connections",
apply_direction = NULL,
rotate_heatmap = FALSE,
row_anno_padding = grid::unit(3, "mm"),
column_anno_padding = grid::unit(3, "mm"),
do_plot = TRUE,
verbose = FALSE,
...
)
mem |
|
do_which |
integer vector of plots to produce. When |
p_cutoff |
numeric value indicating the enrichment P-value threshold
used for |
p_floor |
numeric value indicating the lowest enrichment P-value used in the color gradient on the Enrichment Heatmap. |
main |
character string used as a title on Cnet plots. |
use_raster |
logical indicating whether to use raster heatmaps,
passed to |
min_gene_ct , min_set_ct |
integer values passed to
|
min_set_ct_each |
minimum number of genes required for each set, required for at least one enrichment test. |
column_method , row_method |
arguments passed to
|
exemplar_range |
integer vector (or |
pathway_column_split , gene_row_split |
|
pathway_column_title , gene_row_title |
|
cex.main , cex.sub |
numeric values passed to |
row_cex , column_cex |
|
color_by_column |
|
enrich_im_weight , gene_im_weight |
|
colorize_by_gene |
|
cluster_color_min_fraction |
|
byCols |
|
edge_bundling |
|
apply_direction |
|
rotate_heatmap |
|
row_anno_padding , column_anno_padding |
|
do_plot |
|
verbose |
|
... |
additional arguments are passed to downstream functions. Some useful examples:
|
This function is intended to create multiple summary plots using the output data from multiEnrichMap(). By default it creates all plots one by one, sufficient for including in a multi-page PDF document with cairo_pdf(..., onefile=TRUE) or pdf(..., onefile=TRUE).
The data for each plot object can be created and visualized later with argument do_plot=FALSE.
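The multi-page PDF workflow described above might be wrapped as in the following sketch. The helper name `save_folio_pdf` is hypothetical and not part of the package, and `mem` is assumed to be the output of multiEnrichMap(). Only functions named in this page are called.

```r
# Hypothetical helper (not part of multienrichjam): draw every folio plot
# into one multi-page PDF and return the plot objects for later use.
save_folio_pdf <- function(mem, file, ...) {
   if (!requireNamespace("multienrichjam", quietly = TRUE)) {
      stop("The multienrichjam package is required.")
   }
   # onefile=TRUE places each plot on its own page of the same file
   grDevices::pdf(file, onefile = TRUE)
   on.exit(grDevices::dev.off(), add = TRUE)
   folio <- multienrichjam::mem_plot_folio(mem, ...)
   invisible(folio)
}

# Usage sketch, assuming `mem` already exists in the session:
# folio <- save_folio_pdf(mem, "folio.pdf")
#
# Alternatively, create the plot data without drawing, and plot later:
# folio <- multienrichjam::mem_plot_folio(mem, do_which = c(2, 4), do_plot = FALSE)
# multienrichjam::jam_igraph(folio$cnet_collapsed_set)
```

Registering the device shutdown with on.exit() means the PDF is closed even if one of the plots fails partway through.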
Note: Since version 0.0.76.900
the first step in the workflow is
to cluster the underlying gene-pathway incidence matrix.
This step defines a consistent dendrogram driven by underlying
gene content in each pathway.
The dendrogram is used by each subsequent plot
including the enrichment heatmap.
There are two recommended strategies for visualizing multienrichment results:
1. Pathway clusters viewed as a concept network (Cnet) plot.
Given numerous statistically enriched pathways, this process defines pathway clusters using the underlying gene-pathway incidence matrix.
Within each pathway cluster, the pathways typically share a high proportion of the same genes, and therefore are expected to represent very similar functions. Ideally, each cluster represents some distinct biological function, or a functional theme.
Benefit: Reducing a large number of pathways to a small number of clusters greatly improves the options for visualization, while retaining a comprehensive view of all genes and pathways involved.
Benefit: This option is recommended when there are numerous pathways, and when including more pathways is beneficial to understanding the overall functional effects of the experimental study.
Limitation: The downside with this approach is that sometimes this comprehensive content can be too much detail to interpret in one figure, overshadowing individual pathways in each cluster.
Limitation: It may be difficult to recognize a functional theme for each pathway cluster. That process is not (yet) automated and requires some domain expertise in the pathways and functions involved.
Limitation: It may not be possible for one Cnet plot to represent all functional effects of an experimental study.
2. Exemplar pathways viewed as a Cnet plot.
As described above, given numerous statistically enriched pathways, pathways are clustered using the gene-pathway incidence matrix. One "exemplar" pathway is selected from each cluster to represent the typical pathway content in each cluster, usually the most significant pathway in the cluster, but optionally the pathway containing the most total genes.
Benefit: This process can produce a cleaner figure than Option 1 (pathway clusters), because fewer pathways and their associated genes are included in the figure.
Limitation: This cleaner figure is understandably somewhat less comprehensive, and may be subject to bias when selecting exemplar pathways. However the selection of relevant pathways may be very effective within the context of the experimental study.
Benefit: The resulting Cnet plot can often improve focus on specific genes and pathways, which can be advantageous when including numerous "synonyms" for the same or similar pathways is not beneficial.
Benefit: This strategy also works particularly well when there are
relatively few enriched pathways, or when argument topEnrichN
used
with multiEnrichMap()
was relatively small.
The folio of plots includes:
Enrichment Heatmap (Plot #1), enrichment P-values created using
mem_enrichment_heatmap()
. Note that by default, the Gene-Pathway
incidence matrix is also created (invisibly) in order to define
consistent pathway clusters.
Output list name: "enrichment_hm"
Gene-Pathway Incidence Matrix Heatmap (Plot #2) is created
using mem_gene_path_heatmap()
.
This step defines and visualizes the pathway clustering used by all
plots in the folio.
Output list name: "gp_hm"
Cnet Cluster Plot (Plots #3,#4,#5) creates a collapsed
Concept network (Cnet) of Genes with Pathway clusters,
using collapse_mem_clusters()
, then plotted with jam_igraph()
.
Plot #3 labels the pathway clusters with the first N pathways.
Output list name: "cnet_collapsed"
Plot #4 labels the pathway clusters with LETTERS.
This file is typically used for other plots.
Output list name: "cnet_collapsed_set"
Plot #5 hides all gene labels.
Output list name: "cnet_collapsed_set2"
Cnet Exemplar Plots (Plots #6,#7,#8) create smaller pathway
Cnet plots, as opposed to pathway-cluster Cnets in #3,#4,#5 above,
using exemplar pathways from each gene-pathway cluster.
Output list name: "cnet_exemplars"
with a list
of igraph
objects:
Plot #6 includes one exemplar pathway per pathway cluster.
Plot #7 includes two exemplar pathways per pathway cluster.
Plot #8 includes three exemplar pathways per pathway cluster.
Cnet Individual Cluster Plots (Plots #9,#10,#11,etc.) create one
pathway Cnet plot per individual pathway cluster, showing only
the pathways in that cluster. The number of plots is defined by
the number of pathway clusters, usually pathway_column_split.
These plots may be useful to explore pathways in detail within each
pathway cluster, for example when there are many pathways which are
not well-defined for a particular pathway cluster in the Gene-Pathway
heatmap.
Output list name: "cnet_clusters"
The specific plots to be created are controlled with do_which:
do_which=1 will create the enrichment heatmap.
do_which=2 will create the gene-pathway heatmap.
do_which=3 will create the Cnet Cluster Plot using pathway cluster labels for each pathway node, by default it uses LETTERS: "A", "B", "C", "D", etc.
do_which=4 will create the Cnet Cluster Plot using abbreviated pathway labels for each pathway cluster node.
do_which=5 will create the Cnet Cluster Plot with no node labels.
do_which=6 begins the series of Cnet Exemplar Plots for each value in argument exemplar_range, whose default is c(1, 2, 3).
do_which=9 (by default) begins the series of Cnet individual cluster plots, which includes all pathways from each cluster.
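The plot numbering can be worked out ahead of time, as in the sketch below. It assumes the individual cluster plots start immediately after the exemplar series, which is inferred from the default of 9 stated above and not taken from the package source. The function `folio_plot_index` and the suffixed names such as `cnet_exemplars_1` are hypothetical labels used only for illustration. The first five names follow the output list names given in the folio description.

```r
# Hypothetical helper: map do_which numbers to plot labels, assuming the
# cluster plots begin right after the exemplar plots.
folio_plot_index <- function(exemplar_range = c(1, 2, 3), n_clusters = 4) {
   fixed <- c("enrichment_hm", "gp_hm",
      "cnet_collapsed", "cnet_collapsed_set", "cnet_collapsed_set2")
   # one exemplar plot per value in exemplar_range
   exemplars <- paste0("cnet_exemplars_", exemplar_range)
   # one plot per pathway cluster, labeled with LETTERS by default
   clusters <- paste0("cnet_clusters_", LETTERS[seq_len(n_clusters)])
   plot_names <- c(fixed, exemplars, clusters)
   stats::setNames(seq_along(plot_names), plot_names)
}

# Default settings: exemplar plots occupy 6, 7, 8 and clusters start at 9.
default_index <- folio_plot_index()

# With a single exemplar value, the cluster plots would start at 7.
short_index <- folio_plot_index(exemplar_range = 1)
```

Under this assumption, changing the length of exemplar_range shifts the do_which values of the individual cluster plots, so it is worth checking the numbering before selecting them.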
The most frequently used plots are do_which=2 for the gene-pathway heatmap, and do_which=4 for the collapsed Cnet plot, where Cnet clusters are based upon the gene-pathway heatmap.
Arguments p_cutoff and min_set_ct_each can be used to apply more stringent thresholds than the original mem data.
For example, applying p_cutoff=0.05 during multiEnrichMap() will colorize pathways in mem$enrichIMcolors accordingly. However, calling mem_plot_folio() with p_cutoff=0.001 will use a blank color in the color gradient for pathways that do not have a mem$enrichIM value at or below 0.001.
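The effect of a stricter p_cutoff can be illustrated with a toy matrix in base R. This is a sketch of the thresholding idea only, not the package's internal code. The function `gradient_value` is hypothetical, and representing a blank cell as NA is an assumption made for the example.

```r
# Toy enrichment P-value matrix: rows are pathways, columns are enrichment tests.
enrichIM <- matrix(
   c(0.03, 0.0005, 0.2,
     0.0008, 0.04, 0.00001),
   nrow = 3,
   dimnames = list(paste0("pathway", 1:3), c("testA", "testB")))

# Hypothetical helper: value used for the color gradient.
# P-values above p_cutoff become NA (blank); P-values below p_floor are
# capped so the gradient does not extend past -log10(p_floor).
gradient_value <- function(p, p_cutoff = 0.05, p_floor = 1e-10) {
   v <- -log10(pmax(p, p_floor))
   v[p > p_cutoff] <- NA
   v
}

# Cells that keep a color at the original and at the stricter threshold
colored_05 <- !is.na(gradient_value(enrichIM, p_cutoff = 0.05))
colored_001 <- !is.na(gradient_value(enrichIM, p_cutoff = 0.001))
```

In this toy data five cells pass at 0.05 and three pass at 0.001, so the stricter call blanks two cells without changing the underlying matrix.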
Our experience is that the pathway clustering does not need to be perfect to be useful and valid. The pathway clusters are valid based upon the parameters used for clustering, and provide insight into the genes that help define each cluster distinct from other clusters. Sometimes the clustering results are more or less effective based upon the type of pattern observed in the data, so it can be helpful to adjust parameters to drill down to the most effective patterns.
A list is returned via invisible(), which contains each plot object enabled by the argument do_which:
enrichment_hm
is a Heatmap object from ComplexHeatmap
that contains the enrichment P-value heatmap. Note that this
data is not used directly in subsequent plots, the pathway
clusters shown here are based upon -log10(Pvalue)
and not
the underlying gene content of each pathway. This plot is
a useful overview that answers the question "How many
pathways are significantly enriched across the different
enrichment tests?"
gp_hm
is a Heatmap object from ComplexHeatmap
with
the gene-pathway incidence matrix heatmap. This heatmap and
the column/pathway clusters are the subject of subsequent
Cnet plots.
gp_hm_caption
is a text caption that describes the gene
and set filter criteria, and the row and column distance methods
used for clustering. Because the filtering and clustering
options have substantial impact on clustering, and the
pathway clusters are the key for all subsequent plots,
these values are important to keep associated with the
output of this function.
clusters_mem
is a list
with the pathways contained
in each pathway cluster shown by the gene-pathway heatmap,
obtained by heatmap_column_order(gp_hm)
. The pathway names
should also be present in colnames(mem$memIM)
and
rownames(mem$enrichIM)
, for follow-up inspection.
cnet_collapsed
is an igraph
object with Cnet plot data,
where the pathways have been collapsed by cluster, using the
gene-pathway heatmap clusters defined in clusters_mem
. Each
pathway cluster is labeled by cluster name, and the first few
pathway names.
This data can be plotted using jam_igraph(cnet_collapsed)
.
cnet_collapsed_set
is the same as cnet_collapsed
except the
pathways are labeled by the cluster name only, for example
c("A", "B", "C", "D")
.
This data can be plotted using jam_igraph(cnet_collapsed_set)
.
cnet_collapsed_set2
is the same as cnet_collapsed_set
except the
gene labels are hidden, useful when there are too many genes to label
clearly. The gene symbols are still stored in V(g)$name
but the labels
in V(g)$label
are updated to hide the genes.
This data can be plotted using jam_igraph(cnet_collapsed_set2)
.
cnet_exemplars
is a list
of igraph
Cnet objects, each
one contains only the number of exemplar pathways from each cluster
defined by argument exemplar_range
. By default it uses 1
exemplar
per cluster, then 2
exemplars per cluster, then 3
exemplars
per cluster. A number of published figures use 1
exemplar per
pathway cluster.
This data can be plotted using jam_igraph(cnet_exemplars[[1]])
,
which will plot only the first igraph
object from the list.
cnet_clusters
is a list
of igraph
Cnet objects, each one
contains all the pathways in one pathway cluster.
This data can be plotted using jam_igraph(cnet_clusters[[1]])
,
or by calling a specific cluster jam_igraph(cnet_clusters[["A"]])
.
The clustering is performed by combining the gene-pathway incidence
matrix mem$memIM
with the -log10(mem$enrichIM)
enrichment P-values.
The relative weight of each matrix is controlled by
enrich_im_weight
, where enrich_im_weight=0
assigns weight=0
to the enrichment P-values, and thus clusters only using the
gene-pathway matrix. Similarly, enrich_im_weight=1
will assign
full weight to the enrichment P-value matrix, and will ignore
the gene-pathway matrix data.
The corresponding weight for gene (rows) is controlled by
gene_im_weight
, which balances row clustering with the
mem$geneIM
matrix, and the gene-pathway matrix mem$memIM
.
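The weighting idea can be sketched in base R as follows. How the package scales and combines the two matrices internally is not stated here, so the rescaling to a 0 to 1 range and the use of (1 - enrich_im_weight) for the gene-pathway matrix are assumptions. The function `cluster_pathways` is hypothetical.

```r
# Toy gene-pathway incidence matrix: rows are genes, columns are pathways.
memIM <- matrix(0, nrow = 6, ncol = 4,
   dimnames = list(paste0("gene", 1:6), paste0("pathway", 1:4)))
memIM[1:3, 1:2] <- 1
memIM[4:6, 3:4] <- 1

# Toy enrichment P-values: rows are pathways, columns are enrichment tests.
enrichIM <- matrix(
   c(1e-4, 1, 1e-5, 1,
     1, 1e-3, 1, 1e-6),
   ncol = 2,
   dimnames = list(paste0("pathway", 1:4), c("testA", "testB")))

# Hypothetical helper: cluster pathways on a weighted combination of
# gene content and -log10 enrichment P-values (assumed weighting scheme).
cluster_pathways <- function(memIM, enrichIM, enrich_im_weight = 0.3, k = 2) {
   # pathways as rows, weighted by the remaining fraction
   gene_part <- t(memIM) * (1 - enrich_im_weight)
   # -log10 P-values for the same pathways, rescaled to the range 0..1
   p_part <- -log10(enrichIM[colnames(memIM), , drop = FALSE])
   if (max(p_part) > 0) {
      p_part <- p_part / max(p_part)
   }
   combined <- cbind(gene_part, p_part * enrich_im_weight)
   hc <- stats::hclust(stats::dist(combined, method = "euclidean"))
   stats::cutree(hc, k = k)
}

# weight 0 clusters by shared genes only; weight 1 by enrichment pattern only
by_genes <- cluster_pathways(memIM, enrichIM, enrich_im_weight = 0)
by_pvalues <- cluster_pathways(memIM, enrichIM, enrich_im_weight = 1)
```

In the toy data, pathways 1 and 2 share genes while pathways 1 and 3 share an enrichment pattern, so the two extreme weights produce different groupings.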
The argument column_method
defines the distance method,
for example "euclidean"
and "binary"
are two immediate choices.
The method also adds "correlation"
from amap::hcluster()
which
can be very useful especially with large datasets.
The number of pathway clusters is controlled by
pathway_column_split
, by default when pathway_column_split=NULL
and auto_cluster=TRUE
the number of clusters is defined based
upon the total number of pathways. In practice, pathway_column_split=4
or pathway_column_split=3
is recommended, as this number of
clusters is most convenient to visualize as a Cnet plot.
To define your own pathway cluster labels, define pathway_column_title
as a vector with length equal to pathway_column_split
. These labels
become network node labels in subsequent plots, and in the
resulting igraph
object.
The pathway clusters are dependent upon the genes and pathways
used during clustering, which are also controlled by
min_set_ct
and min_gene_ct
.
min_set_ct
filters the matrix by the number of times a Set is
represented in the matrix,
which can be helpful when some pathways contain a large number of
genes while other pathways contain very few.
min_gene_ct
filters the matrix by the number of times a gene is
represented in the matrix. It can be helpful for requiring a gene
be represented in more than one enriched pathway.
min_set_ct_each
filters the matrix to require each Set to
contain at least this many entries from one enrichment result,
rather than using the combined incidence matrix. It is mostly
helpful to increase the value used in multiEnrichMap()
argument
min_count
, which already filters pathways for minimum number
of genes involved.
Note: These filters are only recommended when the gene-pathway matrix is very large, perhaps 100 pathways, or 500 genes.
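The two count filters can be illustrated on a toy incidence matrix. The sketch below applies a single pass, genes first and then sets, which is an assumption about ordering and not taken from the package source. The function `filter_mem_im` is hypothetical.

```r
# Toy gene-pathway incidence matrix: rows are genes, columns are pathways.
memIM <- matrix(
   c(1, 1, 0,
     1, 1, 0,
     1, 0, 0,
     0, 0, 1,
     0, 1, 0),
   nrow = 5, byrow = TRUE,
   dimnames = list(paste0("gene", 1:5), paste0("pathway", 1:3)))

# Hypothetical helper: keep genes present in at least min_gene_ct pathways,
# then keep pathways that still contain at least min_set_ct genes.
filter_mem_im <- function(memIM, min_gene_ct = 1, min_set_ct = 1) {
   keep_genes <- rowSums(memIM != 0) >= min_gene_ct
   memIM <- memIM[keep_genes, , drop = FALSE]
   keep_sets <- colSums(memIM != 0) >= min_set_ct
   memIM[, keep_sets, drop = FALSE]
}

# Require each gene in two or more pathways, and each pathway to keep two genes.
filtered <- filter_mem_im(memIM, min_gene_ct = 2, min_set_ct = 2)
```

Here only gene1 and gene2 appear in two pathways, and pathway3 is dropped because none of its genes survive the gene filter.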
The resulting Cnet pathway clusters are single nodes in the
network, and these nodes are colorized based upon the enrichment
tests involved. The threshold for including the color for
each enrichment test is defined by cluster_color_min_fraction
,
which requires that at least this fraction of pathways in a
pathway cluster meet the significance criteria for that
enrichment test.
To adjust the coloration filter to include any enrichment
test with at least one significant result, use
cluster_color_min_fraction=0.01
.
In the gene-pathway heatmap,
these colors are shown across the top of the heatmap.
The default cluster_color_min_fraction=0.4 requires at least 40%
of the pathways in a cluster to meet the significance criteria for
an enrichment test before that test's color is included.
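The fraction rule can be checked by hand with a short base R sketch. It assumes "significant" means a P-value at or below p_cutoff. The function `cluster_test_colors` and the toy cluster assignments are hypothetical.

```r
# Toy enrichment P-values: rows are pathways, columns are enrichment tests.
enrichIM <- matrix(
   c(0.001, 0.01, 0.02, 0.5, 1, 0.2, 0.9,
     1, 1, 0.03, 0.2, 0.9, 0.001, 0.04),
   ncol = 2,
   dimnames = list(paste0("pathway", 1:7), c("testA", "testB")))

# Hypothetical pathway clusters, as a named list of pathway names.
clusters <- list(
   A = paste0("pathway", 1:5),
   B = paste0("pathway", 6:7))

# Hypothetical helper: for each cluster, flag the enrichment tests whose
# fraction of significant pathways reaches cluster_color_min_fraction.
cluster_test_colors <- function(enrichIM, clusters, p_cutoff = 0.05,
   cluster_color_min_fraction = 0.4) {
   vapply(clusters, function(pathways) {
      is_sig <- enrichIM[pathways, , drop = FALSE] <= p_cutoff
      colMeans(is_sig) >= cluster_color_min_fraction
   }, FUN.VALUE = logical(ncol(enrichIM)))
}

# Rows are enrichment tests, columns are clusters.
default_colors <- cluster_test_colors(enrichIM, clusters)
lenient_colors <- cluster_test_colors(enrichIM, clusters,
   cluster_color_min_fraction = 0.01)
```

In cluster A, testB is significant for one of five pathways (20%), so it is excluded at the default of 0.4 and included at 0.01. A test with no significant pathway in a cluster stays excluded at either setting.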
Note: Prior to version 0.0.76.900
the enrichment heatmap was clustered only using enrichment
P-values, transformed with -log10(Pvalue)
. The clustering was
inconsistent with other plots in the folio, and was not effective
at clustering pathways based upon similar content, which is the
primary goal of the multienrichjam
R package.
Other jam plot functions: adjust_polygon_border(), grid_with_title(), jam_igraph(), mem_enrichment_heatmap(), mem_gene_path_heatmap(), mem_legend(), mem_multienrichplot(), plot_layout_scale()