knitr::opts_chunk$set(echo=TRUE);
This document is intended to track the workflow steps involved in a typical MultiEnrichMap workflow. It also includes some details about custom column headers and how and where in the workflow they are handled.
The starting data should include results from gene set enrichment
analysis, in the form of a data.frame, an enrichResult
object,
or another class from which a data.frame can be coerced.
Each enrichment result is provided in its own data.frame or similar class.
enrichDF2enrichResult()
converts a single data.frame to enrichResult
.
keyColname
- unique identifier for each rowgeneColname
- column with delimited list of genes, e.g. GeneA,GeneB,GeneCpathGenes
- column reporting the number of pathway genes tested, e.g.
the number of pathway genes present in the gene universe.geneHits
- column with the number of pathway genes present in the
test genes.pvalueColname
- column with the enrichment P-value to use in downstream
analysis steps. E.g. if there is a P-value and FDR column, choose the one
column to use in subsequent steps.geneDelim
- by default "," is the delimiter between genes, but it can
be defined here.geneRatioColname
- if the geneHits
and pathGenes
columns are not
available, sometimes there is a column reporting "geneHits/pathGenes"
referred to as a "geneRatio" or "BgRatio".msigdbGmtT
- optional arules format for gene set data, when provided
the import will choose these genes to represent each pathway. Otherwise,
pathway genes are only inferred using the reported gene hits.pAdjustMethod
- optionally apply a P-value adjustment, which may be
helpful when subsetting pathways by some pre-defined, independent criteria,
e.g. by pathway source.This function returns enrichResult
object, with close approximation
to the data produced in the clusterProfiler
package, but not
identical. For example, the pathway genes will not be fully correct
without supplying the GmtT data upfront, since the GmtT data will
contain the full set of pathway genes for all pathways.
To run multiEnrichMap()
one must supply a list of enrichment results,
either in the form of data.frame or enrichResult
objects.
enrichLabels
are supplied or derived from names(enrichList)
,
these labels are used in subsequent analysis steps.colorV
or
derived using colorjam::rainbowJam()
. These colors are used in
enrichment heatmaps.geneHitList
is supplied or derived, representing a list of vectors,
where each vector is the set of gene hits tested for enrichment
corresponding to enrichList
. Note that not all enrichment results
will represent every gene hit, so deriving this data will be imperfect.topEnrichBySource()
which subsets each enrichment
based upon a set of sources, and takes the top N number of pathways
from each source, from each enrichment. Once the pathways are defined,
then all results for these pathways are retained for downstream analysis.geneIM
incidence matrix of gene hits by enrichment result is derived.
This data allows comparison of gene hits across enrichment tests.geneIMcolors
is a color matrix used to create a heatmap showing the
geneIM
data.enrichIM
incidence matrix of pathway enrichment P-values by enrichment
test, to allow comparison of pathway enrichment results.enrichIMcolors
is a color matrix used to create a heatmap showing the
enrichIM
data. These colors are based upon the enrichment P-value.
Note enrichList2IM()
is called during this step.
valueColname
uses pvalueColname
so the matrix will be filled with
the actual P-value.keyColname
uses nameColname
so the matrix rownames will use pathway
names and not the pathway ID.Optionally GmtT
is used here.
enrichIMgeneCount
is similar to enrichIM
except that the gene count
is reported in the matrix instead of P-value.
valueColname
uses geneCountColname
enrichIM
is filtered to require at least one enrichment P-value
below cutoffRowMinP
in each row.
i1use
(internal) is defined as the set of pathways meeting the criteria
enrichList
data is subsetted to include only pathways present in
enrichIM
.enrichIM
, enrichIMM
, enrichIMcolors
are subsetted to include
only these pathways.
enrichList
is combined into one data.frame using enrichList2df()
keyColname
, geneColname
, geneCountColname
, pvalueColname
are
used at this step, to define the core components required
GmtT
is optionaldescriptionColname
is optionally used, sent to memAdjustLabel()
to
clean up some annotations
enrichDF2enrichResult()
is called on the merged enrichment data.frame
geneHits
, pathGenes
, keyColname
, geneColname
, geneCountColname
,
geneDelim
, pvalueColname
are passed to this function
GmtT
is optionally used at this step
memIM
is an incidence matrix of genes by pathways, using the merged
enrichment results after filtering.
enrichMapJam()
is called using the merged enrichResults data.igraph2pieGraph()
is called to convert an igraph enrichMap object to
use pie node shapes.rectifyPiegraph()
is called to enable coloredrectangle node shapes.cnetplotJam()
is called to create a full cnet igraph object
geneCountColname
is used to size pathway nodes by the number of genes
nodeLabel
uses the first matching entry from nameColname
,
descriptionColname
, keyColname
, or "ID"
."nodeType"
is defined in the igraph object, to identify nodes as
"Set"
or "Gene"
.igraph2pieGraph()
is called two times, to enable pie node shapes.
enrichIMcolors
is used to colorize pathway nodes
geneIMcolors
is used to colorize gene nodes
rectifyPiegraph()
is called to enable coloredrectangle node shapes.
Output is returned as a list with the following items:
colorV
the named color vector used to apply colors to each enrichment
result.geneHitList
the gene hits tested in each enrichment, either supplied
or derived.geneIM
incidence matrix of gene hits per enrichment test.geneIMcolors
the color-coded matrix derived from geneIM
.enrichIMgeneCount
incidence matrix of pathways, scored using the
gene count per pathway.enrichList
list of enrichment results after filteringenrichIM
incidence matrix of pathways, scored by enrichment P-valuesenrichIMcolors
the color-coded matrix derived from enrichIM
.multiEnrichDF
the data.frame of combined enrichment results, after
filtering of pathways for at least one significant P-value.multiEnrichER
same as multiEnrichDF
except an object of class
enrichResult
.memIM
is an incidence matrix of genes by pathways, using the merged
enrichment results after filtering.multiEnrichMap
the enrichMap results in the form of igraph objectmultiEnrichMap2
the enrichMap object encoded to use either pie or
coloredrectangle
node shape.multiCnetPlot
the cnet plot igraph objectmultiCnetPlot1
is the cnet plot with pathway pie node shapesmultiCnetPlot1b
is the cnet plot with pathway and gene pie node shapesmultiCnetPlot2
is the cnet plot using coloredrectangle node shapescolnames
vector of actual colnames used, including geneColname
,
keyColname
, nameColname
, descriptionColname
, pvalueColname
.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.