knitr::opts_chunk$set(comment="", cache=FALSE, fig.align="center")
devtools::load_all(".")
library(tidyverse)
library(qusage)
library(igraph)
Genesets are simply a named list of character vectors, which can be passed directly to hypeR(). Alternatively, one can pass a gsets object, which retains the name and version of the genesets used. This versioning is included when exporting results or generating reports, which helps keep your results reproducible.
genesets <- list("GSET1" = c("GENE1", "GENE2", "GENE3"), "GSET2" = c("GENE4", "GENE6"), "GSET3" = c("GENE7", "GENE8", "GENE9"))
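The plain-list format is easy to get subtly wrong (an unnamed list, or a factor instead of a character vector), so a quick structural check can be useful before running an analysis. The sketch below uses only base R; is_valid_genesets() is a hypothetical helper written for illustration and is not part of hypeR.

```r
# Hypothetical helper (not a hypeR function): checks that an object is a
# named list whose elements are all character vectors.
is_valid_genesets <- function(x) {
    is.list(x) &&
        !is.null(names(x)) &&
        all(nzchar(names(x))) &&
        all(vapply(x, is.character, logical(1)))
}

example <- list("GSET1" = c("GENE1", "GENE2", "GENE3"),
                "GSET2" = c("GENE4", "GENE6"))

ok  <- is_valid_genesets(example)                     # named list of character vectors
bad <- is_valid_genesets(list(c("GENE1"), c("GENE2"))) # fails: the list has no names
```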
Creating a gsets object is easy...
genesets <- gsets$new(genesets, name="Example Genesets", version="v1.0")
print(genesets)
And it can be passed directly to hypeR()...
hypeR(signature, genesets)
To aid in workflow efficiency, hypeR enables users to download genesets, wrapped as gsets objects, from multiple data sources.
Most researchers will find the genesets hosted by msigdb are adequate to perform geneset enrichment analysis. There are various types of genesets available across multiple species.
msigdb_info()
Here we download the Hallmarks genesets...
HALLMARK <- msigdb_gsets(species="Homo sapiens", category="H")
print(HALLMARK)
We can also clean them up by removing the first leading common substring...
HALLMARK <- msigdb_gsets(species="Homo sapiens", category="H", clean=TRUE)
print(HALLMARK)
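To make the idea of removing a leading common substring concrete, here is a minimal base R sketch. It is an illustration only: strip_common_prefix() is a hypothetical helper, and the cleaning that hypeR applies with clean=TRUE may differ in its details (for example in how it treats underscores or letter case).

```r
# Hypothetical helper (not a hypeR function): removes the longest prefix
# shared by every label.
strip_common_prefix <- function(labels) {
    if (length(labels) < 2) return(labels)
    chars <- strsplit(labels, "")
    n <- min(lengths(chars))
    k <- 0
    # Advance while all labels agree on the next character
    while (k < n &&
           length(unique(vapply(chars, function(ch) ch[k + 1], character(1)))) == 1) {
        k <- k + 1
    }
    substring(labels, k + 1)
}

labels  <- c("HALLMARK_APOPTOSIS", "HALLMARK_HYPOXIA", "HALLMARK_GLYCOLYSIS")
cleaned <- strip_common_prefix(labels)
print(cleaned)
```

Applied to geneset names, the same function could be used as names(x) <- strip_common_prefix(names(x)) on any named list.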
This can be passed directly to hypeR()...
hypeR(signature, genesets=HALLMARK)
Other commonly used genesets include BioCarta, KEGG, and Reactome...
BIOCARTA <- msigdb_gsets(species="Homo sapiens", category="C2", subcategory="CP:BIOCARTA")
KEGG <- msigdb_gsets(species="Homo sapiens", category="C2", subcategory="CP:KEGG")
REACTOME <- msigdb_gsets(species="Homo sapiens", category="C2", subcategory="CP:REACTOME")
If msigdb genesets are not sufficient, we have also provided another set of functions for downloading and loading other publicly available genesets. This is facilitated by interfacing with the publicly available libraries hosted by enrichr.
available <- enrichr_available()
# reactable is not attached in the setup chunk, so call it by namespace
reactable::reactable(available)
ATLAS <- enrichr_gsets("Human_Gene_Atlas")
print(ATLAS)
Note: These libraries do not have a systematic versioning scheme; however, the download date is recorded.
You can also download genesets for other species if you aren't working with human or mouse genes!
yeast <- enrichr_gsets("GO_Biological_Process_2018", db="YeastEnrichr")
worm <- enrichr_gsets("GO_Biological_Process_2018", db="WormEnrichr")
fish <- enrichr_gsets("GO_Biological_Process_2018", db="FishEnrichr")
fly <- enrichr_gsets("GO_Biological_Process_2018", db="FlyEnrichr")
When dealing with hundreds of genesets, it's often useful to understand the relationships between them. This allows researchers to summarize many enriched pathways as more general biological processes. To do this, we rely on curated relationships defined between them. For example, Reactome conveniently defines its genesets in a hierarchy of pathways. This data can be formatted into a relational genesets object called rgsets.
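As a rough illustration of how a hierarchy lets many specific pathways be summarized as broader processes, the base R sketch below maps each pathway to its top-level ancestor using a parent-to-child edge table. The identifiers and the find_root() helper are hypothetical, the hierarchy is assumed to be acyclic, and only the first parent is followed when a node has several. It is not how hypeR summarizes results internally.

```r
# Toy hierarchy: each row is a directed edge from a parent to a child
edges <- data.frame(from = c("P1", "P1", "P2", "P3", "R2"),
                    to   = c("P2", "P3", "L1", "L2", "L3"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: walk up the hierarchy until a node has no parent
find_root <- function(id, edges) {
    parent <- edges$from[edges$to == id]
    while (length(parent) > 0) {
        id <- parent[1]  # assumption: follow the first parent only
        parent <- edges$from[edges$to == id]
    }
    id
}

enriched <- c("L1", "L2", "L3")
roots <- vapply(enriched, find_root, character(1), edges = edges)

# Count how many enriched pathways fall under each top-level process
print(table(roots))
```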
We currently curate some relational genesets for use with hypeR and plan to add more continuously.
hyperdb_info()
Downloading relational genesets is easy...
genesets <- hyperdb_rgsets("REACTOME", "70.0")
And they can be passed directly to hypeR()...
hypeR(signature, genesets)
We try to provide relational genesets for popular databases that include hierarchical information. For users who want to create their own, we provide this example.
Raw data for gsets, nodes, and edges can be directly downloaded.
genesets.url <- "https://reactome.org/download/current/ReactomePathways.gmt.zip"
nodes.url <- "https://reactome.org/download/current/ReactomePathways.txt"
edges.url <- "https://reactome.org/download/current/ReactomePathwaysRelation.txt"
# Genesets
genesets.tmp <- tempfile(fileext=".gmt.zip")
download.file(genesets.url, destfile = genesets.tmp, mode = "wb")
genesets.raw <- genesets.tmp %>%
    unzip() %>%
    read.gmt() %>%
    lapply(function(x) {
        toupper(x[x != "Reactome Pathway"])
    })

# Nodes
nodes.raw <- nodes.url %>%
    read.delim(sep="\t", header=FALSE, fill=TRUE, col.names=c("id", "label", "species"), stringsAsFactors=FALSE)

# Edges
edges.raw <- edges.url %>%
    read.delim(sep="\t", header=FALSE, fill=TRUE, col.names=c("from", "to"), stringsAsFactors=FALSE)
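For readers unfamiliar with the GMT format read above, each line is tab-delimited: the geneset name, a description field, and then the member genes. The base R sketch below parses such lines into the named-list format used throughout. parse_gmt_lines() is a hypothetical helper for illustration and is not a substitute for read.gmt().

```r
# Hypothetical helper: parse GMT-formatted lines into a named list of genes.
# Assumes field 1 is the name, field 2 a description, and the rest are genes.
parse_gmt_lines <- function(lines) {
    fields <- strsplit(lines, "\t", fixed = TRUE)
    genesets <- lapply(fields, function(f) f[-(1:2)])
    names(genesets) <- vapply(fields, function(f) f[1], character(1))
    genesets
}

lines <- c("PATHWAY_A\tdescription A\tGENE1\tGENE2",
           "PATHWAY_B\tdescription B\tGENE3")
parsed <- parse_gmt_lines(lines)
str(parsed)
```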
# Species-specific nodes
nodes <- nodes.raw %>%
    dplyr::filter( label %in% names(genesets.raw) ) %>%
    dplyr::filter( species == "Homo sapiens" ) %>%
    dplyr::filter(! duplicated(id) ) %>%
    magrittr::set_rownames( .$id ) %>%
    { .[, "label", drop=FALSE] }

# Species-specific edges
edges <- edges.raw %>%
    dplyr::filter( from %in% rownames(nodes) ) %>%
    dplyr::filter( to %in% rownames(nodes) )

# Leaf genesets
genesets <- nodes %>%
    rownames() %>%
    .[! . %in% edges$from] %>%
    sapply( function(x) nodes[x, "label"] ) %>%
    genesets.raw[.]
The nodes object is a single-column data frame of labels whose row names are unique identifiers. Leaf node labels should have an associated geneset, while internal nodes do not have to. Only the genesets in the list of genesets will be tested.
head(nodes)
The edges object is a data frame with two columns of identifiers, indicating directed edges between nodes in the hierarchy.
head(edges)
The genesets object is a list of character vectors, named by the geneset labels. Typically, genesets will be at the leaves of the hierarchy, although this is not required.
head(names(genesets))
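To show how the three pieces fit together without downloading anything, here is a small base R sketch on toy data. It mirrors the leaf-selection step above (a leaf is a node that never appears in the from column of the edges). All identifiers, labels, and gene symbols are made up for illustration.

```r
# Toy nodes: row names are identifiers, the single column holds labels
nodes <- data.frame(label = c("Metabolism", "Lipid metabolism", "Glycolysis"),
                    row.names = c("ID-1", "ID-2", "ID-3"),
                    stringsAsFactors = FALSE)

# Toy edges: ID-1 is the parent of ID-2 and ID-3
edges <- data.frame(from = c("ID-1", "ID-1"),
                    to   = c("ID-2", "ID-3"),
                    stringsAsFactors = FALSE)

# Toy genesets, named by label
genesets.raw <- list("Metabolism"       = c("A", "B", "C"),
                     "Lipid metabolism" = c("A"),
                     "Glycolysis"       = c("B", "C"))

# Leaves are the nodes that are never a parent
leaf.ids      <- setdiff(rownames(nodes), edges$from)
leaf.labels   <- nodes[leaf.ids, "label"]
leaf.genesets <- genesets.raw[leaf.labels]

print(names(leaf.genesets))
```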
rgsets Object
genesets <- rgsets$new(genesets, nodes, edges, name="REACTOME", version="v70.0")
print(genesets)