knitr::opts_chunk$set(crop = NULL)
Annotation packages are available from Bioconductor for a range of model species. Users may browse BiocViews "AnnotationData" on the Bioconductor website or search packages programmatically using the command below.
BiocManager::available("^org\\.")
Here, we load the human gene annotations.
library(org.Hs.eg.db)
Go3AnnDbBimap objects (from the AnnotationDbi package) are maps between Entrez Gene identifiers and Gene Ontology (GO) identifiers.
Those objects may be directly converted to Sets objects, as demonstrated below.
library(unisets)
go_sets <- import(org.Hs.egGO)
go_sets
Notice how the "element" information is typed as EntrezIdVector, allowing the type of identifier to affect downstream methods (e.g., pathway analyses).
The EntrezIdVector class directly inherits from the IdVector class, and benefits from all the methods associated with the parent class.
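The inheritance mechanism described above can be sketched with plain S4 classes. This is a minimal illustration only: ToyIdVector, ToyEntrezIdVector, and countIds are hypothetical names invented here, not the class definitions or methods used by unisets.

```r
library(methods)

# Hypothetical toy classes, NOT the definitions used by unisets.
setClass("ToyIdVector", slots = c(ids = "character"))
setClass("ToyEntrezIdVector", contains = "ToyIdVector")

# A method defined once, for the parent class only.
setGeneric("countIds", function(x) standardGeneric("countIds"))
setMethod("countIds", "ToyIdVector", function(x) length(x@ids))

entrez <- new("ToyEntrezIdVector", ids = c("1", "10", "100"))

# S4 dispatch falls back to the parent method for the child class.
n_ids <- countIds(entrez)

# The child object is also recognised as an instance of the parent class.
is_parent <- is(entrez, "ToyIdVector")
```

A method written specifically for the child class would take precedence over the inherited one, which is how the identifier type can alter downstream behaviour.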
It is also useful to note that the conversion of Go3AnnDbBimap Gene Ontology maps to unisets objects automatically fetches metadata for each GO identifier from the GO.db package, if installed.
The metadata is stored in the mcols (metadata columns) slot of the setInfo slot of the returned object.
This metadata can be accessed using the accessor method of the same name.
mcols(setInfo(go_sets))[, c("ONTOLOGY", "TERM")]
We may then visualize the distribution of set sizes on a log10 scale.
library(ggplot2)
library(cowplot)
ggplot(data.frame(setLengths=setLengths(go_sets))) +
    geom_histogram(aes(setLengths), bins=100, color="black", fill="grey") +
    scale_x_log10() +
    labs(y="Sets", x="Genes")
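To make the quantity on the x-axis concrete, the following base R sketch computes set sizes and their log10 transform from a small table of relations. The gene and set identifiers are made up for illustration, and the sketch does not use the unisets API.

```r
# Toy gene-to-set relations; identifiers are made up for illustration.
relations <- data.frame(
  element = c("g1", "g2", "g3", "g1", "g4", "g1"),
  set     = c("GO:A", "GO:A", "GO:A", "GO:B", "GO:B", "GO:C"),
  stringsAsFactors = FALSE
)

# Count the distinct elements in each set.
set_sizes <- tapply(relations$element, relations$set,
                    function(x) length(unique(x)))

# Sizes on a log10 scale, the transform applied to the x-axis of the histogram.
log_sizes <- log10(set_sizes)
```

A set containing a single gene maps to 0 on this scale, so very small sets remain visible next to sets with thousands of genes.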
org.Hs.egGO is an R object that provides mappings between Entrez Gene identifiers and the GO identifiers that they are directly associated with.
This mapping and its reverse mapping do not associate the child terms from the GO ontology with the gene.
Only the directly evidenced terms are represented here.
In contrast, org.Hs.egGO2ALLEGS is an R object that provides mappings between a given GO identifier and all of the Entrez Gene identifiers annotated at that GO term or to one of its child nodes in the GO ontology.
Thus, this mapping is much larger and more inclusive than org.Hs.egGO2EG.
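The difference between direct and inclusive mappings can be illustrated with a toy ontology in base R. This is only a sketch of the idea: the terms, genes, and the with_ancestors helper are invented here, and the code does not reproduce how the org.Hs.eg.db maps are actually built.

```r
# Toy ontology: each term maps to its direct parent(s); identifiers are made up.
parents <- list(
  "GO:leaf" = "GO:mid",
  "GO:mid"  = "GO:root",
  "GO:root" = character(0)
)

# Direct annotations only (analogous in spirit to org.Hs.egGO).
direct <- data.frame(
  gene = c("g1", "g2"),
  term = c("GO:leaf", "GO:mid"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect a term together with all of its ancestors.
with_ancestors <- function(term) {
  out <- term
  queue <- parents[[term]]
  while (length(queue) > 0) {
    out <- union(out, queue)
    queue <- unlist(lapply(queue, function(p) parents[[p]]))
  }
  out
}

# Propagate each direct annotation up the ontology, so that every term also
# lists the genes annotated to its child terms (analogous in spirit to
# org.Hs.egGO2ALLEGS).
propagated <- unique(do.call(rbind, lapply(seq_len(nrow(direct)), function(i) {
  data.frame(gene = direct$gene[i],
             term = with_ancestors(direct$term[i]),
             stringsAsFactors = FALSE)
})))
```

In this toy case, two direct relations expand to five once ancestors are included, which mirrors why the inclusive map is much larger than the direct one.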
Below, we use the length method to show the number of relations between genes and GO terms imported from the org.Hs.egGO2ALLEGS map.
go_sets <- import(org.Hs.egGO2ALLEGS)
format(length(go_sets), big.mark=",")
We can also examine the count of relations associated with each evidence code in each Gene Ontology namespace.
ggplot(as.data.frame(go_sets)) +
    geom_bar(aes(evidence)) +
    facet_wrap(~ontology, ncol = 1) +
    coord_flip() +
    # scale_y_continuous(labels = function(x){ format(as.integer(x), big.mark = ",") }) +
    scale_y_continuous(labels = scales::comma) +
    theme(axis.text.y = element_text(size=rel(0.7)))
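The counts drawn by the bar chart can also be tabulated directly. The base R sketch below works on a small invented table whose evidence and ontology columns are named after the aesthetics used in the plot. The evidence codes and namespace abbreviations follow GO conventions, but the rows themselves are illustrative only.

```r
# Toy relations; the rows are invented for illustration.
relations <- data.frame(
  evidence = c("IEA", "IEA", "IDA", "TAS", "IEA", "IDA"),
  ontology = c("BP", "BP", "BP", "MF", "MF", "CC"),
  stringsAsFactors = FALSE
)

# Cross-tabulate relations by evidence code and namespace.
counts <- table(evidence = relations$evidence, ontology = relations$ontology)

# Long format, one row per (evidence, ontology) pair, including zero counts.
counts_long <- as.data.frame(counts, responseName = "n")
```

The long format is convenient when the counts need to be sorted, filtered, or exported rather than plotted.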
sessionInfo()