geneLists is an R package containing a collection of gene lists for simple, reproducible Gene Set Enrichment Analysis (GSEA).
A major focus of this repository is the collection of synaptic proteome genes as well as genes that are implicated in human brain disorders.
~~Gene lists are stored in the Broad Institute GMT format file.
These can be downloaded directly, or accessed in R with the data()
command.~~
For example, load the SFARI autism candidate gene dataset with `data(sfariGene)`.
Gene lists are scraped from the literature or online databases. Gene
identifiers are mapped to stable, unique Entrez IDs. Often, it is
necessary to map human genes to their homologous mouse genes. This is done using
the HomoloGene database and the `getHomologs` function.
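The idea behind homolog mapping can be sketched in base R. This is a minimal illustration, not the implementation of `getHomologs`: the table layout (a homology group ID, a taxonomy ID, and an Entrez ID per row) is an assumption modeled on HomoloGene, the helper `map_homologs` is hypothetical, and every Entrez ID below is made up.

```r
# Toy HomoloGene-style table: one row per gene, grouped by homology ID (HID).
# All Entrez IDs here are invented for illustration only.
homologene <- data.frame(
  HID    = c(1, 1, 2, 2, 3),
  taxid  = c(9606, 10090, 9606, 10090, 9606),  # 9606 = human, 10090 = mouse
  entrez = c("1001", "2001", "1002", "2002", "1003"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the package's getHomologs): map IDs from one
# species to another through their shared homology group.
map_homologs <- function(ids, table, from_taxid = 9606, to_taxid = 10090) {
  from <- table[table$taxid == from_taxid, ]
  to   <- table[table$taxid == to_taxid, ]
  # Find each input gene's homology group, then the target gene in that group.
  hid <- from$HID[match(ids, from$entrez)]
  out <- to$entrez[match(hid, to$HID)]
  names(out) <- ids
  out
}

mouse_ids <- map_homologs(c("1001", "1003", "9999"), homologene)
print(mouse_ids)
```

Genes with no homolog in the target species, or that are absent from the table, come back as `NA`. Because `match()` returns only the first hit, this sketch keeps one target gene per homology group; one-to-many relationships would need separate handling.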
Ensure you have installed `AnnotationDbi` before installing `geneLists`.
For example, in R, install `AnnotationDbi` from Bioconductor using the `BiocManager` package:
```r
install.packages("BiocManager")
BiocManager::install("AnnotationDbi")
```
To install the `geneLists` package in R, use the `devtools` package:
```r
# Install from GitHub.
devtools::install_github("twesleyb/geneLists")
```
The gene mapping function `getIDs` uses organism-specific mapping data. Ensure
you have installed the required packages. For example, for mouse data you should have
installed `org.Mm.eg.db` with BiocManager:
```r
BiocManager::install("org.Mm.eg.db")
```

```r
library(geneLists)

# See all available datasets.
geneLists()

# Load a dataset.
data(iPSD)

# Convert between identifiers.
gphn_proteome <- iPSD[["Gphn"]]
uniprot <- getIDs(gphn_proteome, from="entrez", to="uniprot", species="mouse")

# Map genes using a given gene map.
data(uniprot_map)
mapIDs(uniprot, from="Accession", to="Entrez", gene_map=uniprot_map)

# NOTE: be careful not to confuse getIDs (uses org.##.eg.db) and
# mapIDs (you must provide a gene_map; the arguments from and to specify columns
# in the gene_map).

# To see all scripts in inst/analysis/2_build-lists:
list.files(system.file("analysis/2_build-lists", package="geneLists"))
```
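Once a gene list is loaded, a simple enrichment test can be run with base R alone. The sketch below uses a hypergeometric over-representation test, which is a simpler technique than the rank-based GSEA algorithm, and it does not call any `geneLists` function. The background, gene set, and hits are all invented toy data; in practice the gene set might be one element of a loaded dataset.

```r
# Toy inputs: every ID here is invented for illustration only.
background <- as.character(1:1000)             # all genes that could be detected
gene_set   <- as.character(1:50)               # one gene list of interest
hits       <- as.character(c(1:10, 501:520))   # genes identified in an experiment

# Restrict both sets to the background so the counts are comparable.
gene_set <- intersect(gene_set, background)
hits     <- intersect(hits, background)

overlap <- length(intersect(hits, gene_set))

# P(X >= overlap) when drawing length(hits) genes from the background,
# of which length(gene_set) belong to the gene set.
p_value <- phyper(
  overlap - 1,
  m = length(gene_set),
  n = length(background) - length(gene_set),
  k = length(hits),
  lower.tail = FALSE
)

# Observed fraction of hits in the set relative to the expected fraction.
fold_enrichment <- (overlap / length(hits)) /
  (length(gene_set) / length(background))

print(c(overlap = overlap, p_value = p_value, fold_enrichment = fold_enrichment))
```

The choice of background matters: using all genes in the genome rather than the genes actually detectable in the experiment tends to inflate enrichment. When testing many gene lists, the resulting p-values should be adjusted, for example with `p.adjust()`.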
For additional details about each dataset, see the `README` in the `datasets/` directory.
Feel free to suggest a useful dataset or database, report an error, or suggest an improvement by submitting an issue or pull request.
The scripts in `inst/analysis` record how each gene list was created. They
can be used as examples of how to download a dataset and save it as a
GMT-formatted file and a `gene_list` R object. See the `tutorials/064_E3-Ligases.R`
script for a recent example of how to create a gene list.
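As a rough illustration of the GMT format itself, the sketch below writes a named list of gene identifiers to a GMT file and reads it back using only base R. It assumes that a gene list can be represented as a named list of character vectors. The helpers `write_gmt` and `read_gmt` are hypothetical and are not functions exported by this package, and the IDs are made up. In a GMT file, each line holds one gene set: the set name, a description, and then the genes, all separated by tabs.

```r
# Toy gene list: a named list of character vectors (IDs are invented).
gene_list <- list(
  setA = c("1001", "1002", "1003"),
  setB = c("2001", "2002")
)

# Hypothetical helper: write one tab-separated line per gene set.
write_gmt <- function(gene_list, file, description = "NA") {
  lines <- vapply(names(gene_list), function(nm) {
    paste(c(nm, description, gene_list[[nm]]), collapse = "\t")
  }, character(1))
  writeLines(lines, file)
}

# Hypothetical helper: read a GMT file back into a named list.
read_gmt <- function(file) {
  fields <- strsplit(readLines(file), "\t", fixed = TRUE)
  # Drop the name and description fields; the remainder are the genes.
  sets <- lapply(fields, function(x) x[-(1:2)])
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

gmt_file <- tempfile(fileext = ".gmt")
write_gmt(gene_list, gmt_file)
round_trip <- read_gmt(gmt_file)
print(round_trip)
```

The description field is filled with a placeholder here; a real script might store the source publication or database URL there instead.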
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.