README.md

geneLists

geneLists is an R package containing a collection of gene lists for simple, reproducible Gene Set Enrichment Analysis (GSEA).

Description

A major focus of this repository is the collection of synaptic proteome genes as well as genes that are implicated in human brain disorders.

~~Gene lists are stored in the Broad Institute GMT format file. These can be downloaded directly, or accessed in R with the data() command.~~ For example, load the SFARI autism candidate gene dataset with data(sfariGene).

Gene lists are scraped from the literature or online databases. Gene identifiers are mapped to stable, unique Entrez IDs. It is often necessary to map human genes to their homologous mouse genes. This is done using the HomoloGene database and the getHomologs function.

Installation

Ensure you have installed AnnotationDbi before installing geneLists. For example, in R, install AnnotationDbi using the BiocManager package:

 install.packages("BiocManager")
 BiocManager::install("AnnotationDbi")

To install the geneLists package in R, use the devtools package:

# Install from GitHub.
devtools::install_github("twesleyb/geneLists")

The gene mapping function getIDs uses organism-specific mapping data. Ensure you have installed the required packages, e.g. for mouse data you should have installed org.Mm.eg.db with BiocManager:

BiocManager::install("org.Mm.eg.db")
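
Before loading geneLists, it may help to confirm that the dependencies named above are actually installed. The following is a minimal sketch using only base R. The helper find_missing is a hypothetical name for illustration and is not part of geneLists.

```r
# Hypothetical helper (not part of geneLists): return the packages from
# `pkgs` that cannot be loaded in the current R library.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Dependencies mentioned in this README for mouse data.
required <- c("AnnotationDbi", "org.Mm.eg.db")
missing_pkgs <- find_missing(required)

if (length(missing_pkgs) > 0) {
  message("Install with BiocManager::install(): ",
          paste(missing_pkgs, collapse = ", "))
}
```

If nothing is reported, both packages are available and geneLists can be installed and loaded.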

Usage

library(geneLists)

# See all available datasets.
geneLists()

# Load a dataset.
data(iPSD)

# converting between identifiers
gphn_proteome <- iPSD[["Gphn"]]
uniprot <- getIDs(gphn_proteome, from="entrez", to="uniprot", species="mouse")

# mapping genes using a given gene map
data(uniprot_map)
mapIDs(uniprot, from="Accession", to="Entrez", gene_map=uniprot_map)

# NOTE: be careful to not confuse getIDs (uses org.##.eg.db) and
# mapIDs (you must provide a gene_map; the arguments from and to specify columns
# in the gene_map).

# to see all scripts in inst/analysis/2_build-lists:
list.files(system.file("analysis/2_build-lists", package="geneLists"))
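
To make the distinction between getIDs and mapIDs more concrete, here is a rough base R sketch of a table-based lookup in the style the note describes for mapIDs, where from and to name columns of a gene_map. It is an illustration only, not the package's implementation: lookup_ids and toy_map are hypothetical names, and the identifiers are made-up placeholders rather than real accessions or Entrez IDs.

```r
# Made-up gene map for illustration only; these are not real identifiers.
toy_map <- data.frame(
  Accession = c("ACC1", "ACC2", "ACC3"),
  Entrez    = c(101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up `ids` in column `from` of `gene_map` and
# return the matching values from column `to` (NA where there is no match).
lookup_ids <- function(ids, from, to, gene_map) {
  gene_map[[to]][match(ids, gene_map[[from]])]
}

mapped <- lookup_ids(c("ACC2", "ACC9"), from = "Accession", to = "Entrez",
                     gene_map = toy_map)
```

Here the second identifier is absent from the table, so its mapped value is NA. A lookup of this kind depends entirely on the table you supply, whereas getIDs relies on the organism annotation package.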

Datasets

For additional details about each dataset, see the README in the datasets/ directory.

Contributing

Feel free to suggest a useful dataset or database, report an error, or suggest an improvement by submitting an issue or pull request.

The scripts in inst/analysis record how each gene list was created. These can be used as examples that show how you can download a dataset and save it as a GMT-formatted file and gene_list R object. See the tutorials/064_E3-Ligases.R script for a recent example of how to create a gene list.
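
As a rough sketch of the save step described above, the base R code below writes a named list of gene identifiers to a GMT file and reads it back. In the GMT format each line holds a set name, a description, and then the genes, separated by tabs. The sketch assumes a gene list is a named list of character vectors. The functions write_gmt and read_gmt are hypothetical helpers rather than geneLists functions, and the IDs are placeholders.

```r
# Hypothetical helper: write a named list of gene IDs as a GMT file.
# Each line is: set name <tab> description <tab> gene1 <tab> gene2 ...
write_gmt <- function(gene_list, path, description = "NA") {
  lines <- vapply(names(gene_list), function(nm) {
    paste(c(nm, description, as.character(gene_list[[nm]])), collapse = "\t")
  }, character(1))
  writeLines(lines, path)
}

# Hypothetical helper: read a GMT file back into a named list,
# dropping the description field.
read_gmt <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  gene_list <- lapply(fields, function(x) x[-(1:2)])
  names(gene_list) <- vapply(fields, `[`, character(1), 1)
  gene_list
}

# Placeholder gene sets; these are not real Entrez IDs.
toy_list <- list(setA = c("101", "102"), setB = c("103"))
gmt_path <- tempfile(fileext = ".gmt")
write_gmt(toy_list, gmt_path)
round_trip <- read_gmt(gmt_path)
```

Reading the file back is a quick way to check that the names and members of each set survived the write.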

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.



twesleyb/geneLists documentation built on Oct. 30, 2021, 7:28 p.m.