makeEnsemblDbPackage: Generating an Ensembl annotation package from Ensembl

Description

These functions retrieve annotations from the Ensembl database (fetchTablesFromEnsembl), create an SQLite database from them (makeEnsemblSQLiteFromTables), and generate an annotation package providing access to this resource (makeEnsembldbPackage).

Usage

ensDbFromGRanges(x, outfile, path, organism, genomeVersion,
                 version, verbose=FALSE)

ensDbFromGtf(gtf, outfile, path, organism, genomeVersion,
             version, verbose=FALSE)

fetchTablesFromEnsembl(version, ensemblapi, user="anonymous",
                       host="ensembldb.ensembl.org", pass="",
                       port=5306, species="human")

makeEnsemblSQLiteFromTables(path=".", dbname)

makeEnsembldbPackage(ensdb, version, maintainer, author,
                     destDir=".", license="Artistic-2.0")

Arguments

(in alphabetical order)

author

The author of the package.

dbname

The name for the database (optional). By default a name based on the species and Ensembl version will be automatically generated (and returned by the function).

destDir

The directory in which the package should be saved.

ensdb

The file name of the SQLite database generated by makeEnsemblSQLiteFromTables.

ensemblapi

The path to the Ensembl perl API installed locally on the system. The version of the Ensembl perl API has to match the Ensembl version given in version.

genomeVersion

For ensDbFromGtf: the version of the genome (e.g. "GRCh37"). If not provided the function will try to guess it from the file name (assuming file name convention of Ensembl GTF files).

gtf

The GTF file name.

host

The hostname to access the Ensembl database.

license

The license of the package.

maintainer

The maintainer of the package.

organism

For ensDbFromGtf: the organism name (e.g. "Homo_sapiens"). If not provided the function will try to guess it from the file name (assuming file name convention of Ensembl GTF files).

outfile

The desired file name of the SQLite file. If not provided the name of the GTF file will be used.

pass

The password for the Ensembl database.

path

The directory in which the tables retrieved by fetchTablesFromEnsembl or the SQLite database file generated by ensDbFromGtf are stored.

port

The port to be used to connect to the Ensembl database.

species

The species for which the annotations should be retrieved.

user

The username for the Ensembl database.

verbose

Whether progress messages should be shown.

version

For fetchTablesFromEnsembl, ensDbFromGRanges and ensDbFromGtf: the Ensembl version for which the annotation should be retrieved (e.g. 75). If not provided, the ensDbFromGtf function will try to guess the Ensembl version from the GTF file name.

For makeEnsembldbPackage: the version of the package.

x

For ensDbFromGRanges: the GRanges object.
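
The file-name guessing described for organism, genomeVersion and version can be illustrated with a small base R sketch. It assumes the Ensembl GTF naming convention <organism>.<genome version>.<Ensembl version>.gtf(.gz); the helper guessFromGtfName is hypothetical and is not the code used inside ensDbFromGtf.

```r
## Hypothetical helper (not part of ensembldb): split an Ensembl-style GTF
## file name into organism, genome version and Ensembl version.
guessFromGtfName <- function(gtf) {
    ## drop any directory part and the .gtf / .gtf.gz suffix
    base <- sub("\\.gtf(\\.gz)?$", "", basename(gtf))
    parts <- strsplit(base, ".", fixed = TRUE)[[1]]
    n <- length(parts)
    if (n < 3)
        stop("file name does not follow <organism>.<genome>.<version>.gtf")
    ## first field: organism; last field: Ensembl version;
    ## everything in between: genome version (which may itself contain dots)
    list(organism = parts[1],
         genomeVersion = paste(parts[2:(n - 1)], collapse = "."),
         version = parts[n])
}

str(guessFromGtfName("Homo_sapiens.GRCh37.75.gtf.gz"))
```

If a file has been renamed so that it no longer follows this convention, the three arguments presumably have to be supplied explicitly.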

Details

The fetchTablesFromEnsembl function internally calls the perl script get_gene_transcript_exon_tables.pl to retrieve all required information from the Ensembl database using the Ensembl perl API.

Alternatively, an EnsDb database file can be generated with ensDbFromGtf from an Ensembl GTF file, or with ensDbFromGRanges from a GRanges object, e.g. one retrieved with the AnnotationHub package. The returned database file name can then be passed to makeEnsembldbPackage.
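
A minimal sketch of this alternative route, using the chromosome Y GRanges object shipped with the package (the same data as in the Examples section). The maintainer and author values are placeholders, and the guards are only there so that the code is skipped when ensembldb or the example data is not available.

```r
## Sketch only: build an EnsDb SQLite file from a GRanges object and pass
## the returned file name on to makeEnsembldbPackage.
if (requireNamespace("ensembldb", quietly = TRUE)) {
    library(ensembldb)
    yfile <- system.file("YGRanges.RData", package = "ensembldb")
    if (nzchar(yfile)) {
        load(yfile)  # provides the GRanges object 'Y'
        ## returns the name of the generated SQLite file
        dbfile <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
                                   organism = "Homo_sapiens")
        ## placeholder maintainer/author; package is written to tempdir()
        makeEnsembldbPackage(ensdb = dbfile, version = "0.0.1",
                             maintainer = "Jane Doe <jane.doe@example.org>",
                             author = "Jane Doe",
                             destDir = tempdir())
    }
}
```

Writing to tempdir() keeps the sketch from creating files in the working directory; for a real package, destDir would point to a permanent location.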

Value

makeEnsemblSQLiteFromTables, ensDbFromGRanges and ensDbFromGtf: the name of the SQLite file.

Note

A local installation of the Ensembl perl API is required for fetchTablesFromEnsembl. See http://www.ensembl.org/info/docs/api/api_installation.html for installation instructions.
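
The comments in the Examples section mention that the ensemblapi argument can be omitted if the Ensembl perl API is listed in the PERL5LIB environment variable. The following base R sketch shows one way to inspect that variable before calling fetchTablesFromEnsembl; the helper perl5libDirs is hypothetical and not part of ensembldb.

```r
## Hypothetical helper: return the directories listed in PERL5LIB
## (an empty character vector if the variable is unset or empty).
perl5libDirs <- function() {
    value <- Sys.getenv("PERL5LIB", unset = "")
    if (!nzchar(value))
        return(character())
    ## entries are separated by the platform's path separator
    dirs <- strsplit(value, .Platform$path.sep, fixed = TRUE)[[1]]
    dirs[nzchar(dirs)]
}

dirs <- perl5libDirs()
if (length(dirs) == 0) {
    message("PERL5LIB is not set; pass 'ensemblapi' explicitly")
} else {
    message("PERL5LIB entries: ", paste(dirs, collapse = ", "))
}
```

This only reports what is configured; it does not check that the listed directories contain an Ensembl API of the matching version.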

A database generated from a GTF file lacks some features because they are not available in the GTF files from Ensembl: chromosome lengths and NCBI Entrezgene IDs.

GRanges objects provided by the AnnotationHub package, on the other hand, already contain the seqinfo information (chromosome lengths etc.) required to build an EnsDb database with ensDbFromGRanges.

Author(s)

Johannes Rainer

See Also

EnsDb, genes

Examples

## Not run: 

## get all human gene/transcript/exon annotations from Ensembl (75)
## the resulting tables will be stored by default to the current working
## directory; if the correct Ensembl api (version 75) is defined in the
## PERL5LIB environment variable, the ensemblapi parameter can also be omitted.
fetchTablesFromEnsembl(75,
                       ensemblapi="/home/bioinfo/ensembl/75/API/ensembl/modules",
                       species="human")

## These tables can then be processed to generate a SQLite database
## containing the annotations
DBFile <- makeEnsemblSQLiteFromTables()

## and finally we can generate the package
makeEnsembldbPackage(ensdb=DBFile, version="0.0.1",
                     maintainer="Johannes Rainer <johannes.rainer@eurac.edu>",
                     author="J Rainer")

## Build an annotation file from a GTF file.
## the GTF file can be downloaded from
## ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/
## (here assumed to have been saved to the current working directory)
gtffile <- "Homo_sapiens.GRCh37.75.gtf.gz"
## generate the SQLite database file
DB <- ensDbFromGtf(gtf=gtffile, verbose=TRUE)

## load the DB file directly
EDB <- EnsDb(DB)


## End(Not run)

## Generate a sqlite database for genes encoded on chromosome Y
chrY <- system.file("chrY", package="ensembldb")
DBFile <- makeEnsemblSQLiteFromTables(path=chrY, dbname=tempfile())
## load this database:
edb <- EnsDb(DBFile)

edb

## Generate a sqlite database from a GRanges object specifying
## genes encoded on chromosome Y
load(system.file("YGRanges.RData", package="ensembldb"))

Y

DB <- ensDbFromGRanges(Y, path=tempdir(), version=75,
                       organism="Homo_sapiens")
edb <- EnsDb(DB)

jotsetung/ensembldb-old documentation built on May 19, 2019, 9:41 p.m.