getAnnotation: Annotation downloader
In pmoulos/sitadela: An R package for the easy provision of simple but complete tab-delimited genomic annotation from a variety of sources and organisms

getAnnotation

R Documentation

Annotation downloader

Description

For Ensembl-based annotations, this function connects to EBI's BioMart service using the biomaRt package and downloads annotation elements (gene coordinates, exon coordinates, gene identifiers, biotypes etc.) for each of the supported organisms. For UCSC/RefSeq annotations, it connects to the respective UCSC SQL databases if the RMySQL package is present; otherwise it downloads flat files and builds a temporary SQLite database to run the necessary build queries. Gene and transcript versions can be attached (when available) using the tv argument. This is very useful when transcript versioning is required, as in several precision medicine applications.

Usage

    getAnnotation(org, type, refdb = "ensembl", ver = NULL,
        tv = FALSE, rc = NULL)

Arguments

`org`	the organism for which to download annotation (one of the supported ones, see Details).
`type`	the transcriptional unit annotation level to load. It can be one of `"gene"` (default), `"transcript"`, `"utr"`, `"transexon"`, `"transutr"`, `"exon"`. See Details for further explanation of each option.
`refdb`	the online source to use to fetch annotation. It can be `"ensembl"` (default), `"ucsc"`, `"refseq"` or `"ncbi"`. In the latter three cases, an SQL connection is opened with the UCSC public databases.
`ver`	the version of the annotation to use.
`tv`	whether to attach the gene/transcript version to the gene/transcript name. Defaults to `FALSE`.
`rc`	the fraction of cores to use. Same as the `rc` in `addAnnotation`.

Details

Regarding org, it can be "hg18", "hg19" or "hg38" for human genomes, "mm9" or "mm10" for mouse genomes, "rn5" or "rn6" for rat genomes, "dm3" or "dm6" for drosophila genomes, "danrer7", "danrer10" or "danrer11" for zebrafish genomes, "pantro4" or "pantro5" for chimpanzee genomes, "susscr3" or "susscr11" for pig genomes, "tair10" for the Arabidopsis thaliana genome and "equcab2" or "equcab3" for Equus caballus genomes. Finally, it can be a user-defined name ("USER_NAMED_ORG") for a custom organism that the user has imported to the annotation database from a GTF/GFF file, for example org="mm10_p1".
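The built-in codes listed above can be collected into a small lookup table, which may be handy for checking an org value before starting a potentially long download. This is only a sketch: builtInOrgs and isBuiltInOrg are hypothetical names that are not part of sitadela, and the table simply transcribes the codes from this section. Custom organisms imported by the user (such as "mm10_p1") are deliberately not covered.

```r
# Hypothetical lookup table transcribed from the Details text above;
# it is not exported by sitadela.
builtInOrgs <- list(
    human       = c("hg18", "hg19", "hg38"),
    mouse       = c("mm9", "mm10"),
    rat         = c("rn5", "rn6"),
    drosophila  = c("dm3", "dm6"),
    zebrafish   = c("danrer7", "danrer10", "danrer11"),
    chimpanzee  = c("pantro4", "pantro5"),
    pig         = c("susscr3", "susscr11"),
    arabidopsis = c("tair10"),
    horse       = c("equcab2", "equcab3")
)

# Returns TRUE when 'org' is one of the built-in codes listed above.
# User-imported organisms are not in the table and return FALSE.
isBuiltInOrg <- function(org) {
    org %in% unlist(builtInOrgs, use.names = FALSE)
}

isBuiltInOrg("mm10")
isBuiltInOrg("mm10_p1")
```

A FALSE result does not mean the call would fail, only that the code is not one of the built-in ones, so a custom organism name still has to be checked against the local annotation database.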

Regarding type, it defines the level of transcriptional unit (gene, transcript, 3' UTR, exon) whose coordinates are loaded, or fetched if not already present. The following types are supported:

"gene": canonical gene coordinates are retrieved from the chosen database.
"transcript": all transcript coordinates are retrieved from the chosen database.
"utr": all 3' UTR coordinates are retrieved from the chosen database, grouped per gene.
"transutr": all 3' UTR coordinates are retrieved from the chosen database, grouped per transcript.
"transexon": all exon coordinates are retrieved from the chosen database, grouped per transcript.
"exon": all exon coordinates are retrieved from the chosen database.
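Because type accepts only the six values listed above, a thin wrapper can reject a misspelled value (for example "genes") before any connection is attempted. The following is a minimal sketch under stated assumptions: annotationTypes and fetchAnnotation are hypothetical names that are not part of sitadela, and the wrapper only delegates to getAnnotation with the arguments shown in Usage.

```r
# Hypothetical wrapper (not part of sitadela): validate 'type' against the
# six documented values before delegating to getAnnotation().
annotationTypes <- c("gene", "transcript", "utr", "transexon", "transutr",
    "exon")

fetchAnnotation <- function(org, type, ...) {
    # match.arg() stops with an error for values outside the documented
    # set; note that it also accepts unambiguous prefixes of a valid value
    type <- match.arg(type, annotationTypes)
    if (!requireNamespace("sitadela", quietly = TRUE)) {
        stop("the sitadela package is required for this sketch")
    }
    # Extra arguments (refdb, ver, tv, rc) are passed through unchanged
    sitadela::getAnnotation(org, type, ...)
}

# Example calls (they need sitadela plus network access or a local
# annotation database, so they are left commented out):
# mm10Utr   <- fetchAnnotation("mm10", "utr")
# hg19Exons <- fetchAnnotation("hg19", "exon", refdb = "refseq")
```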

Value

A data frame with the canonical genes, transcripts, exons or 3' UTRs of the requested organism. When type="gene", the data frame has the following columns: chromosome, start, end, gene_id, gc_content, strand, gene_name, biotype. When type="exon" or type="transexon", the data frame has the following columns: chromosome, start, end, exon_id, gene_id, strand, gene_name, biotype. When type="utr" or type="transutr", the data frame has the following columns: chromosome, start, end, transcript_id, gene_id, strand, gene_name, biotype. The same columns apply when type="transcript". The gene_id, transcript_id and exon_id correspond to Ensembl, UCSC or RefSeq gene, transcript and exon accessions respectively. The gene_name corresponds to HUGO nomenclature gene names.
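The gene-level layout can be explored without a download by building a mock data frame with the same column names. The sketch below uses invented placeholder rows rather than real annotation records, and the length calculation assumes 1-based inclusive coordinates, which should be checked against the chosen source.

```r
# Mock data frame with the gene-level columns listed above; the rows are
# made-up placeholders, not real annotation records.
genes <- data.frame(
    chromosome = c("chr1", "chr1", "chr2"),
    start      = c(1000L, 5000L, 200L),
    end        = c(2000L, 5600L, 900L),
    gene_id    = c("geneA", "geneB", "geneC"),
    gc_content = c(41.5, 52.0, 38.2),
    strand     = c("+", "-", "+"),
    gene_name  = c("NameA", "NameB", "NameC"),
    biotype    = c("protein_coding", "lincRNA", "protein_coding"),
    stringsAsFactors = FALSE
)

# Gene length, assuming 1-based inclusive coordinates (an assumption here)
genes$length <- genes$end - genes$start + 1L

# Keep protein-coding genes on the plus strand
sel <- genes[genes$biotype == "protein_coding" & genes$strand == "+", ]

# Write the selection as a tab-delimited file
outFile <- tempfile(fileext = ".txt")
write.table(sel, outFile, sep = "\t", quote = FALSE, row.names = FALSE)
```

The same subsetting and export steps should carry over to a data frame returned by getAnnotation, since they rely only on the column names listed above.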

Note

The data frame that is returned contains only "canonical" chromosomes for each organism. It does not contain haplotypes or non-anchored sequences and does not contain mitochondrial chromosomes.

Author(s)

Panagiotis Moulos

Examples

mm10Genes <- getAnnotation("mm10","gene")

pmoulos/sitadela documentation built on May 19, 2024, 3:52 a.m.
