getAnnotation | R Documentation |
For Ensembl-based annotations, this function connects to
EBI's BioMart service using the package biomaRt and downloads
annotation elements (gene coordinates, exon coordinates,
gene identifiers, biotypes etc.) for each of the supported
organisms. For UCSC/RefSeq annotations, it connects to the
respective UCSC SQL databases if the package RMySQL is
present; otherwise it downloads flat files and builds a
temporary SQLite database to run the necessary queries.
Gene and transcript versions can be attached (when
available) using the tv argument. This is useful when
transcript versioning is required, as in several
precision medicine applications.
getAnnotation(org, type, refdb = "ensembl", ver = NULL,
tv = FALSE, rc = NULL)
org |
the organism for which to download annotation (one of the supported ones, see Details). |
type |
the transcriptional unit annotation level
to load. It can be one of |
refdb |
the online source to use to fetch
annotation. It can be |
ver |
the version of the annotation to use. |
tv |
attach or not gene/transcript version to
gene/transcript name. Defaults to |
rc |
Fraction of cores to use. Same as the
|
Regarding org, it can be, for human genomes "hg18", "hg19"
or "hg38", for mouse genomes "mm9" or "mm10", for rat genomes
"rn5" or "rn6", for drosophila genome "dm3" or "dm6", for
zebrafish genome "danrer7", "danrer10" or "danrer11", for
chimpanzee genome "pantro4" or "pantro5", for pig genome
"susscr3" or "susscr11", for Arabidopsis thaliana genome
"tair10" and for Equus caballus genome "equcab2" or "equcab3".
Finally, it can be "USER_NAMED_ORG" with a custom
organism which has been imported to the annotation database
by the user using a GTF/GFF file. For example, org="mm10_p1".
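As an illustration of the codes listed above, the following minimal sketch collects them in a named list and checks whether a given value is one of the built-in genomes. The names supportedOrgs and isBuiltInOrg are hypothetical and not part of the package; a user-imported organism such as "mm10_p1" is valid for getAnnotation but is deliberately not recognised by this helper.

```r
# Genome codes as listed in the Details text, keyed by organism.
# This list is an illustration, not an object exported by the package.
supportedOrgs <- list(
  human       = c("hg18", "hg19", "hg38"),
  mouse       = c("mm9", "mm10"),
  rat         = c("rn5", "rn6"),
  drosophila  = c("dm3", "dm6"),
  zebrafish   = c("danrer7", "danrer10", "danrer11"),
  chimpanzee  = c("pantro4", "pantro5"),
  pig         = c("susscr3", "susscr11"),
  arabidopsis = c("tair10"),
  horse       = c("equcab2", "equcab3")
)

# Hypothetical helper: TRUE when `org` is a single built-in genome code.
isBuiltInOrg <- function(org) {
  is.character(org) && length(org) == 1L &&
    org %in% unlist(supportedOrgs, use.names = FALSE)
}

isBuiltInOrg("mm10")     # TRUE
isBuiltInOrg("mm10_p1")  # FALSE: a user-named organism, not a built-in code
```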
Regarding type, it defines the level of
transcriptional unit (gene, transcript, 3' UTR, exon)
coordinates to be loaded or fetched if not present. The
following types are supported:
"gene": canonical gene coordinates are retrieved from the chosen database.
"transcript": all transcript coordinates are retrieved from the chosen database.
"utr": all 3' UTR coordinates are retrieved from the chosen database, grouped per gene.
"transutr": all 3' UTR coordinates are retrieved from the chosen database, grouped per transcript.
"transexon": all exon coordinates are retrieved from the chosen database, grouped per transcript.
"exon": all exon coordinates are retrieved from the chosen database.
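A small sketch of how a calling script might validate the type value before requesting an annotation. The helper checkAnnotationType and the vector annotationTypes are hypothetical names introduced here; the package may perform its own argument checking in a different way.

```r
# The six type values described above.
annotationTypes <- c("gene", "transcript", "utr",
                     "transutr", "transexon", "exon")

# Hypothetical helper: returns `type` unchanged when it is valid and
# stops with an informative message otherwise (exact matching only).
checkAnnotationType <- function(type) {
  if (!is.character(type) || length(type) != 1L ||
      !(type %in% annotationTypes)) {
    stop("type must be one of: ",
         paste(annotationTypes, collapse = ", "))
  }
  type
}

checkAnnotationType("transutr")  # returns "transutr"
# checkAnnotationType("genes")   # would stop: "genes" is not a listed type
```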
A data frame with the canonical genes, transcripts,
exons or 3' UTRs of the requested organism. When
type="gene", the data frame has the following
columns: chromosome, start, end, gene_id, gc_content,
strand, gene_name, biotype. When type="exon" or
type="transexon", the data frame has the following
columns: chromosome, start, end, exon_id, gene_id, strand,
gene_name, biotype. When type="utr", type="transutr" or
type="transcript", the data frame has the following
columns: chromosome, start, end, transcript_id, gene_id,
strand, gene_name, biotype. The gene_id, transcript_id and
exon_id correspond to Ensembl, UCSC or RefSeq gene,
transcript and exon accessions respectively. The gene_name
corresponds to HUGO nomenclature gene names.
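The column layouts described above can be written down as a lookup, which is handy for sanity-checking a result in a script. This is a sketch only: expectedColumns is a hypothetical helper, and toyGenes is made-up data standing in for a real result, not output of getAnnotation.

```r
# Hypothetical helper: column names described in this section, per type.
expectedColumns <- function(type) {
  switch(type,
    gene = c("chromosome", "start", "end", "gene_id", "gc_content",
             "strand", "gene_name", "biotype"),
    exon = ,
    transexon = c("chromosome", "start", "end", "exon_id", "gene_id",
                  "strand", "gene_name", "biotype"),
    utr = ,
    transutr = ,
    transcript = c("chromosome", "start", "end", "transcript_id",
                   "gene_id", "strand", "gene_name", "biotype"),
    stop("unsupported type: ", type)
  )
}

# Made-up one-row data frame with the gene-level layout.
toyGenes <- data.frame(
  chromosome = "chr1", start = 3000L, end = 5000L,
  gene_id = "geneA", gc_content = 42.5, strand = "+",
  gene_name = "ToyA", biotype = "protein_coding",
  stringsAsFactors = FALSE
)

# Check that the toy result has the layout expected for type = "gene".
identical(names(toyGenes), expectedColumns("gene"))
```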
The data frame that is returned contains only "canonical" chromosomes for each organism. It does not contain haplotypes or non-anchored sequences and does not contain mitochondrial chromosomes.
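To make the note concrete, here is a minimal sketch of what restricting a table to canonical chromosomes can look like. It assumes UCSC-style names for a human-like genome (numbered chromosomes plus X and Y); the pattern and the toy data are assumptions for illustration and do not reproduce the package's internal filtering. Other organisms (for example drosophila arms) would need a different pattern.

```r
# Made-up coordinates, including a mitochondrial chromosome, a haplotype
# and a non-anchored sequence.
toy <- data.frame(
  chromosome = c("chr1", "chrX", "chrM", "chr6_cox_hap2", "chrUn_gl000211"),
  start = c(100L, 200L, 300L, 400L, 500L),
  end   = c(150L, 250L, 350L, 450L, 550L),
  stringsAsFactors = FALSE
)

# Keep only numbered chromosomes and X/Y (assumed naming convention).
isCanonical <- grepl("^chr([0-9]+|X|Y)$", toy$chromosome)
canonicalOnly <- toy[isCanonical, , drop = FALSE]

canonicalOnly$chromosome  # "chr1" "chrX"
```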
Panagiotis Moulos
mm10Genes <- getAnnotation("mm10","gene")