addAnnotation | R Documentation |
This function is the main annotation database creator of sitadela. It creates a local SQLite database for various organisms and categories of genomic regions. Annotations are retrieved in simple, tab-delimited or GRanges formats.
addAnnotation(organisms, sources, db = getDbPath(),
versioned = FALSE, forceDownload = TRUE, retries = 5,
rc = NULL, stopIfNotBS = FALSE)
organisms |
a list of organisms and versions for which to download and build annotations. See also Details. |
sources |
a character vector of public sources
from which to download and build annotations. It can be
one or more of |
db |
a valid path (accessible at least by the
current user) where the annotation database will be
set up. It defaults to
|
versioned |
create an annotation database with versioned genes and transcripts, when possible. |
forceDownload |
by default,
|
retries |
how many times the annotation worker should try to reconnect to internet resources in case of a connection problem or failure. |
rc |
fraction (0-1) of cores to use in a multicore
system. It defaults to |
stopIfNotBS |
stop or warn (default) if certain
|
The organisms argument is a list with a specific format that tells addAnnotation which organisms and versions to download from the respective sources. Such a list may have the format:
organisms=list(hg19=75, mm9=67, mm10=96:97)
This is explained as follows:
A database comprising the human genome version hg19 and the mouse genome versions mm9, mm10 will be constructed.
If "ensembl" is in sources, version 75 is downloaded for hg19, version 67 for mm9 and versions 96, 97 for mm10.
If "ucsc" or "refseq" are in sources, the latest versions are downloaded and marked by the download date. As UCSC and RefSeq versions are not accessible in the same way as Ensembl, this procedure cannot always be replicated.
organisms can also be a character vector with organism names/versions (e.g. organisms = c("mm10","hg19")), in which case the latest versions are downloaded in the case of Ensembl.
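To make the list format concrete, the sketch below expands an organisms argument into one row per organism/version pair. It uses base R only; expandOrganisms is a hypothetical helper written for illustration and is not part of sitadela, nor necessarily how addAnnotation parses its input.

```r
# Illustration only: expandOrganisms() is a hypothetical helper, not sitadela code.
# It shows how the 'organisms' list maps to organism/version pairs.
expandOrganisms <- function(orgs) {
    if (is.character(orgs)) {
        # Character vector: no explicit version; NA stands for
        # "latest version" (as described for Ensembl)
        return(data.frame(organism = orgs, version = NA_integer_,
            stringsAsFactors = FALSE))
    }
    # Named list: one row per requested version of each organism
    do.call(rbind, lapply(names(orgs), function(o) {
        data.frame(organism = o, version = as.integer(orgs[[o]]),
            stringsAsFactors = FALSE)
    }))
}

organisms <- list(hg19 = 75, mm9 = 67, mm10 = 96:97)
orgVersions <- expandOrganisms(organisms)
print(orgVersions)
```

The resulting data frame has four rows, one for each Ensembl version that would be requested.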
The supported organisms are, for human genomes "hg18", "hg19" or "hg38", for mouse genomes "mm9", "mm10", for rat genomes "rn5" or "rn6", for drosophila genome "dm3" or "dm6", for zebrafish genome "danrer7", "danrer10" or "danrer11", for chimpanzee genome "pantro4", "pantro5", for pig genome "susscr3", "susscr11", for Arabidopsis thaliana genome "tair10" and for Equus caballus genome "equcab2" and "equcab3".
Finally, it can be "USER_NAMED_ORG" with a custom organism which has been imported to the annotation database by the user using a GTF/GFF file. For example org="mm10_p1".
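A quick pre-flight check against the organism identifiers listed above can catch typos before a long download starts. The sketch below is base R only; supportedOrgs is copied from the list above and notBuiltIn is a hypothetical helper, not a sitadela function. Names it reports may still be valid if they refer to custom, user-imported organisms.

```r
# Built-in organism identifiers, copied from the documentation above
supportedOrgs <- c("hg18", "hg19", "hg38", "mm9", "mm10", "rn5", "rn6",
    "dm3", "dm6", "danrer7", "danrer10", "danrer11", "pantro4", "pantro5",
    "susscr3", "susscr11", "tair10", "equcab2", "equcab3")

# Hypothetical helper: return requested names that are not built-in.
# Accepts either the named-list or the character-vector form.
notBuiltIn <- function(orgs) {
    ids <- if (is.list(orgs)) names(orgs) else as.character(orgs)
    setdiff(ids, supportedOrgs)
}

# "mm10_p1" is reported because it is a custom (user-imported) organism name
unknown <- notBuiltIn(list(mm10 = 100, mm10_p1 = 1))
print(unknown)
```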
Regarding sources, "ucsc" corresponds to UCSC Genome Browser annotated transcripts, "refseq" corresponds to UCSC RefSeq maintained transcripts while "ncbi" corresponds to NCBI RefSeq annotated and maintained transcripts. UCSC, RefSeq and NCBI annotations are constructed by querying the UCSC Genome Browser database.
Regarding stopIfNotBS, when sources includes "ucsc", "refseq" or "ncbi", the GC content of a gene is not available as a database attribute as with Ensembl and has to be calculated if it is to be included in the respective annotation. For this reason, sitadela uses 'BSgenome' packages. If stopIfNotBS=FALSE (default), then the annotation building continues and GC content is NA for the missing 'BSgenome' packages. If stopIfNotBS=TRUE, then building stops until all the required packages for the selected organisms become available (installed by the user).
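Before building from "ucsc", "refseq" or "ncbi" with stopIfNotBS=TRUE, it can help to check which 'BSgenome' packages are already installed. The sketch below uses base R's requireNamespace. The organism-to-package mapping shown is an assumption based on common Bioconductor package names and may not match exactly what sitadela looks for.

```r
# Assumed mapping from organism to BSgenome package (illustrative only;
# the packages sitadela actually requires may differ)
bsPkgs <- c(
    hg19 = "BSgenome.Hsapiens.UCSC.hg19",
    mm10 = "BSgenome.Mmusculus.UCSC.mm10"
)

# TRUE where the package can be loaded, FALSE otherwise (no error is raised)
installed <- vapply(bsPkgs, requireNamespace, logical(1), quietly = TRUE)

missingOrgs <- names(installed)[!installed]
if (length(missingOrgs) > 0) {
    message("No BSgenome package found for: ",
        paste(missingOrgs, collapse = ", "))
}
```

Organisms reported as missing would get NA GC content with stopIfNotBS=FALSE, or halt the build with stopIfNotBS=TRUE.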
The function does not return anything. Only the SQLite database is created or updated.
Panagiotis Moulos
# Build a test database with one genome
myDb <- file.path(tempdir(),"testann.sqlite")
organisms <- list(mm10=100)
sources <- "ensembl"
# If the example is not running in a multicore system, rc is ignored
#addAnnotation(organisms,sources,db=myDb,rc=0.5)
# A more complete case, don't run as example
# Since we are using Ensembl, we can also ask for a version
#organisms <- list(
# mm9=67,
# mm10=96:97,
# hg19=75,
# hg38=96:97
#)
#sources <- c("ensembl", "refseq")
## Build on the default location (depending on package location, it may
## require root/sudo)
#addAnnotation(organisms,sources)
## Build on an alternative location
#myDb <- file.path(path.expand("~"),"my_ann.sqlite")
#addAnnotation(organisms,sources,db=myDb)