buildCustomAnnotation    R Documentation

Import custom annotation to the metaseqR2 annotation database from GTF file

Description

This function imports a GTF file with some custom annotation to the metaseqR2 annotation database.

Usage

    buildCustomAnnotation(gtfFile, metadata,
    db = file.path(system.file(package = "metaseqR2"),
        "annotation.sqlite"), rewrite=TRUE)

Arguments

gtfFile

a GTF file containing the gene structure of the organism to be imported.

metadata

a list with additional information about the annotation to be imported. See Details.

db

a valid path (accessible at least by the current user) where the annotation database will be set up. It defaults to file.path(system.file(package = "metaseqR2"), "annotation.sqlite"), that is, the installation path of the metaseqR2 package. See also Details.

rewrite

if the custom annotation is already present in the database, should it be rewritten? Defaults to TRUE, as in the Usage signature. Leave it as TRUE if you wish to update the annotation database for a particular custom annotation.

Details

Regarding the metadata argument, it is a list with a specific format which instructs buildCustomAnnotation on how to import the custom annotation. Such a list may have the following members:

  • organism: a name for the organism being imported (e.g. "my_mm9"). This is the only mandatory member.

  • source: a name for the source of this custom annotation (e.g. "my_mouse_db"). If not given or NULL, the word "inhouse" is used.

  • version: a string denoting the version. If not given or NULL, the current date is used.

  • chromInfo: one of the following:

    • a tab-delimited file with two columns, the first being the chromosome/sequence names and the second being the chromosome/sequence lengths.

    • a BAM file whose header is read to obtain the required information.

    • a data.frame with one column holding the chromosome lengths and the chromosome names as rownames.

See the examples below for a metadata example.
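
The chromInfo forms listed above can be sketched with base R alone. The snippet below is only an illustration: the file name, chromosome names and lengths are made up, and writing the tab-delimited file without a header line is an assumption, since the expected header convention is not stated here. It builds one metadata list that points to a two-column file and another that carries the equivalent data.frame.

```r
# Hypothetical chromosome sizes, used only for illustration
sizes <- data.frame(
    chromosome = c("A", "B", "C"),
    length = c(1000L, 2000L, 1500L)
)

# Write a two-column, tab-delimited file (names, then lengths).
# Assumption: no header line is written.
chromFile <- file.path(tempdir(), "chrom_sizes.txt")
write.table(sizes, chromFile, sep = "\t", quote = FALSE,
    row.names = FALSE, col.names = FALSE)

# Metadata list with chromInfo given as a file path
metaFromFile <- list(
    organism = "dummy",      # the only mandatory member
    source = "dummy_db",
    version = "1",
    chromInfo = chromFile
)

# The same information as a data.frame: one column of lengths,
# chromosome names as rownames
tab <- read.delim(chromFile, header = FALSE,
    col.names = c("chromosome", "length"), stringsAsFactors = FALSE)
chromInfo <- data.frame(length = tab$length, row.names = tab$chromosome)

# Metadata list with only the mandatory member plus chromInfo
metaFromDf <- list(organism = "dummy", chromInfo = chromInfo)
```

Either list could then be passed as the metadata argument of buildCustomAnnotation.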

Regarding db, this controls the location of the annotation database. If the default is used, then there is no need to provide the local database path to any function that uses the database (e.g. the main metaseqr2 function). Otherwise, the user will either have to provide this path each time, or the annotation will have to be downloaded and used on-the-fly.
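
When a non-default location is chosen, it may help to confirm beforehand that the current user can write to it. The following is a minimal base-R sketch; canWriteDb is a hypothetical helper written for this illustration and is not part of metaseqR2.

```r
# Hypothetical helper (not a metaseqR2 function): TRUE if the directory
# that would hold the database file exists and is writable
canWriteDb <- function(db) {
    dbDir <- dirname(db)
    # file.access() returns 0 on success; mode = 2 tests write permission
    dir.exists(dbDir) && unname(file.access(dbDir, mode = 2) == 0)
}

# A custom location inside the session's temporary directory
customDb <- file.path(tempdir(), "my_annotation.sqlite")
writable <- canWriteDb(customDb)

# A path whose parent directory does not exist fails the check
missingDb <- file.path(tempdir(), "no_such_dir", "my_annotation.sqlite")
notWritable <- canWriteDb(missingDb)
```

A path that passes this check can be supplied as db here, and again to functions that read the database, such as loadAnnotation in the examples below.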

Value

The function does not return anything. Only the SQLite database is created or updated.
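
Because nothing is returned, the outcome can only be checked by looking at the database file itself. The sketch below assumes the DBI and RSQLite packages are installed; listDbTables is a hypothetical helper for this illustration and not a metaseqR2 function. It lists whatever tables the SQLite file contains, without assuming any table names.

```r
# Hypothetical helper (not a metaseqR2 function): return the table names
# of an SQLite file, or NULL if the file or RSQLite is not available
listDbTables <- function(db) {
    if (!file.exists(db) || !requireNamespace("RSQLite", quietly = TRUE))
        return(NULL)
    con <- DBI::dbConnect(RSQLite::SQLite(), db)
    on.exit(DBI::dbDisconnect(con))
    DBI::dbListTables(con)
}

# A path that does not exist yet yields NULL
noTables <- listDbTables(file.path(tempdir(), "does_not_exist.sqlite"))
```

Calling listDbTables on the db path after buildCustomAnnotation has run would show the tables that were created or updated.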

Author(s)

Panagiotis Moulos

Examples

# Dummy database as example
customDir <- file.path(tempdir(),"test_custom")
dir.create(customDir)

myDb <- file.path(customDir,"testann.sqlite")
chromInfo <- data.frame(length=c(1000L,2000L,1500L),
    row.names=c("A","B","C"))

# Build with the metadata list filled, including a version
if (.Platform$OS.type == "unix") {
    buildCustomAnnotation(
        gtfFile=file.path(system.file(package="metaseqR2"),
            "dummy.gtf"),
        metadata=list(
            organism="dummy",
            source="dummy_db",
            version=1,
            chromInfo=chromInfo
        ),
        db=myDb
    )

    # Try to retrieve some data
    myGenes <- loadAnnotation(genome="dummy",refdb="dummy_db",
        level="gene",type="gene",db=myDb)
    myGenes
}

## Real data!
## Setup a temporary directory to download files etc.
#customDir <- file.path(tempdir(),"test_custom")
#dir.create(customDir)

#myDb <- file.path(customDir,"testann.sqlite")

## Gene annotation dump from Ensembl
#download.file(paste0("ftp://ftp.ensembl.org/pub/release-98/gtf/",
#  "dasypus_novemcinctus/Dasypus_novemcinctus.Dasnov3.0.98.gtf.gz"),
#  file.path(customDir,"Dasypus_novemcinctus.Dasnov3.0.98.gtf.gz"))

## Chromosome information will be provided from the following BAM file
## available from Ensembl
#bamForInfo <- paste0("ftp://ftp.ensembl.org/pub/release-98/bamcov/",
#  "dasypus_novemcinctus/genebuild/Dasnov3.broad.Ascending_Colon_5.1.bam")

## Build with the metadata list filled (you can also provide a version)
#buildCustomAnnotation(
#  gtfFile=file.path(customDir,"Dasypus_novemcinctus.Dasnov3.0.98.gtf.gz"),
#  metadata=list(
#    organism="dasNov3_test",
#    source="ensembl_test",
#    chromInfo=bamForInfo
#  ),
#  db=myDb
#)

## Try to retrieve some data
#dasGenes <- loadAnnotation(genome="dasNov3_test",refdb="ensembl_test",
#  level="gene",type="gene",db=myDb)
#dasGenes

pmoulos/metaseqR2 documentation built on May 20, 2024, 5:48 a.m.