defineMirnahostgenes: Utilities to define miRNA host genes and build a...


Description

These functions download a specific miRBase release, define miRNA host genes for the miRNAs of a species, generate an SQLite database containing that information and ultimately build the corresponding annotation package in R.

Usage

defineMirhostgenes(gff, database=c("core"), host="ensembldb.ensembl.org",
                   user="anonymous", pass, ensemblapi, verbose=FALSE)

downloadMirbase(version, path=".", force.download=FALSE)

fetchAdditionalInformation(mirbase.path=".", path=".", verbose=FALSE)

getArrayFeaturesForTx(species, arrays=c("HG-U133_Plus_2", "PrimeView"),
                      prop.probes=0.8, max.mm=0, min.probe.algn=24,
                      host="ensembldb.ensembl.org", user="anonymous",
                      pass, ensemblapi, verbose=FALSE)

makeHostgeneSQLiteFromTables(path=".")

makeMirhostgenesPackage(db, version, maintainer, author, destDir=".",
                        license="Artistic-2.0")

Arguments

arrays

Character vector specifying the types of microarrays for which probe sets should be searched for. Note that these have to correspond to the names of microarrays as available in the Ensembl databases.

author

The author of the package.

database

A character vector with the database name(s) that should be queried. Allowed are "core", "cdna", "otherfeatures" and "vega". For most cases it suffices to query the Ensembl "core" database, as gene and transcript models are mostly redundant between the databases (with "vega" containing human curated genes and "otherfeatures" NCBI RefSeq genes and other features).

db

For makeMirhostgenesPackage: the file name of the (SQLite) database file created with the function makeHostgeneSQLiteFromTables.

destDir

Where the package should be saved to.

gff

The gff file name containing the genomic alignments for the pre-miRNAs. Such gff files (one per species) are located in the genomes folder of the downloaded miRBase resource (e.g. downloaded by the downloadMirbase function). Note that the specified gff has to be in gff version 3 format, thus a ".gff3" file (e.g. "hsa.gff3") from the genomes folder should be submitted.

ensemblapi

The path to the Ensembl perl API installed locally on the system. The Ensembl perl API version determines which Ensembl database version is queried.

force.download

Force the download of the miRBase even if the same version is already available locally.

host

The hostname to access the Ensembl database.

license

The license of the package.

maintainer

The maintainer of the package.

max.mm

Maximum number of mismatches of a probe with the target genes.

min.probe.algn

Minimal length of the probe alignment within the exons of a transcript. The default value of 24 means that all nucleotides of a 25nt long probe have to map within the exons of a transcript.

mirbase.path

For fetchAdditionalInformation: the directory to which the miRBase database files have been downloaded (by the downloadMirbase function).

pass

The password for the Ensembl database.

path

For downloadMirbase: the directory to which miRBase will be downloaded. For makeHostgeneSQLiteFromTables: the directory in which the txt files (database tables) generated by defineMirhostgenes are located.

prop.probes

Proportion of probes of a probe set that have to map within the exons of a transcript to be considered. The default value of 0.8 means that at least 80 percent of the probes of a probe set have to target the transcript.

species

For getArrayFeaturesForTx: the species the miRNA host genes were defined for.

user

The username for the Ensembl database.

verbose

Whether progress messages should be printed.

version

For downloadMirbase: the version of the miRBase that should be downloaded. If not specified the most recent version is downloaded.

For makeMirhostgenesPackage: the version for the package to be created.
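Since the gff argument has to point to a GFF version 3 file, it can be worth checking the file before starting the host gene definition. The following is a minimal sketch, assuming the file begins with the "##gff-version 3" pragma that the GFF3 specification places on the first line. The helper isGff3 is hypothetical and not part of the package, and the toy file merely stands in for a file such as "20/genomes/hsa.gff3".

```r
## Hypothetical helper (not part of mirhostgenes): check whether the first
## line of a file is the GFF version 3 pragma.
isGff3 <- function(file) {
    first <- readLines(file, n = 1L, warn = FALSE)
    length(first) == 1L && grepl("^##gff-version[[:space:]]+3", first)
}

## Toy file with made-up content, used only to exercise the helper.
toy <- tempfile(fileext = ".gff3")
writeLines(c("##gff-version 3",
             "chr1\t.\tmiRNA_primary_transcript\t100\t180\t.\t+\t.\tID=toy1"),
           toy)
isGff3(toy)
```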

Details

The downloadMirbase and defineMirhostgenes functions internally call the perl scripts get-mirbase.pl and define_mirna_host_genes.pl, respectively. The define_mirna_host_genes.pl script requires the paths to the Ensembl Perl API and BioPerl to be listed in the PERL5LIB environment variable.
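One possible way to satisfy the PERL5LIB requirement is to extend the variable from within the R session before calling defineMirhostgenes. This is a sketch under assumptions: the helper addToPerl5lib is hypothetical and not part of the package, and the two directories are placeholders for wherever the Ensembl Perl API modules and BioPerl are installed locally.

```r
## Hypothetical helper (not part of mirhostgenes): prepend directories to
## the PERL5LIB environment variable of the current R session.
addToPerl5lib <- function(dirs) {
    current <- Sys.getenv("PERL5LIB", unset = "")
    ## Split the existing value into its individual entries.
    if (nzchar(current)) {
        entries <- strsplit(current, .Platform$path.sep, fixed = TRUE)[[1]]
    } else {
        entries <- character()
    }
    ## Put the new directories first and drop duplicates.
    updated <- unique(c(dirs, entries))
    Sys.setenv(PERL5LIB = paste(updated, collapse = .Platform$path.sep))
    invisible(updated)
}

## Placeholder paths; replace them with the local installation directories.
addToPerl5lib(c("/path/to/ensembl/modules", "/path/to/bioperl"))
Sys.getenv("PERL5LIB")
```

Setting the variable this way only affects the current R session and the processes started from it.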

The fetchAdditionalInformation function extracts additional information such as the confidence information, read counts, pre-miRNA sequences and miRNA family definitions from the downloaded miRBase database tables and inserts them into tables for the MirhostDb database. This function should be called after defineMirhostgenes and before makeHostgeneSQLiteFromTables.

The getArrayFeaturesForTx function also uses the Ensembl Perl API to fetch microarray features (probe sets) that potentially detect the defined host transcripts.
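How the prop.probes, max.mm and min.probe.algn arguments interact can be illustrated on a toy table. This is a sketch only: the data frame layout, its column names and the helper keepProbesets are assumptions made for illustration and do not reflect how the Perl-based implementation stores or processes the alignments.

```r
## Toy alignment table with one row per probe (hypothetical layout).
aln <- data.frame(
    probeset   = c(rep("ps1", 5), rep("ps2", 5)),
    tx         = "tx1",
    mismatches = c(0, 0, 0, 0, 1,   0, 0, 1, 1, 0),
    aln.length = c(25, 25, 24, 25, 25,   25, 23, 25, 25, 25)
)

## Hypothetical helper: return the probe set/transcript pairs that pass.
keepProbesets <- function(x, prop.probes = 0.8, max.mm = 0,
                          min.probe.algn = 24) {
    ## A probe counts as targeting the transcript if it has at most max.mm
    ## mismatches and at least min.probe.algn nucleotides aligned in exons.
    x$ok <- x$mismatches <= max.mm & x$aln.length >= min.probe.algn
    ## Proportion of qualifying probes per probe set and transcript.
    prop <- aggregate(ok ~ probeset + tx, data = x, FUN = mean)
    prop[prop$ok >= prop.probes, c("probeset", "tx")]
}

keepProbesets(aln)
```

With the default settings, four of the five probes of "ps1" qualify (a proportion of 0.8), so the probe set is kept, while only two of the five probes of "ps2" qualify and the probe set is dropped.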

The makeHostgeneSQLiteFromTables function reads all the txt files generated by the defineMirhostgenes function and builds an SQLite database. If the additional files "pre_mirna_sequence.txt" and/or "mirna_fam.txt", created by the functions createPremirnaSequenceTable and createMirfamTable, are also present, the information contained in these files will be added to the database too.
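The general idea of turning a directory of txt files into database tables can be sketched as follows. This is not the package's implementation: the file names, the column names and the tab-delimited layout with a header line are assumptions, and the helper readTables is hypothetical. The SQLite step is only run if the DBI and RSQLite packages are installed.

```r
## Toy input directory with two made-up table files (assumed layout).
path <- file.path(tempdir(), "mirhostgenes-sketch")
dir.create(path, showWarnings = FALSE)
write.table(data.frame(pre_mirna_id = c("pre1", "pre2"),
                       seq_name = c("1", "2")),
            file.path(path, "pre_mirna.txt"), sep = "\t",
            quote = FALSE, row.names = FALSE)
write.table(data.frame(gene_id = c("gene1", "gene2"),
                       pre_mirna_id = c("pre1", "pre2")),
            file.path(path, "host_gene.txt"), sep = "\t",
            quote = FALSE, row.names = FALSE)

## Hypothetical helper: read every txt file in a directory into a named
## list of data frames, one element per file.
readTables <- function(path) {
    files <- list.files(path, pattern = "\\.txt$", full.names = TRUE)
    tabs <- lapply(files, read.delim, stringsAsFactors = FALSE)
    names(tabs) <- sub("\\.txt$", "", basename(files))
    tabs
}
tabs <- readTables(path)

## Write each data frame to a table of the same name in an SQLite file.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
    con <- DBI::dbConnect(RSQLite::SQLite(), file.path(path, "sketch.sqlite"))
    for (nm in names(tabs))
        DBI::dbWriteTable(con, nm, tabs[[nm]], overwrite = TRUE)
    print(DBI::dbListTables(con))
    DBI::dbDisconnect(con)
}
```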

Finally, the makeMirhostgenesPackage function creates an annotation package based on the SQLite file generated above.

Note

A local installation of the Ensembl perl API is required for defineMirhostgenes. See http://www.ensembl.org/info/docs/api/api_installation.html for installation instructions.

Author(s)

Johannes Rainer

Examples

## Not run: 

library(mirhostgenes)

## Download mirbase version 20 (matching genome release 37)
downloadMirbase(version=20)

## Define miRNA host genes using the Ensembl core database.
## We're using the gff file for human miRNAs of the miRBase version we
## just downloaded.
## We set verbose=TRUE to get some feedback about the progress.
defineMirhostgenes(gff="20/genomes/hsa.gff3", verbose=TRUE)

## Fetch additional information from downloaded miRBase files:
## o pre-miRNA sequence data.
## o miRNA family information.
## o pre- and mature miRNA confidence data.
## o pre- and mature miRNA read count data.
fetchAdditionalInformation(mirbase.path="20/")

## Add probe features... for Affymetrix microarrays. It is crucial that
## the species matches!
## We also specify from which microarrays we want to fetch the probes/
## probe sets.
getArrayFeaturesForTx(species="human", arrays=c("HG-U133_Plus_2", "PrimeView"))

## Build the SQLite database from the generated txt files.
DBNAME <- makeHostgeneSQLiteFromTables()

## Build an R package providing the annotation database.
makeMirhostgenesPackage(DBNAME,
                        version="0.0.1",
                        maintainer="Johannes Rainer <johannes.rainer@eurac.edu>",
                        author="J Rainer"
                        )


## End(Not run)

jotsetung/mirhostgenes documentation built on May 19, 2019, 9:42 p.m.