refGenome-package: Managing annotation data for reference Genomes from UCSC and...


Description

The package contains classes for managing GTF annotation data for UCSC and Ensembl genomes. Data can be imported, merged, viewed, searched, and saved (as .RData files or as SQLite databases). The package also includes a C routine that detects overlaps between (alignment) ranges and annotated regions.
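
To make the idea of overlap detection concrete, the following is a minimal base-R sketch. It is not the package's C routine and does not use any refGenome function. The helper name find_overlaps and the query ranges are made up for illustration; the annotation ranges are three DDX11L1 exons copied from the example output on this page. It assumes closed intervals, so two ranges overlap when each one starts no later than the other ends.

```r
# Hypothetical helper (not part of refGenome): naive overlap detection.
# qry and ref are data frames with columns id, start and end.
find_overlaps <- function(qry, ref) {
  hits <- lapply(seq_len(nrow(qry)), function(i) {
    # Closed ranges overlap when each starts no later than the other ends
    j <- which(ref$start <= qry$end[i] & ref$end >= qry$start[i])
    data.frame(query = rep(qry$id[i], length(j)), annotation = ref$id[j])
  })
  do.call(rbind, hits)
}

# Annotated regions: three DDX11L1 exons from the example output
ref <- data.frame(id    = c(25, 28, 26),
                  start = c(11869, 12010, 12613),
                  end   = c(12227, 12057, 12721))

# Made-up query (alignment) ranges, for illustration only
qry <- data.frame(id    = 1:3,
                  start = c(12000, 12300, 12700),
                  end   = c(12050, 12400, 12800))

res <- find_overlaps(qry, ref)
print(res)
```

With these inputs, query 1 overlaps annotations 25 and 28, query 2 overlaps nothing, and query 3 overlaps annotation 26. The sketch compares every query against every annotation, which is only practical for small tables; a compiled routine working on sorted ranges, as the package provides, is the better choice for real alignment data.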

Details

Package: refGenome
Type: Package
Version: 1.0
Date: 2012-10-06
License: What license is it under?
Depends: methods

Author(s)

Wolfgang Kaisers
Maintainer: Wolfgang Kaisers <kaisers@med.uni-duesseldorf.de>

Examples

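library(refGenome)
# Create an empty ensemblGenome object and point it at the bundled example data
ens <- ensemblGenome()
basedir(ens) <- system.file("extdata", package = "refGenome")
# Import a small Ensembl GTF file from the base directory
ens_gtf <- "hs.ensembl.62.small.gtf"
read.gtf(ens, ens_gtf)
# Extract all annotation rows for one gene
ddx <- extractByGeneName(ens, "DDX11L1")
ddx
# Extract all annotation rows for one transcript
fam <- extractTranscript(ens, "ENST00000417324")
fam
# Keep only rows whose seqid is part of the primary assembly
enpa <- extractSeqids(ens, ensPrimAssembly())
enpa
# Count annotation rows per transcript ID and per transcript name
tableTranscript.id(ens)
tableTranscript.name(ens)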

Example output

Loading required package: doBy
Loading required package: RSQLite
[read.gtf.refGenome] Reading file 'hs.ensembl.62.small.gtf'.

[GTF]      135 lines processed.
[read.gtf.refGenome] Extracting genes table.
[read.gtf.refGenome] Finished.
Object of class 'ensemblGenome' with 9 rows and 15 columns.
  gene_name id seqid     source feature start   end score strand frame
5   DDX11L1 25     1 pseudogene    exon 11869 12227     .      +     .
8   DDX11L1 28     1 pseudogene    exon 12010 12057     .      +     .
9   DDX11L1 29     1 pseudogene    exon 12179 12227     .      +     .
1   DDX11L1 30     1 pseudogene    exon 12613 12697     .      +     .
6   DDX11L1 26     1 pseudogene    exon 12613 12721     .      +     .
2   DDX11L1 31     1 pseudogene    exon 12975 13052     .      +     .
  protein_id exon_number   transcript_id transcript_name         gene_id
5       <NA>           1 ENST00000456328     DDX11L1-002 ENSG00000223972
8       <NA>           1 ENST00000450305     DDX11L1-001 ENSG00000223972
9       <NA>           2 ENST00000450305     DDX11L1-001 ENSG00000223972
1       <NA>           3 ENST00000450305     DDX11L1-001 ENSG00000223972
6       <NA>           2 ENST00000456328     DDX11L1-002 ENSG00000223972
2       <NA>           4 ENST00000450305     DDX11L1-001 ENSG00000223972
Object of class 'ensemblGenome' with 8 rows and 15 columns.
    transcript_id  id seqid         source     feature start   end score strand
1 ENST00000417324 128     1 protein_coding start_codon 35734 35736     .      -
2 ENST00000417324 129     1 protein_coding        exon 35277 35481     .      -
3 ENST00000417324 126     1 protein_coding        exon 35721 36081     .      -
4 ENST00000417324 127     1 protein_coding         CDS 35721 35736     .      -
5 ENST00000417324 132     1 protein_coding         CDS 35141 35174     .      -
6 ENST00000417324 133     1 protein_coding  stop_codon 35138 35140     .      -
  frame      protein_id gene_name exon_number transcript_name         gene_id
1     0            <NA>   FAM138A           1     FAM138A-001 ENSG00000237613
2     .            <NA>   FAM138A           2     FAM138A-001 ENSG00000237613
3     .            <NA>   FAM138A           1     FAM138A-001 ENSG00000237613
4     0 ENSP00000409362   FAM138A           1     FAM138A-001 ENSG00000237613
5     1 ENSP00000409362   FAM138A           3     FAM138A-001 ENSG00000237613
6     0            <NA>   FAM138A           3     FAM138A-001 ENSG00000237613
Object of class 'ensemblGenome' with 111 rows and 15 columns.
   id seqid     source feature start   end score strand frame protein_id
25 25     1 pseudogene    exon 11869 12227     .      +     .       <NA>
26 26     1 pseudogene    exon 12613 12721     .      +     .       <NA>
27 27     1 pseudogene    exon 13221 14409     .      +     .       <NA>
28 28     1 pseudogene    exon 12010 12057     .      +     .       <NA>
29 29     1 pseudogene    exon 12179 12227     .      +     .       <NA>
30 30     1 pseudogene    exon 12613 12697     .      +     .       <NA>
   gene_name exon_number   transcript_id transcript_name         gene_id
25   DDX11L1           1 ENST00000456328     DDX11L1-002 ENSG00000223972
26   DDX11L1           2 ENST00000456328     DDX11L1-002 ENSG00000223972
27   DDX11L1           3 ENST00000456328     DDX11L1-002 ENSG00000223972
28   DDX11L1           1 ENST00000450305     DDX11L1-001 ENSG00000223972
29   DDX11L1           2 ENST00000450305     DDX11L1-001 ENSG00000223972
30   DDX11L1           3 ENST00000450305     DDX11L1-001 ENSG00000223972

ENST00000327822 ENST00000408384 ENST00000417324 ENST00000423562 ENST00000430492 
             24               1               8              10               9 
ENST00000438504 ENST00000450305 ENST00000456328 ENST00000461467 ENST00000469289 
             12               6               3               2               2 
ENST00000473358 ENST00000488147 ENST00000515242 ENST00000518655 ENST00000537342 
              3              11               7               8               7 
ENST00000538476 ENST00000541675 
             13               9 

AL627309.2-201 BX072566.1-201    DDX11L1-001    DDX11L1-002   DDX11L11-201 
             7             24              6              3              8 
   FAM138A-001    FAM138A-002 MIR1302-10-001 MIR1302-10-002 MIR1302-10-201 
             8              2              3              2              1 
    WASH7P-001     WASH7P-201     WASH7P-202     WASH7P-203     WASH7P-204 
            11             10              9             12              7 
    WASH7P-205     WASH7P-206 
            13              9 

refGenome documentation built on May 23, 2019, 1:03 a.m.