readGtf: Reading and parsing GTF files into refGenome objects.

Description

Reads and parses the content of a GTF file. The parsed content is written into the environment held in the 'ev' slot of the provided object, so the object is modified by reference. The function writes two tables: 'gtf', containing the main file content, and 'genes', containing data from features of type 'gene'.
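The "per reference" behaviour can be illustrated with a minimal sketch. The class `demoGenome` and the function `fill_demo` below are hypothetical stand-ins, not part of refGenome; they only assume that the object carries an environment in a slot named 'ev', as described above.

```r
library(methods)

# Hypothetical stand-in for a refGenome-like class:
# a single slot 'ev' that holds an environment.
setClass("demoGenome", representation(ev = "environment"))

# Hypothetical filler function: assigns two tables into the
# environment of the object instead of returning a modified copy.
fill_demo <- function(object) {
  assign("gtf",
         data.frame(id = 1:2, feature = c("gene", "exon")),
         envir = object@ev)
  assign("genes",
         data.frame(id = 1L, feature = "gene"),
         envir = object@ev)
  invisible(NULL)
}

obj <- new("demoGenome", ev = new.env())
fill_demo(obj)   # no 'obj <- ...' needed: the environment is shared
ls(obj@ev)       # names of the tables now present in the environment
```

Because environments are not copied on assignment in R, the tables are visible through `obj` after the call even though the function returns nothing.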

Usage

read.gtf(object, filename="transcripts.gtf", sep = "\t",
            useBasedir=TRUE, comment.char = "#", progress=100000L, ...)

Arguments

object

refGenome object. Will contain the extracted data.

filename

(Base-)Name of GTF file.

sep

Character: Column separator in GTF file. Default is '\t'.

useBasedir

Logical: Should basedir (from the refGenome object) be prepended to filename?

comment.char

Character: Lines beginning with this character will be skipped.

progress

Integer: The parsing routine prints a progress message each time the given number of lines has been read.

...

Currently unused.

Details

GTF is an extension of the GFF file format. GTF contains tabular data: nine columns separated by a tab delimiter. The last column expands into a list of attributes, separated by a semicolon and exactly one space. Each attribute consists of a type-value pair, separated by one space. Enclosing quotation marks (") around attribute values are skipped during import.
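The attribute layout described above can be sketched in base R. The helper `parse_attributes` and the attribute string are illustrative assumptions; this is not the parsing routine used by read.gtf.

```r
# Hypothetical helper: split one GTF attribute string into a
# named character vector (names = attribute types).
parse_attributes <- function(x) {
  # Attributes are separated by a semicolon followed by a space
  parts <- strsplit(x, ";\\s*")[[1]]
  parts <- trimws(parts)
  parts <- parts[nzchar(parts)]
  # Type is the text before the first space, value is the rest
  type  <- sub(" .*$", "", parts)
  value <- sub("^[^ ]+ ", "", parts)
  # Drop the enclosing quotation marks around values
  value <- gsub('"', "", value, fixed = TRUE)
  setNames(value, type)
}

# Illustrative content of a ninth GTF column
attr_col <- 'gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_biotype "pseudogene";'
res <- parse_attributes(attr_col)
res[["gene_name"]]
```

The sketch tolerates a missing space after the semicolon, which is more lenient than the "exactly one space" rule stated above.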

Value

None. The provided object is filled with the parsed data. Two tables are generated: 'gtf' and 'genes'. The first eight columns of the gtf table are fixed. The content is described in the following table.

Column    Content
id        Numeric index for unique site. Integer.
seqid     Chromosome identifier. Character.
source    Program which generated data.
feature   Feature type (e.g. 'exon', 'CDS'). Character.
start     Start position of feature (1-based). Integer.
end       End position of feature (inclusive). Integer.
score     Value between 0 and 1000 ("." for no score). Character.
strand    '+', '-' or '.'. Character.
frame     0-2 for coding exons. '.' otherwise. Character.
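For orientation, the fixed columns listed above can be reproduced with base R alone. This is a rough sketch and does not use read.gtf: the file content is made up for illustration, the 'id' column is assumed to be a plain row counter, the 'source' column is assumed to be character, and the ninth column is kept unparsed under the assumed name 'attributes'.

```r
# Write a tiny illustrative GTF file (content is invented)
gtf_lines <- c(
  "# illustrative comment line",
  paste("1", "havana", "gene", "11869", "14409", ".", "+", ".",
        'gene_id "ENSG00000223972";', sep = "\t"),
  paste("1", "havana", "exon", "11869", "12227", ".", "+", ".",
        'gene_id "ENSG00000223972";', sep = "\t")
)
tmp <- tempfile(fileext = ".gtf")
writeLines(gtf_lines, tmp)

# Read the nine tab-separated columns; lines starting with '#' are
# skipped and quoting is disabled so attribute values stay intact.
tbl <- read.table(
  tmp, sep = "\t", comment.char = "#", quote = "",
  stringsAsFactors = FALSE,
  col.names = c("seqid", "source", "feature", "start", "end",
                "score", "strand", "frame", "attributes"),
  colClasses = c("character", "character", "character",
                 "integer", "integer", "character",
                 "character", "character", "character")
)

# Assumption: 'id' is a simple running index over the rows
tbl <- cbind(id = seq_len(nrow(tbl)), tbl)
str(tbl)
```

Reading 'score' and 'frame' as character keeps the "." placeholder without conversion warnings, which matches the column types given in the table.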

Author(s)

Wolfgang Kaisers

References

UCSC Genome Bioinformatics: Data File Formats. http://genome.ucsc.edu/FAQ/FAQformat.html#format3

Examples

##-------------------------------------##
## Ensembl
##-------------------------------------##
ef <- system.file("extdata", package="refGenome")
en <- ensemblGenome(ef)
read.gtf(en, "hs.ensembl.76.small.gtf")

Example output

Loading required package: doBy
Loading required package: RSQLite
[read.gtf.refGenome] Reading file 'hs.ensembl.76.small.gtf'.

[GTF]      100 lines processed.
[read.gtf.refGenome] Extracting genes table.
[read.gtf.refGenome] Found 14 gene records.
[read.gtf.refGenome] Finished.

refGenome documentation built on May 23, 2019, 1:03 a.m.