These functions support the import and export of the GFF format, of which there are three versions and several flavors.
## S4 method for signature 'GFFFile,ANY,ANY'
import(con, format, text,
version = c("", "1", "2", "3"),
genome = NA, colnames = NULL, which = NULL,
feature.type = NULL, sequenceRegionsAsSeqinfo = FALSE)
import.gff(con, ...)
import.gff1(con, ...)
import.gff2(con, ...)
import.gff3(con, ...)
## S4 method for signature 'ANY,GFFFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,GFFFile,ANY'
export(object, con, format,
version = c("1", "2", "3"),
source = "rtracklayer", append = FALSE, index = FALSE)
## S4 method for signature 'GenomicRangesList,GFFFile,ANY'
export(object, con, format, ...)
export.gff(object, con, ...)
export.gff1(object, con, ...)
export.gff2(object, con, ...)
export.gff3(object, con, ...)
con: A path, URL, connection or GFFFile object.
object: The object to export; should be a GRanges or something coercible to a GRanges.
format: If not missing, should be one of “gff”, “gff1”, “gff2”, “gff3”, “gvf”, or “gtf”.
version: If the format is given as “gff”, i.e., it does not specify a version, then this should indicate the GFF version as one of “” (for import only, from the …
text: If …
genome: The identifier of a genome, or a …
colnames: A character vector naming the columns to parse. These should name either fixed fields, like …
which: A …
feature.type: …
sequenceRegionsAsSeqinfo: If …
source: The value for the source column in GFF. This is typically the name of the package or algorithm that generated the feature.
index: If …
append: If …
...: Arguments passed down to other methods. For import, the flow eventually reaches the …
The Generic Feature Format (GFF) is a tab-separated table of intervals. There are three versions of GFF, and all of them have the same nine columns. In GFF1, the last column is a grouping factor, whereas in the later versions it holds application-specific attributes, with conventions defined for the commonly used ones. This attribute support makes it possible to extend the format. Such extensions include GTF (Gene Transfer Format, an extension of GFF2) and GVF (Genome Variation Format, an extension of GFF3). The rtracklayer package recognizes the “gtf” and “gvf” extensions and parses the extra attributes into columns of the result; however, it does not perform any extension-specific processing. Both GFF1 and GFF2 have been declared obsolete; however, the UCSC Genome Browser supports only GFF1 (and GTF), and GFF2 is still in broad use.
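The following minimal base R sketch makes the column layout concrete. It is not rtracklayer's parser. It reads two invented GFF3 records with read.delim and splits the ninth column, which holds the attributes, into key/value pairs. The column names follow the GFF3 specification. The records and the helper parse_attributes are hypothetical, and the sketch ignores header directives, comments and percent-encoding.

```r
# Two hypothetical GFF3 records: nine tab-separated columns, "." marks a missing value
gff_lines <- c(
  "chr10\texample\tgene\t90000\t92000\t.\t+\t.\tID=gene1;Name=ABC",
  "chr10\texample\tmRNA\t90000\t92000\t.\t+\t.\tID=tx1;Parent=gene1"
)
gff_cols <- c("seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes")
gff <- read.delim(text = gff_lines, header = FALSE, col.names = gff_cols,
                  na.strings = ".", stringsAsFactors = FALSE)

# Hypothetical helper: split "key=value;key=value" into a named character vector
parse_attributes <- function(x) {
  pairs <- strsplit(strsplit(x, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  setNames(vapply(pairs, `[`, character(1), 2),
           vapply(pairs, `[`, character(1), 1))
}
attrs <- lapply(gff$attributes, parse_attributes)
attrs[[2]][["Parent"]]  # "gene1"
```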
GFF is distinguished from the simpler BED format by its flexible attribute support and its hierarchical structure, as specified by the group column in GFF1 (only one level of grouping) and the Parent attribute in GFF3. GFF2 does not specify a convention for representing hierarchies, although its GTF extension provides this for gene structures. The combination of support for hierarchical data and arbitrary descriptive attributes makes GFF(3) the preferred format for representing gene models.
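The Parent convention can be illustrated with a small base R sketch that turns ID/Parent attributes into a lookup from each parent to its children. The data frame and its values are hypothetical. A feature may list several comma-separated parents, as exon2 does here.

```r
# Hypothetical GFF3-style features: a gene, two transcripts and two exons
features <- data.frame(
  ID     = c("gene1", "tx1", "tx2", "exon1", "exon2"),
  type   = c("gene", "mRNA", "mRNA", "exon", "exon"),
  Parent = c(NA, "gene1", "gene1", "tx1", "tx1,tx2"),
  stringsAsFactors = FALSE
)

# Expand comma-separated Parent values into one (child, parent) edge per row
parents <- strsplit(features$Parent, ",", fixed = TRUE)
edges <- data.frame(
  child  = rep(features$ID, lengths(parents)),
  parent = unlist(parents),
  stringsAsFactors = FALSE
)
edges <- edges[!is.na(edges$parent), ]  # drop top-level features (no Parent)

# Children of each parent feature, keyed by parent ID
children <- split(edges$child, edges$parent)
children$tx2  # "exon2"
```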
Although GFF features a score column, large quantitative data belong in a format like BigWig, and alignments from high-throughput experiments belong in BAM. For variants, the VCF format (supported by the VariantAnnotation package) seems to be more widely adopted than the GVF extension.
A note on the UCSC track line metaformat: track lines are a means for passing hints to visualization tools like the UCSC Genome Browser and the Integrated Genome Browser (IGB), and they allow multiple tracks to be concatenated in the same file. Since GFF is not a UCSC format, it is not common to annotate GFF data with track lines, but rtracklayer still supports it. To export or import GFF data in the track line format, call export.ucsc or import.ucsc.
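The following base R sketch uses invented file content to show how track lines let several tracks share one file. Each line starting with “track” opens a new track, and the data lines after it belong to that track. The sketch only illustrates the layout, since import.ucsc does the real parsing. The name and description attributes are examples only.

```r
# Hypothetical content: two GFF1-style tracks concatenated, each introduced by a track line
lines <- c(
  'track name="genesA" description="first track"',
  "chr10\tsrc\tgene\t90000\t92000\t.\t+\t.\tgroupA",
  'track name="genesB" description="second track"',
  "chr12\tsrc\tgene\t10000\t12000\t.\t-\t.\tgroupB"
)
is_track <- startsWith(lines, "track")
track_id <- cumsum(is_track)  # every data line inherits the index of the preceding track line

# Group data lines by track, then label each group with its track name
tracks <- split(lines[!is_track], track_id[!is_track])
names(tracks) <- sub('^track name="([^"]*)".*$', "\\1", lines[is_track])
length(tracks)  # 2
```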
The following is the mapping of GFF elements to a GRanges object. NA values are allowed only where indicated; they appear as a “.” in the file. GFF requires that all columns are included, so export generates defaults for missing columns.
seqid: the seqnames component.
start, end: the ranges component.
source: character vector in the source column; defaults to “rtracklayer” on export.
type: character vector in the type column; defaults to “sequence_feature” in the output, i.e., SO:0000110.
score: numeric vector (NA's allowed) in the score column, accessible via the score accessor; defaults to NA upon export.
strand: strand factor (NA's allowed) in the strand column, accessible via the strand accessor; defaults to NA upon export.
phase: integer vector, either 0, 1 or 2 (NA's allowed); defaults to NA upon export.
group: a factor (GFF1 only); defaults to the seqid (e.g., chromosome) on export.
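The following rough sketch shows the export defaults listed above. The hypothetical helper to_gff_line writes one feature as a nine-column line. It fills source and type with the documented defaults and writes NA values as “.”. It follows the GFF2/3 layout with an empty (“.”) ninth column. It is not rtracklayer's writer.

```r
# A hypothetical feature with score, strand and phase all missing
feat <- data.frame(
  seqid = "chr10", start = 90000L, end = 92000L,
  score = NA_real_, strand = NA_character_, phase = NA_integer_,
  stringsAsFactors = FALSE
)

# Hypothetical writer: apply the source/type defaults and encode NA as "."
to_gff_line <- function(f, source = "rtracklayer", type = "sequence_feature") {
  dot <- function(x) ifelse(is.na(x), ".", as.character(x))
  paste(f$seqid, source, type, f$start, f$end,
        dot(f$score), dot(f$strand), dot(f$phase), ".", sep = "\t")
}
to_gff_line(feat)
# "chr10\trtracklayer\tsequence_feature\t90000\t92000\t.\t.\t.\t."
```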
In GFF versions 2 and 3, attributes map to arbitrary metadata columns in the result. In GFF3, some attributes (Parent, Alias, Note, Dbxref and Ontology_term) can have multiple, comma-separated values; these columns are therefore always CharacterList objects.
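In GFF3, a value that itself contains a comma is percent-encoded as %2C. For that reason, multi-valued attributes have to be split on commas first and decoded afterwards. rtracklayer returns these columns as CharacterList objects (from IRanges). This base R sketch uses a plain list of character vectors as a stand-in. The raw values and the helper split_multi are hypothetical.

```r
# Hypothetical raw Alias values from a GFF3 attributes column
raw_alias <- c("ABC1,XYZ", "p53%2C isoform a", NA)

# Split on literal commas, then percent-decode each piece.
# Features without the attribute get character(0), a choice made for this sketch.
split_multi <- function(x) {
  lapply(x, function(v) {
    if (is.na(v)) return(character(0))
    vapply(strsplit(v, ",", fixed = TRUE)[[1]], utils::URLdecode,
           character(1), USE.NAMES = FALSE)
  })
}
alias_list <- split_multi(raw_alias)
lengths(alias_list)  # 2 1 0
```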
The import functions return a GRanges with the metadata columns described above.
The GFFFile class extends RTLFile and is a formal representation of a resource in the GFF format. To cast a path, URL or connection to a GFFFile, pass it to the GFFFile constructor. The GFF1File, GFF2File, GFF3File, GVFFile and GTFFile classes all extend GFFFile and indicate a particular version of the format.
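The versioned subclasses correspond to the “##gff-version” directive that begins a GFF3 (and often a GFF2) file. The hypothetical helper gff_class_for below picks a class name from that first line. It only illustrates the correspondence. rtracklayer's own choice of class (for example, from the file extension or the version argument) may work differently.

```r
# Hypothetical helper: map the "##gff-version N" directive to a GFFFile subclass name
gff_class_for <- function(first_line) {
  m <- regmatches(first_line, regexec("^##gff-version\\s+(\\d)", first_line))[[1]]
  if (length(m) == 0) return("GFFFile")  # no directive: version unknown
  switch(m[2], "1" = "GFF1File", "2" = "GFF2File", "3" = "GFF3File", "GFFFile")
}
gff_class_for("##gff-version 3")   # "GFF3File"
gff_class_for("chr10\tsrc\tgene")  # "GFFFile"
```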
It has the following utility methods:
genome: Gets the genome identifier from the “genome-build” header directive.
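The following base R sketch shows roughly what reading the “genome-build” directive involves. It assumes the GFF3 form “##genome-build source buildName” and returns the last field. The header lines and the helper genome_from_header are hypothetical, and the genome method's actual parsing may differ.

```r
# Hypothetical GFF3 header followed by one feature line
header <- c("##gff-version 3", "##genome-build UCSC hg19",
            "chr10\tsrc\tgene\t90000\t92000\t.\t+\t.\tID=gene1")

genome_from_header <- function(lines) {
  directive <- grep("^##genome-build", lines, value = TRUE)
  if (length(directive) == 0) return(NA_character_)  # directive absent
  fields <- strsplit(directive[1], "\\s+")[[1]]
  fields[length(fields)]  # assume the build name is the last field
}
genome_from_header(header)  # "hg19"
```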
Author(s): Michael Lawrence
test_path <- system.file("tests", package = "rtracklayer")
test_gff3 <- file.path(test_path, "genes.gff3")
## basic import
test <- import(test_gff3)
test
## import.gff functions
import.gff(test_gff3)
import.gff3(test_gff3)
## GFFFile derivatives
test_gff_file <- GFF3File(test_gff3)
import(test_gff_file)
test_gff_file <- GFFFile(test_gff3)
import(test_gff_file)
test_gff_file <- GFFFile(test_gff3, version = "3")
import(test_gff_file)
## from connection
test_gff_con <- file(test_gff3)
test <- import(test_gff_con, format = "gff")
## various arguments
import(test_gff3, genome = "hg19")
import(test_gff3, colnames = character())
import(test_gff3, colnames = c("type", "geneName"))
## 'which'
which <- GRanges("chr10:90000-93000")
import(test_gff3, which = which)
## Not run:
## 'append'
test_gff3_out <- file.path(tempdir(), "genes.gff3")
export(test[seqnames(test) == "chr10"], test_gff3_out)
export(test[seqnames(test) == "chr12"], test_gff3_out, append = TRUE)
import(test_gff3_out)
## 'index'
export(test, test_gff3_out, index = TRUE)
test_gff3_gz <- paste0(test_gff3_out, ".gz")
import(test_gff3_gz, which = which)
## End(Not run)