extract: Extract features from gtf/gff objects

Description

Provides functions for post-processing objects of class gtf and gff.

Usage

extract(x, feature = c("gene_exon", "gene", "gene_intron", "exon", "intron"),
  type = c("default", "union", "disjoin", "intersect", "longest", "shortest",
  "overlap"), ignore_strand = FALSE, transcript_id = "transcript_id",
  gene_id = "gene_id", ...)

Arguments

x

Input object of class gtf or gff which inherits from GRanges.

feature

A character vector of (usually related) features to extract from. One of "gene_exon", "gene", "gene_intron", "exon", "intron". NB: "exon" feature must be present in x.

type

default just extracts the features and returns them as they are.

union merges all overlapping intervals into one. For example, with intervals [a,b], [c,d], [e,f] where c < a < e < d < b < f, the union is [c,f]. NB: There may be more than one row per feature.

intersect returns only the intersecting part. Using the same intervals as before, the intersection is [e,d]. NB: If there is an intersection, exactly one row is returned; otherwise the feature is skipped entirely (0 rows).

disjoin splits intervals into non-overlapping pieces. Using the same intervals as before, the pieces would be [c,a-1] and [b+1,f]. NB: This could result in multiple rows for a given feature.

longest retains only the longest interval.

shortest retains only the shortest interval.

overlap is a special case. Of the overlapping intervals, only the shortest interval is retained iff they all have identical start, end, or both. If not, all overlapping intervals are retained. For example, with intervals [a,b], [c,d], [e,f] where a == c, b == f, d > b,f and e > a,c, the interval [e,f] will be retained.
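
The interval arithmetic behind union, intersect, longest and shortest can be illustrated with a small base R sketch. This is not how extract is implemented (extract operates on GRanges objects); the helper names merge_overlaps and intersect_all, the coordinates, and the extra interval [80,90] are hypothetical and chosen only to satisfy c < a < e < d < b < f. Widths assume 1-based, inclusive coordinates.

```r
# Toy intervals with c < a < e < d < b < f, plus one interval [80, 90]
# that overlaps nothing. All values are made up for illustration.
iv <- data.frame(
  start = c(20, 10, 30, 80),  # a, c, e, and the extra interval
  end   = c(50, 40, 70, 90)   # b, d, f, and the extra interval
)

# Union: merge each run of overlapping intervals into a single interval.
merge_overlaps <- function(start, end) {
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  # Largest end seen so far, in sorted order.
  run_end <- cummax(end)
  # A new group begins when an interval starts after everything before it ends.
  new_group <- c(TRUE, start[-1] > run_end[-length(run_end)])
  group <- cumsum(new_group)
  data.frame(
    start = as.numeric(tapply(start, group, min)),
    end   = as.numeric(tapply(end, group, max))
  )
}

# Intersection: the region shared by all intervals, or zero rows if none.
intersect_all <- function(start, end) {
  s <- max(start)
  e <- min(end)
  if (s <= e) {
    data.frame(start = s, end = e)
  } else {
    data.frame(start = numeric(0), end = numeric(0))
  }
}

union_iv <- merge_overlaps(iv$start, iv$end)       # two rows: [c,f] and [80,90]
overlapping <- iv[1:3, ]
intersect_iv <- intersect_all(overlapping$start, overlapping$end)  # [e,d]

# Longest and shortest: compare interval widths.
widths <- iv$end - iv$start + 1
longest_iv <- iv[which.max(widths), ]
shortest_iv <- iv[which.min(widths), ]
```

Here the union has two rows because [80,90] is separate from the other three intervals, which matches the note that union may return more than one row per feature.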

ignore_strand

Logical argument passed to the GRanges function. Indicates whether strand should be ignored when constructing the GRanges object. Default is FALSE.

transcript_id

Column name in x corresponding to transcript id. Default value is "transcript_id".

gene_id

Column name in x corresponding to gene id. Default value is "gene_id".

...

Arguments passed to other functions. Ignored at the moment.

Details

Extract features based on various criteria (usually intended for obtaining read counts using gcount for a given bam file).

Value

An object of class "gene" when feature is "gene", "gene_exon" or "gene_intron", and of class "exon" and "intron" when feature is "exon" or "intron" respectively. They all inherit from GRanges.

See Also

read_format, as_granges, extract, construct_introns

Examples

path <- system.file("tests", package="gread")
gtf_file <- file.path(path, "sample.gtf")
gtf <- read_format(gtf_file)
# extract exons, combine coordinates of overlapping exons
exons <- extract(gtf, feature="exon", type="union")
# extract all exons within the gene, but combine overlapping exons
exons <- extract(gtf, feature="gene_exon", type="union")
## extract gene span (uses exon coordinates if feature='gene' doesn't exist)
genes <- extract(gtf, feature="gene", type="default")

asrinivasan-oa/gread documentation built on May 10, 2019, 2:04 p.m.