flattenGTF: Flatten Features in GTF or GFF Annotation Files



Description

Convert a GTF/GFF annotation to SAF format and flatten overlapping features.

Usage

flattenGTF(

    # basic input/output options
    GTFfile,
    GTF.featureType = "exon",
    GTF.attrType = "gene_id",

    # the option specifying the merging algorithm
    method = "merge")

Arguments

GTFfile

a character string giving the name of a GTF file as input.

GTF.featureType

a character string giving the feature type used to select rows in a GTF annotation. "exon" by default. Feature types can be found in the third column of a GTF annotation.

GTF.attrType

a character string giving the attribute type in a GTF annotation which will be used to group features. "gene_id" by default. Attributes can be found in the ninth column of a GTF annotation.

method

a character string specifying how overlapping features are flattened: either "merge" (the default) or "chop". See the Details section for more information.

Details

This function selects the features in a GTF annotation whose feature type (third column) matches GTF.featureType, and then groups them into meta-features according to the value of the GTF.attrType attribute (ninth column).
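The selection and grouping step can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not Rsubread's implementation: the helper name select_and_group is hypothetical, and it assumes tab-separated GTF rows with attributes written as key "value"; (GFF3-style key=value attributes are not handled).

```r
# Hypothetical base-R sketch, NOT Rsubread's implementation. Assumes GTF-style
# attributes (key "value";); GFF3 key=value attributes are not handled.
select_and_group <- function(gtf_lines, feature_type = "exon",
                             attr_type = "gene_id") {
  # GTF rows are tab-separated with nine columns; shorter (e.g. comment) lines are dropped
  fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
  fields <- Filter(function(f) length(f) >= 9 && f[3] == feature_type, fields)

  # Extract the value of attr_type from the ninth (attribute) column
  pattern <- paste0('(^|;)[[:space:]]*', attr_type, ' "([^"]+)"')
  group_id <- vapply(fields, function(f) {
    m <- regmatches(f[9], regexec(pattern, f[9]))[[1]]
    if (length(m) == 3) m[3] else NA_character_
  }, character(1))

  # Return the selected rows with the columns used by SAF
  col <- function(i) vapply(fields, `[`, character(1), i)
  data.frame(GeneID = group_id, Chr = col(1),
             Start = as.integer(col(4)), End = as.integer(col(5)),
             Strand = col(7), stringsAsFactors = FALSE)
}

gtf <- c(
  'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
  'chr1\tsrc\texon\t150\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
  'chr1\tsrc\tCDS\t160\t190\t.\t+\t0\tgene_id "G1"; transcript_id "T2";'
)
exons <- select_and_group(gtf)   # the CDS row is not selected
split(exons, exons$GeneID)       # one meta-feature, G1, with two exons
```

Anchoring the attribute match at the start of the column or after a semicolon keeps gene_id from matching inside a longer key such as ref_gene_id.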

When method="merge", overlapping features within a meta-feature are merged into a single continuous feature spanning all of them. When method="chop", overlapping features are chopped into multiple non-overlapping bins.

For example, suppose three exons belong to the same gene, with coordinates [100, 200], [150, 500] and [250, 400]. In "merge" mode, a single exon, [100, 500], is returned for this gene, representing the union of the three original exons. In "chop" mode, five non-overlapping bins are returned: [100, 149], [150, 200], [201, 249], [250, 400] and [401, 500]. The bin boundaries are determined by the start and end coordinates of the three original exons (100, 150, 200, 250, 400 and 500): each start coordinate begins a bin and each end coordinate ends one.
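The two modes can be sketched in base R for a single meta-feature. These helpers (merge_intervals, chop_intervals) are hypothetical illustrations, not Rsubread's code. They assume closed, 1-based intervals, and the merge sketch assumes that features which are merely adjacent (e.g. [100, 200] and [201, 300]) are not merged, since the text above only describes merging overlapping features.

```r
# Hypothetical sketch of "merge": union of overlapping closed intervals.
# Assumption: adjacent but non-overlapping intervals are kept separate.
merge_intervals <- function(starts, ends) {
  o <- order(starts, ends)
  starts <- starts[o]; ends <- ends[o]
  cur_s <- starts[1]; cur_e <- ends[1]
  res <- NULL
  for (i in seq_along(starts)[-1]) {
    if (starts[i] <= cur_e) {          # overlaps the current block: extend it
      cur_e <- max(cur_e, ends[i])
    } else {                           # gap: close the current block
      res <- rbind(res, c(cur_s, cur_e))
      cur_s <- starts[i]; cur_e <- ends[i]
    }
  }
  res <- rbind(res, c(cur_s, cur_e))
  data.frame(Start = res[, 1], End = res[, 2])
}

# Hypothetical sketch of "chop": split at every start and every end.
chop_intervals <- function(starts, ends) {
  # A start opens a bin at that position; an end closes a bin, so the
  # next bin begins at end + 1.
  breaks <- sort(unique(c(starts, ends + 1)))
  bin_s <- head(breaks, -1)
  bin_e <- tail(breaks, -1) - 1
  # Keep only bins covered by at least one original feature (drops gaps).
  covered <- vapply(bin_s, function(p) any(starts <= p & ends >= p), logical(1))
  data.frame(Start = bin_s[covered], End = bin_e[covered])
}

# The three exons from the example above
s <- c(100, 150, 250)
e <- c(200, 500, 400)
merged  <- merge_intervals(s, e)   # one row: 100-500
chopped <- chop_intervals(s, e)    # five rows, matching the bins listed above
print(merged)
print(chopped)
```

Using end + 1 as a break point is what makes [150, 200] and [201, 249] separate bins while keeping every bin boundary tied to an original start or end coordinate.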

The output of this function is a SAF-format annotation that can be passed to the featureCounts function for read counting. The SAF format is also described in the featureCounts documentation.
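A SAF annotation is a data.frame with the columns GeneID, Chr, Start, End and Strand. The sketch below wraps the flattened intervals of one meta-feature as SAF rows. The helper as_saf is hypothetical, and the commented featureCounts call assumes Rsubread is installed and uses a hypothetical BAM path.

```r
# Hypothetical helper: turn the flattened intervals of one meta-feature
# into SAF rows (columns GeneID, Chr, Start, End, Strand).
as_saf <- function(gene_id, chr, strand, bins) {
  n <- nrow(bins)
  data.frame(GeneID = rep(gene_id, n), Chr = rep(chr, n),
             Start = as.integer(bins$Start), End = as.integer(bins$End),
             Strand = rep(strand, n), stringsAsFactors = FALSE)
}

bins <- data.frame(Start = c(100, 150), End = c(149, 200))
saf <- as_saf("G1", "chr1", "+", bins)
print(saf)

# With Rsubread installed and a BAM file at hand (path is hypothetical):
# library(Rsubread)
# fc <- featureCounts("sample.bam", annot.ext = saf, isGTFAnnotationFile = FALSE)
```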

Value

A data.frame containing a SAF-format annotation in which the features belonging to each meta-feature do not overlap one another.
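One way to check this property on a SAF data.frame is sketched below. The function saf_is_flat is hypothetical; it groups features by GeneID and chromosome and tests that, after sorting by start, each feature ends before the next one begins.

```r
# Hypothetical check: within each meta-feature (GeneID + Chr), no two
# features overlap once sorted by start coordinate.
saf_is_flat <- function(saf) {
  groups <- split(saf, list(saf$GeneID, saf$Chr), drop = TRUE)
  all(vapply(groups, function(g) {
    g <- g[order(g$Start), ]
    nrow(g) < 2 || all(head(g$End, -1) < tail(g$Start, -1))
  }, logical(1)))
}

flat <- data.frame(GeneID = "G1", Chr = "chr1", Start = c(100, 150),
                   End = c(149, 200), Strand = "+")
overlapping <- data.frame(GeneID = "G1", Chr = "chr1", Start = c(100, 150),
                          End = c(200, 500), Strand = "+")
saf_is_flat(flat)         # TRUE
saf_is_flat(overlapping)  # FALSE
```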

Author(s)

Yang Liao and Wei Shi

See Also

featureCounts


Rsubread documentation built on March 17, 2021, 6:01 p.m.