gtf2gr: GTF file parsing


View source: R/TransView_tools.R

Description

Converts a GTF file from UCSC or ENSEMBL into a GRanges object while preserving the exon structure of each transcript.

Usage

gtf2gr(gtf_file, chromosomes=NA, refseq_nm=FALSE, gtf_feature=c("exon"), transcript_id="transcript_id", gene_id="gene_id")

Arguments

gtf_file

Character string with the filename of the GTF file. File formats from UCSC and ENSEMBL are supported, as is gzip compression.

chromosomes

A character vector of chromosome names. Restricts the output to the chromosomes that match, ignoring case.

refseq_nm

An option for GTF files based on RefSeq annotation. If TRUE, only identifiers beginning with NM_ will be used.

gtf_feature

Character vector of GTF feature types (3rd column) to be returned.

transcript_id

Name of the attribute in the attribute column (9th column) that should be used as the transcript ID.

gene_id

Name of the attribute in the attribute column (9th column) that should be used as the gene ID.
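
The row filters described by chromosomes, gtf_feature and refseq_nm can be pictured with the following base R sketch. It is not the package's implementation: the toy data frame and its values are hypothetical, and chromosome matching is assumed to be an exact, case-insensitive comparison of names.

```r
# Toy stand-in for the chromosome and feature columns of a GTF file plus a
# transcript ID (hypothetical values, not taken from the package's data).
gtf <- data.frame(
  seqname    = c("chr1", "Chr1", "chr2", "chrX"),
  feature    = c("exon", "CDS", "exon", "exon"),
  transcript = c("NM_0001", "NM_0001", "NR_0002", "NM_0003"),
  stringsAsFactors = FALSE
)

chromosomes <- c("CHR1", "chrx")  # case differs from the file on purpose
gtf_feature <- c("exon")
refseq_nm   <- TRUE

# Keep rows on the requested chromosomes (ignoring case) with a requested feature
keep <- tolower(gtf$seqname) %in% tolower(chromosomes) &
  gtf$feature %in% gtf_feature

# Assumed behaviour of refseq_nm: keep only identifiers starting with NM_
if (refseq_nm) keep <- keep & grepl("^NM_", gtf$transcript)

filtered <- gtf[keep, ]
print(filtered)
```

In this toy input only the first and last rows pass all three filters.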

Details

This function parses GTF files generated by the UCSC table browser or downloaded from the ENSEMBL ftp server. By default it uses only rows with an 'exon' tag in the feature column (3rd column); other feature types can be selected with gtf_feature. The transcript name is taken from the attribute column (9th column), using the attribute named by transcript_id. The exons of each transcript are numbered by applying the make.unique function to the transcript name, and the resulting names are used as row names.
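
As a rough illustration of the naming scheme described above, the following base R sketch pulls the transcript ID out of a GTF attribute string and numbers repeated IDs with make.unique. The helper get_attr and the attribute strings are hypothetical and only approximate what the package may do internally.

```r
# Hypothetical attribute strings in the style of the 9th GTF column
attrs <- c(
  'gene_id "Gene1"; transcript_id "NM_0001";',
  'gene_id "Gene1"; transcript_id "NM_0001";',
  'gene_id "Gene1"; transcript_id "NM_0001";',
  'gene_id "Gene2"; transcript_id "NM_0002";'
)

# Illustrative helper (not part of TransView): return the quoted value that
# follows `key`. If `key` is absent, sub() returns the input unchanged.
get_attr <- function(x, key) {
  pattern <- paste0('.*', key, ' "([^"]*)".*')
  sub(pattern, "\\1", x)
}

tx <- get_attr(attrs, "transcript_id")

# make.unique leaves the first occurrence as is and appends .1, .2, ... to repeats
row_names <- make.unique(tx)
print(row_names)
# "NM_0001" "NM_0001.1" "NM_0001.2" "NM_0002"
```

The first exon of a transcript therefore keeps the plain transcript name, and later exons receive a numeric suffix.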

Value

GRanges object with one row per exon. Row names are derived from the transcript IDs, and an exon_id is provided.

Author(s)

Julius Muller ju-mu@alumni.ethz.ch

Examples

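library(TransView)

# Locate the gzipped GTF files shipped with TransView
exgtf <- dir(system.file("extdata", package="TransView"), full.names=TRUE, pattern="gtf.gz$")

# Parse the second file into a GRanges object
GTF.mm9 <- gtf2gr(exgtf[2])

head(GTF.mm9)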

TransView documentation built on Nov. 8, 2020, 5:31 p.m.