seq_gtf: Get transcript sequences from GTF file and sequence info


Description

Given a GTF file (for transcript structure) and DNA sequences, return a DNAStringSet of transcript sequences.

Usage

seq_gtf(gtf, seqs, exononly = TRUE, idfield = "transcript_id",
  attrsep = "; ")

Arguments

gtf

Either the path to a GTF file or a data frame representing a canonical GTF file.

seqs

Either the path to a folder containing one FASTA file (.fa extension) for each chromosome in gtf, or a named DNAStringSet containing one DNAString per chromosome in gtf, representing its sequence. In the latter case, names(seqs) should contain the same entries as the seqnames (first) column of gtf.

exononly

If TRUE (the default), only create transcript sequences from the features labeled exon in gtf.

idfield

The name of the field in the attributes column of gtf that identifies transcripts. Should be a character string. Default "transcript_id".

attrsep

The string that separates attributes in the attributes column of gtf. Default "; ".
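To make the roles of idfield and attrsep concrete, here is a minimal base R sketch of pulling one field out of GTF attribute strings. The helper get_attr and the sample attribute strings are hypothetical, written only for illustration; this is not necessarily how seq_gtf parses the column internally.

```r
# Hypothetical helper (not part of polyester): extract one field from
# GTF attribute strings of the form: key "value"; key "value";
get_attr <- function(attributes, idfield = "transcript_id", attrsep = "; ") {
  vapply(attributes, function(a) {
    # Split the attribute string into "key value" pairs.
    fields <- strsplit(a, attrsep, fixed = TRUE)[[1]]
    # Keep the pair whose key matches idfield.
    hit <- fields[startsWith(fields, paste0(idfield, " "))]
    if (length(hit) == 0) return(NA_character_)
    # Drop the key and the space after it, then strip the surrounding
    # quotes and any trailing semicolon.
    value <- substring(hit[1], nchar(idfield) + 2)
    gsub('^"|";?$|;$', "", value)
  }, character(1), USE.NAMES = FALSE)
}

# Made-up attribute strings in the usual GTF layout.
attrs <- c(
  'gene_id "G1"; transcript_id "T1"; exon_number "1";',
  'gene_id "G1"; transcript_id "T2"; exon_number "1";'
)
ids <- get_attr(attrs)
```

If a file uses a different key for transcripts, or separates attributes with ";" and no space, the matching idfield and attrsep values would need to be passed instead of the defaults.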

Value

DNAStringSet containing transcript sequences, with names corresponding to idfield in gtf.
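The idea of assembling a transcript sequence from its exons can be sketched in base R on plain character strings. Everything here is an assumption made for illustration: the toy chromosome, the toy data frame and its column names, and the helper toy_transcripts. The sketch handles plus-strand features only and returns a named character vector, whereas seq_gtf returns a DNAStringSet; this page does not describe how seq_gtf treats strand.

```r
# Toy chromosome and GTF-like data frame (both made up for illustration).
chrom <- c(chr1 = "AAACCCGGGTTTACGT")
gtf <- data.frame(
  seqname = "chr1",
  feature = c("exon", "exon", "CDS", "exon"),
  start   = c(1, 7, 7, 4),
  end     = c(3, 9, 9, 6),
  transcript_id = c("T1", "T1", "T1", "T2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: plus-strand only, no reverse complementing.
toy_transcripts <- function(gtf, chrom, exononly = TRUE) {
  # Optionally keep only the rows labeled "exon".
  if (exononly) gtf <- gtf[gtf$feature == "exon", ]
  # Order features by transcript, then by genomic start position.
  gtf <- gtf[order(gtf$transcript_id, gtf$start), ]
  # Cut each feature out of its chromosome sequence.
  pieces <- substring(chrom[gtf$seqname], gtf$start, gtf$end)
  # Concatenate the pieces belonging to each transcript.
  vapply(split(pieces, gtf$transcript_id), paste, character(1), collapse = "")
}

tx <- toy_transcripts(gtf, chrom)
```

The result is named by transcript ID, mirroring how the returned DNAStringSet is named by idfield.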

References

http://www.ensembl.org/info/website/upload/gff.html

Examples

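library(polyester)
library(Biostrings)
# Load the chr22seq object from the remote .rda file
load(url('http://biostat.jhsph.edu/~afrazee/chr22seq.rda'))
# Load the example GTF data frame
data(gtf_dataframe)
chr22_processed = seq_gtf(gtf_dataframe, chr22seq)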

alyssafrazee/polyester-release documentation built on May 12, 2019, 2:32 a.m.