seq_gtf: Get transcript sequences from GTF file and sequence info


View source: R/seq_gtf.R

Description

Given a GTF file (which defines transcript structure) and the corresponding DNA sequences, return a DNAStringSet of transcript sequences.

Usage

seq_gtf(gtf, seqs, feature = "transcript", exononly = TRUE,
  idfield = "transcript_id", attrsep = "; ")

Arguments

gtf

Either a path to a GTF file or a data frame representing a canonical GTF file.

seqs

Either a path to a folder containing one FASTA file (.fa extension) for each chromosome in gtf, or a named DNAStringSet containing one DNAString per chromosome in gtf, representing that chromosome's sequence. In the latter case, names(seqs) should contain the same entries as the seqnames (first) column of gtf.

feature

Either 'transcript' or 'exon' (default 'transcript'), depending on whether transcript or exon sequences should be returned.

exononly

If TRUE (the default), create transcript sequences only from the features labeled exon in gtf.

idfield

The name of the field in the attributes column of gtf that identifies transcripts. Should be a character string. Default "transcript_id".

attrsep

The separator between attributes in the attributes column of gtf. Default "; ".
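To illustrate how idfield and attrsep relate to the attributes column, here is a minimal base-R sketch. The helper get_attr_field and the toy attribute string are hypothetical and are not part of polyester; this is not necessarily how seq_gtf parses attributes internally.

```r
# Hypothetical helper (not part of polyester): pull one field out of a
# GTF attribute string, given the field name and the attribute separator.
get_attr_field <- function(attr_string, idfield = "transcript_id", attrsep = "; ") {
  # Split the attribute string into "key value" pairs on the separator.
  pairs <- strsplit(attr_string, attrsep, fixed = TRUE)[[1]]
  # Keep the pair whose key matches idfield.
  hit <- pairs[startsWith(pairs, paste0(idfield, " "))]
  if (length(hit) == 0) return(NA_character_)
  # Drop the key and the following space, then strip quotes and semicolons.
  value <- substring(hit[1], nchar(idfield) + 2)
  gsub('"|;', "", value)
}

# Toy attribute string with made-up identifiers.
attrs <- 'gene_id "g1"; transcript_id "tx1"; exon_number "1";'
tx_id <- get_attr_field(attrs)
print(tx_id)
```

If a GTF file uses a different separator or a different identifier field, the same two arguments would need to change accordingly.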

Value

If feature is 'transcript', a DNAStringSet containing transcript sequences, with names corresponding to idfield in gtf. If feature is 'exon', a DNAStringSet containing the exon sequences from gtf, named by exon location (chr, start, end, strand).
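As a rough picture of what a transcript sequence built from exons looks like, the base-R sketch below joins exon subsequences of a toy chromosome and reverse-complements the result for the minus strand. The helper splice_exons, the toy sequence, and the coordinates are all hypothetical; this shows the general idea of splicing exons and is an assumption about, not a copy of, what seq_gtf does internally.

```r
# Toy chromosome and exon coordinates (1-based, inclusive); values are illustrative.
chrom <- "AAACCCGGGTTTACGT"
exons <- data.frame(start = c(1, 7), end = c(3, 9))

# Hypothetical helper (not part of polyester): concatenate exon subsequences
# in genomic order, then reverse-complement if the transcript is on "-".
splice_exons <- function(chrom, exons, strand = "+") {
  exons <- exons[order(exons$start), ]
  tx <- paste0(substring(chrom, exons$start, exons$end), collapse = "")
  if (strand == "-") {
    complemented <- chartr("ACGT", "TGCA", tx)
    tx <- paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
  }
  tx
}

plus_tx  <- splice_exons(chrom, exons)        # "AAAGGG"
minus_tx <- splice_exons(chrom, exons, "-")   # "CCCTTT"
print(c(plus_tx, minus_tx))
```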

References

http://www.ensembl.org/info/website/upload/gff.html

Examples

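## Not run: 
  library(Biostrings)
  library(polyester)
  # Download the chromosome 22 sequence object (requires wget on the PATH)
  system('wget https://www.dropbox.com/s/04i6msi9vu2snif/chr22seq.rda')
  load('chr22seq.rda')   # provides the chr22seq object used below
  data(gtf_dataframe)    # example GTF data frame
  # Build transcript sequences from the exons described in gtf_dataframe
  chr22_processed = seq_gtf(gtf_dataframe, chr22seq)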

## End(Not run)

polyester documentation built on Nov. 8, 2020, 8:09 p.m.
