gff2fasta: Retrieving annotated sequences


Description

Retrieves from a genome the sequences of the features specified in a gff.table.


Usage

gff2fasta(gff.table, genome)



Arguments

gff.table
A gff.table (tibble) with genomic feature information.

genome
A fasta object (tibble) with the genome sequence(s).
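To experiment without the package's example files, inputs with the documented shape can be built in memory. This is a minimal sketch under assumptions: the nine column names (Seqid, Source, Type, Start, End, Score, Strand, Phase, Attributes) are assumed to match what readGFF returns, the sequences are made up, and base R data.frames stand in for the tibbles the function documents (tibble::as_tibble() can convert them if needed). The helper has_columns() is hypothetical.

```r
# Toy genome: one row per sequence, with columns Header and Sequence.
# The first whitespace-delimited token of each Header serves as its ID.
genome <- data.frame(
  Header   = c("contig1 toy sequence", "contig2 another toy sequence"),
  Sequence = c("ATGAAACCCGGGTTTTAA", "CCCATGTTTAAAGGG"),
  stringsAsFactors = FALSE
)

# Toy gff.table: one row per feature, using the nine GFF columns
# (names assumed to match the output of readGFF()).
gff.table <- data.frame(
  Seqid      = c("contig1", "contig2"),
  Source     = ".",
  Type       = "CDS",
  Start      = c(1L, 4L),
  End        = c(18L, 9L),
  Score      = NA_real_,
  Strand     = c("+", "-"),
  Phase      = 0L,
  Attributes = c("ID=gene1", "ID=gene2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if a table has all of the named columns.
has_columns <- function(tbl, cols) all(cols %in% names(tbl))

has_columns(genome, c("Header", "Sequence"))
has_columns(gff.table, c("Seqid", "Start", "End", "Strand"))
```

Only Seqid, Start, End and Strand are needed for retrieval; the remaining columns are included so the toy table resembles a full GFF table.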


Details

Each row in gff.table (see readGFF) describes a genomic feature in genome, a fasta object (tibble) with columns ‘Header’ and ‘Sequence’. The information in the columns Seqid, Start, End and Strand is used to retrieve each feature's sequence from the ‘Sequence’ column of genome. Every Seqid in gff.table must match the first token of one of the ‘Header’ texts, so that the feature is retrieved from the correct ‘Sequence’.
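The retrieval rule described above can be sketched in base R for a single feature. This is an illustration, not microseq's own code: extract_feature() and revcomp() are hypothetical names, features on the '-' strand are assumed to be reverse-complemented (the usual GFF convention), and only the bases A, C, G, T and N are complemented.

```r
# Hypothetical helper: reverse complement of a DNA string.
revcomp <- function(s) {
  comp <- chartr("ACGTNacgtn", "TGCANtgcan", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: retrieve the sequence for one gff.table row.
# genome has columns Header and Sequence; feature is a one-row table.
extract_feature <- function(feature, genome) {
  # The Seqid must equal the first whitespace-delimited token of a Header
  ids <- sub("\\s.*$", "", genome$Header)
  idx <- match(feature$Seqid, ids)
  if (is.na(idx)) stop("Seqid '", feature$Seqid, "' not found among Header tokens")
  # Cut out the region Start..End (1-based, inclusive)
  s <- substr(genome$Sequence[idx], feature$Start, feature$End)
  # Assumed convention: reverse-complement features on the minus strand
  if (feature$Strand == "-") s <- revcomp(s)
  s
}

genome <- data.frame(
  Header   = c("contig1 toy sequence", "contig2 another toy sequence"),
  Sequence = c("ATGAAACCCGGGTTTTAA", "CCCATGTTTAAAGGG"),
  stringsAsFactors = FALSE
)
feat_plus  <- data.frame(Seqid = "contig1", Start = 1L, End = 6L,
                         Strand = "+", stringsAsFactors = FALSE)
feat_minus <- data.frame(Seqid = "contig2", Start = 4L, End = 9L,
                         Strand = "-", stringsAsFactors = FALSE)

extract_feature(feat_plus, genome)
extract_feature(feat_minus, genome)
```

Matching on the first Header token is why a Seqid such as "contig1" finds the header "contig1 toy sequence" even though the full header text differs.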


Value

A fasta object with one row for each row in gff.table. The Header for each sequence is a summary of the information in the corresponding row of gff.table.
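The one-row-per-feature shape of the result can be illustrated by applying the retrieval rule to every row. Again a sketch under assumptions: gff_to_fasta_sketch() is a hypothetical stand-in, not gff2fasta() itself, minus-strand features are assumed to be reverse-complemented, and the Header layout used here (Seqid, coordinates and strand joined with ';') is invented for illustration; the real function's header summary may differ.

```r
# Hypothetical stand-in illustrating the output shape; not microseq code.
gff_to_fasta_sketch <- function(gff.table, genome) {
  ids <- sub("\\s.*$", "", genome$Header)  # first Header token = sequence ID
  revcomp <- function(s) {
    paste(rev(strsplit(chartr("ACGTNacgtn", "TGCANtgcan", s), "")[[1]]),
          collapse = "")
  }
  # One retrieved sequence per gff.table row
  seqs <- vapply(seq_len(nrow(gff.table)), function(i) {
    idx <- match(gff.table$Seqid[i], ids)
    if (is.na(idx)) stop("Seqid not found: ", gff.table$Seqid[i])
    s <- substr(genome$Sequence[idx], gff.table$Start[i], gff.table$End[i])
    if (gff.table$Strand[i] == "-") revcomp(s) else s
  }, character(1))
  # Invented header format for illustration: Seqid;Start-End;Strand
  headers <- sprintf("%s;%d-%d;%s", gff.table$Seqid,
                     as.integer(gff.table$Start), as.integer(gff.table$End),
                     gff.table$Strand)
  data.frame(Header = headers, Sequence = seqs, stringsAsFactors = FALSE)
}

genome <- data.frame(
  Header   = c("contig1 toy sequence", "contig2 another toy sequence"),
  Sequence = c("ATGAAACCCGGGTTTTAA", "CCCATGTTTAAAGGG"),
  stringsAsFactors = FALSE
)
gff.table <- data.frame(
  Seqid  = c("contig1", "contig2", "contig1"),
  Start  = c(1L, 4L, 16L),
  End    = c(6L, 9L, 18L),
  Strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

fa <- gff_to_fasta_sketch(gff.table, genome)
fa
```

The result has as many rows as gff.table, in the same order, so each output row can be traced back to the feature that produced it.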


Author(s)

Lars Snipen and Kristian Hovde Liland.

See Also

readGFF, findOrfs.


Examples

library(microseq)
library(dplyr)  # attaches the %>% pipe used in the last line

# Using two example files shipped with this package
gff.file <- file.path(path.package("microseq"), "extdata", "small.gff")
genome.file <- file.path(path.package("microseq"), "extdata", "small.fna")

# Reading the genome first
genome <- readFasta(genome.file)

# Retrieving sequences
gff.table <- readGFF(gff.file)
fa.tbl <- gff2fasta(gff.table, genome)

# Alternative, using piping
readGFF(gff.file) %>% gff2fasta(genome) -> fa.tbl

