gff2fasta: Retrieving annotated sequences

gff2fasta    R Documentation

Retrieving annotated sequences

Description

Retrieves from a genome the sequences specified in a gff.table.

Usage

gff2fasta(gff.table, genome)

Arguments

gff.table

A gff.table (tibble) with genomic features information.

genome

A fasta object (tibble) with the genome sequence(s).

Details

Each row in gff.table (see readGFF) describes a genomic feature in the genome. The genome is a tibble with columns ‘Header’ and ‘Sequence’. The information in the columns Seqid, Start, End and Strand is used to retrieve the sequences from the ‘Sequence’ column of genome. Every Seqid in the gff.table must match the first token in one of the ‘Header’ texts, so that each feature is retrieved from the correct ‘Sequence’.
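
The matching and extraction described above can be illustrated with a small base-R sketch. This is not the microseq implementation: the helper names rev_comp and extract_features are hypothetical, and the sketch assumes that Start and End are 1-based and inclusive, that the first token of a Header is everything before the first whitespace, and that features with Strand "-" are returned reverse-complemented.

```r
# Hypothetical base-R sketch of the lookup logic; not the microseq source.
# Assumptions: Start/End are 1-based and inclusive, Strand is "+" or "-",
# and features on the "-" strand are reverse-complemented.

# Reverse-complement a DNA string (plain A/C/G/T only, for brevity)
rev_comp <- function(s) {
  chars <- rev(strsplit(s, "", fixed = TRUE)[[1]])
  chartr("ACGTacgt", "TGCAtgca", paste(chars, collapse = ""))
}

extract_features <- function(gff.table, genome) {
  # First whitespace-delimited token of each Header
  tokens <- sub("\\s.*$", "", genome$Header)
  # For each feature, find the genome row whose token equals its Seqid
  idx <- match(gff.table$Seqid, tokens)
  if (anyNA(idx)) {
    stop("Seqid without matching Header: ",
         paste(unique(gff.table$Seqid[is.na(idx)]), collapse = ", "))
  }
  # Cut out Start..End from the matched Sequence
  seqs <- substr(genome$Sequence[idx], gff.table$Start, gff.table$End)
  # Reverse-complement the features on the negative strand
  neg <- gff.table$Strand == "-"
  seqs[neg] <- vapply(seqs[neg], rev_comp, character(1), USE.NAMES = FALSE)
  seqs
}

# Toy data with the same column names as the tables described above
genome <- data.frame(
  Header   = c("contig1 toy genome", "contig2"),
  Sequence = c("AAACCCGGGTTT", "ATGCATGC"),
  stringsAsFactors = FALSE
)
gff.table <- data.frame(
  Seqid  = c("contig1", "contig2"),
  Start  = c(4, 1),
  End    = c(9, 3),
  Strand = c("+", "-"),
  stringsAsFactors = FALSE
)
extracted <- extract_features(gff.table, genome)
print(extracted)  # "CCCGGG" "CAT"
```

The Seqid "contig1" matches the Header "contig1 toy genome" because only the first token is compared. A Seqid with no matching token stops with an error in this sketch, which mirrors the requirement that every Seqid must match a Header.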

Value

A fasta object with one row for each row in gff.table. The Header for each sequence is a summary of the information in the corresponding row of gff.table.

Author(s)

Lars Snipen and Kristian Hovde Liland.

See Also

readGFF, findOrfs.

Examples

library(microseq)

# Using two files in this package
gff.file <- file.path(path.package("microseq"),"extdata","small.gff")
genome.file <- file.path(path.package("microseq"),"extdata","small.fna")

# Reading the genome first
genome <- readFasta(genome.file)

# Retrieving sequences
gff.table <- readGFF(gff.file)
fa.tbl <- gff2fasta(gff.table, genome)

# Alternative, using piping
fa.tbl <- readGFF(gff.file) %>% gff2fasta(genome)


microseq documentation built on Aug. 21, 2023, 5:10 p.m.