Read and write FASTA files


Reads and writes biological sequences (DNA, RNA, protein) in the FASTA format.


readFasta(in.file)

writeFasta(fdta, out.file, width = 0)



in.file: URL or file path (directory and name) of the (gzipped) FASTA file to read.


fdta: A tibble with sequence data; see ‘Details’ below.


out.file: Name of the (gzipped) FASTA file to create.


width: Number of characters per line, or 0 for no line breaks.


These functions handle input and output of sequences in the commonly used FASTA format. Each sequence is assumed to be preceded by exactly one header line starting with ‘>’. If a filename (in.file or out.file) has the extension .gz, the file is automatically uncompressed when read or compressed when written.
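A minimal base-R sketch of the layout assumed above: one ‘>’ header line followed by the sequence line(s), with .gz files routed through gzfile(). The helpers parse_fasta_lines(), read_fasta_sketch() and write_fasta_sketch() are hypothetical illustrations, not the microseq implementation of readFasta or writeFasta. Stripping the leading ‘>’ from headers and returning a plain data.frame (instead of a tibble, to avoid extra dependencies) are choices of this sketch.

```r
# Hypothetical helpers illustrating the FASTA layout; NOT the microseq code.

# Parse FASTA text lines into a data.frame with Header and Sequence columns.
# Assumes each sequence is preceded by exactly one header line starting with ">".
parse_fasta_lines <- function(lines) {
  lines <- lines[nzchar(lines)]                 # drop empty lines
  is_header <- startsWith(lines, ">")
  if (length(lines) == 0 || !is_header[1]) {
    stop("FASTA text must start with a '>' header line")
  }
  record_id <- cumsum(is_header)                # record each line belongs to
  headers <- sub("^>", "", lines[is_header])    # this sketch drops the '>'
  # Join all sequence lines of a record, so wrapped sequences are handled
  seqs <- vapply(seq_along(headers), function(i) {
    paste(lines[record_id == i & !is_header], collapse = "")
  }, character(1))
  data.frame(Header = headers, Sequence = seqs, stringsAsFactors = FALSE)
}

# Open a gzip connection when the file name ends in .gz, a plain file otherwise
open_fasta_con <- function(path, mode) {
  if (grepl("\\.gz$", path)) gzfile(path, mode) else file(path, mode)
}

read_fasta_sketch <- function(path) {
  con <- open_fasta_con(path, "rt")
  on.exit(close(con))
  parse_fasta_lines(readLines(con))
}

write_fasta_sketch <- function(df, path) {
  con <- open_fasta_con(path, "wt")
  on.exit(close(con))
  # Interleave header and sequence lines: >h1, s1, >h2, s2, ...
  writeLines(as.vector(rbind(paste0(">", df$Header), df$Sequence)), con)
  invisible(path)
}

# Usage: round-trip two sequences through a gzipped temporary file
df <- data.frame(Header = c("seq1 test", "seq2"),
                 Sequence = c("ATGAAATGA", "ATGCCCTAA"),
                 stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".fasta.gz")
write_fasta_sketch(df, tmp)
back <- read_fasta_sketch(tmp)
```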

The sequences are stored in a tibble, so they can be filtered, transformed and summarised quickly and easily with standard R tools. The content of the file is stored in two columns, ‘Header’ and ‘Sequence’. Any additional columns are ignored by writeFasta.
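As an illustration of the two-column layout, the sketch below adds an extra column to a sequence table and formats the rows as FASTA records using only ‘Header’ and ‘Sequence’, mirroring the behaviour described for writeFasta. The helper fasta_records() is hypothetical, and a base data.frame stands in for a tibble; column access with $ works the same way for both.

```r
# Hypothetical helper (not part of microseq): format rows as FASTA records,
# using only the Header and Sequence columns, so extra columns are ignored.
fasta_records <- function(fdta) {
  stopifnot(all(c("Header", "Sequence") %in% names(fdta)))
  as.vector(rbind(paste0(">", fdta$Header), fdta$Sequence))
}

fdta <- data.frame(
  Header   = c("gene1", "gene2"),
  Sequence = c("ATGAAATGA", "ATGCCCTAA"),
  stringsAsFactors = FALSE
)
# An extra column stays in the table but does not appear in the output
fdta$Length <- nchar(fdta$Sequence)
records <- fasta_records(fdta)
```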

The default width = 0 in writeFasta results in no line breaks within sequences, i.e. each sequence is written on a single line.
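To make the width semantics concrete, here is a small sketch that splits one sequence into lines of at most width characters, with width = 0 meaning no line breaks. wrap_sequence() is a hypothetical helper, not a microseq function, and treating negative widths like 0 is an assumption of this sketch.

```r
# Hypothetical helper (not part of microseq): split a sequence into lines of
# at most `width` characters; width <= 0 means no line breaks.
wrap_sequence <- function(sequence, width = 0) {
  n <- nchar(sequence)
  if (width <= 0 || n <= width) return(sequence)
  starts <- seq(1, n, by = width)
  substring(sequence, starts, pmin(starts + width - 1, n))
}

wrap_sequence("ATGAAACCCGGGTGA", width = 0)  # "ATGAAACCCGGGTGA"
wrap_sequence("ATGAAACCCGGGTGA", width = 6)  # "ATGAAA" "CCCGGG" "TGA"
```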


readFasta returns a tibble with the contents of the (gzipped) FASTA file stored in two text columns. The first, named ‘Header’, contains the header lines, and the second, named ‘Sequence’, contains the sequences.

writeFasta produces a (gzipped) FASTA file.


Lars Snipen and Kristian Hovde Liland.

## Not run: 
# Load the packages used below
library(microseq)
library(dplyr)
library(stringr)

# We need a FASTA file to read; here is an example file shipped with microseq
fa.file <- system.file("extdata", "small.ffn", package = "microseq")

# Read the file, then write rows 4 and 5 to a new file
fdta <- readFasta(fa.file)
ok <- writeFasta(fdta[4:5, ], out.file = "delete_me.fasta")

# Use dplyr to copy the sequences ending in TGA to another file,
# wrapping them at 80 characters per line
ok <- readFasta(fa.file) %>%
  filter(str_detect(Sequence, "TGA$")) %>%
  writeFasta(out.file = "TGAstop.fasta", width = 80)

## End(Not run)

