read_fasta: Read Protein Sequences from FASTA Files
In geneviewer: Gene Cluster Visualizations

read_fasta

R Documentation

Read Protein Sequences from FASTA Files

Description

This function reads protein sequences from a single FASTA file or from all FASTA files within a directory. It also parses metadata stored in the FASTA headers as key-value pairs separated by an equals sign '='. For example, from the header '>protein1 [gene=scnD] [protein=ScnD]', it extracts 'gene' as a key with the value 'scnD' and 'protein' as a key with the value 'ScnD'.
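The header parsing described above can be illustrated with a minimal base R sketch. The helper `parse_header()` is hypothetical and is not part of geneviewer; it assumes that every key-value pair is wrapped in square brackets, as in the example header, and the package's own parsing may differ.

```r
# Hypothetical helper (not a geneviewer function): extract [key=value] pairs
# from a single FASTA header line and return them as a named character vector.
parse_header <- function(header) {
  # Find every bracketed group that contains an equals sign, e.g. "[gene=scnD]"
  matches <- regmatches(header, gregexpr("\\[[^]=]+=[^]]*\\]", header))[[1]]
  # Strip the surrounding square brackets
  pairs <- substr(matches, 2, nchar(matches) - 1)
  # Split each pair on its first '=' into key and value
  keys <- sub("=.*$", "", pairs)
  values <- sub("^[^=]*=", "", pairs)
  stats::setNames(values, keys)
}

parsed <- parse_header(">protein1 [gene=scnD] [protein=ScnD]")
print(parsed)
```

The result is a named character vector in which 'gene' maps to 'scnD' and 'protein' maps to 'ScnD'.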

Usage

read_fasta(fasta_path, sequence = TRUE, keys = NULL, file_extension = "fasta")

Arguments

`fasta_path`	Path to the FASTA file or directory containing FASTA files.
`sequence`	Logical; if 'TRUE' (the default), the protein sequences are included in the returned data frame.
`keys`	An optional vector of strings naming the keys from the FASTA headers to retain in the final data frame. If 'NULL' (the default), all keys found in the headers are included.
`file_extension`	Extension of the FASTA files to read when `fasta_path` is a directory (default is 'fasta').
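As a rough illustration of what `file_extension` selects, the base R sketch below lists the files in a directory that end in a given extension. This is an assumption about the general idea only, not the package's actual implementation; the demo directory and file names are made up for the example.

```r
# Illustration only: build a throwaway directory with mixed file types.
dir_path <- file.path(tempdir(), "fasta_demo")
dir.create(dir_path, showWarnings = FALSE)
invisible(file.create(file.path(dir_path, c("a.fa", "b.fa", "notes.txt"))))

# Select only the files whose names end in the requested extension.
file_extension <- "fa"
pattern <- paste0("\\.", file_extension, "$")
fasta_files <- list.files(dir_path, pattern = pattern, full.names = TRUE)
print(basename(fasta_files))
```

Here only 'a.fa' and 'b.fa' are selected, while 'notes.txt' is skipped.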

Details

The Biostrings package is required to run this function.
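Since Biostrings is distributed through Bioconductor rather than CRAN, it can be useful to check for it before calling `read_fasta()`. The snippet below is a minimal sketch of such a check; the variable name `has_biostrings` is chosen here for illustration.

```r
# Check whether Biostrings is available without attaching it.
has_biostrings <- requireNamespace("Biostrings", quietly = TRUE)

if (!has_biostrings) {
  # Point to the usual Bioconductor installation route instead of installing silently.
  message("Biostrings is not installed. Install it with: ",
          "BiocManager::install(\"Biostrings\")")
}
```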

Value

A data frame with one column for each key extracted from the FASTA headers and, if `sequence` is 'TRUE', the protein sequences.

Examples

## Not run: 
# Read sequences from a single FASTA file
sequences_df <- read_fasta("path/to/single_file.fasta")

# Read all sequences from a directory of FASTA files with the 'fa' extension
sequences_df <- read_fasta("path/to/directory/", file_extension = "fa")

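# Read sequences and include the protein sequences in the output (the default)
sequences_df <- read_fasta("path/to/directory/", sequence = TRUE)

# Keep only selected header keys and omit the protein sequences
sequences_df <- read_fasta("path/to/directory/", sequence = FALSE, keys = c("gene", "protein"))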

## End(Not run)

geneviewer documentation built on Nov. 5, 2025, 5:13 p.m.