read_fasta | R Documentation |
This function reads protein sequences from the specified FASTA file or all FASTA files within a directory. It specifically looks for metadata in the FASTA headers with key-value pairs separated by an equals sign '='. For example, from the header '>protein1 [gene=scnD] [protein=ScnD]', it extracts 'gene' as the key and 'scnD' as its value, and similarly for other key-value pairs.
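The header parsing described above can be sketched in base R. This is only an illustration, not the package's actual implementation: `parse_header_pairs` is a hypothetical helper, and it assumes each key-value pair is wrapped in square brackets, as in the example header.

```r
# Hypothetical helper (not part of the package): extract [key=value] pairs
# from a single FASTA header line using base R only.
# Assumption: every pair is enclosed in square brackets, e.g. "[gene=scnD]".
parse_header_pairs <- function(header) {
  # Find every bracketed "key=value" chunk
  hits <- gregexpr("\\[[^=\\]]+=[^\\]]*\\]", header, perl = TRUE)
  matches <- regmatches(header, hits)[[1]]
  if (length(matches) == 0) return(character(0))
  # Strip the surrounding brackets
  inner <- substr(matches, 2, nchar(matches) - 1)
  # Split on the first '=' only, so values may themselves contain '='
  keys <- sub("=.*$", "", inner)
  values <- sub("^[^=]*=", "", inner)
  stats::setNames(values, keys)
}

pairs <- parse_header_pairs(">protein1 [gene=scnD] [protein=ScnD]")
print(pairs)
```

The result is a named character vector, with the keys as names and the extracted strings as values.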
read_fasta(fasta_path, sequence = TRUE, keys = NULL, file_extension = "fasta")
fasta_path |
Path to the FASTA file or directory containing FASTA files. |
sequence |
Logical; if 'TRUE' (the default), the protein sequences are included in the returned data frame. |
keys |
An optional character vector naming the keys in the FASTA headers to retain in the final data frame. If 'NULL' (the default), all keys found in the headers are included. |
file_extension |
Extension of the FASTA files to be read from the directory (default is 'fasta'). |
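How `fasta_path` and `file_extension` might interact can be sketched as follows. This is an assumption about the behaviour, not the function's actual code: `list_fasta_files` is a hypothetical helper that returns matching files when given a directory and returns the path unchanged when given a single file.

```r
# Hypothetical helper (not part of the package): resolve a path to the
# FASTA file(s) it refers to.
list_fasta_files <- function(fasta_path, file_extension = "fasta") {
  if (dir.exists(fasta_path)) {
    # Directory: keep only files whose extension matches exactly
    files <- list.files(fasta_path, full.names = TRUE)
    files[tools::file_ext(files) == file_extension]
  } else {
    # Single file: use it as given
    fasta_path
  }
}
```

Under this sketch, `file_extension` only matters when `fasta_path` is a directory.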
The Biostrings package is required to run this function.
A data frame with columns for each piece of information extracted from the FASTA headers.
## Not run:
# Read sequences from a single FASTA file
sequences_df <- read_fasta("path/to/single_file.fasta")
# Read all sequences from a directory of FASTA files
sequences_df <- read_fasta("path/to/directory/", file_extension = "fa")
# Read sequences and explicitly include the protein sequences in the output (the default)
sequences_df <- read_fasta("path/to/directory/", sequence = TRUE)
## End(Not run)