extract_imgt_genes: Extract all gene names from a folder of FASTAs

View source: R/build_lookup.R

extract_imgt_genes    R Documentation

Extract all gene names from a folder of FASTAs

Description

extract_imgt_genes() runs parse_imgt_fasta() on every FASTA file in the given folder to extract the gene names, then returns those names as an alphabetically sorted data frame.

Usage

extract_imgt_genes(data_dir)

Arguments

data_dir

A string, the path to the directory containing the FASTA files.

Value

A data frame of gene names, sorted alphabetically.

Examples

# Given a folder with FASTA files containing these headers:
#   >SomeText|TRAC*01|MoreText|
#   >SomeText|TRAV1-1*01|MoreText|
#   >SomeText|TRAV1-1*02|MoreText|
#   >SomeText|TRAV1-2*01|MoreText|
#   >SomeText|TRAV14/DV4*01|MoreText|
#   >SomeText|TRAV38-1*01|MoreText|
#   >SomeText|TRAV38-2/DV8*01|MoreText|
#   >SomeText|TRBV29-1*01|MoreText|
#   >SomeText|TRBV29-1*02|MoreText|
#   >SomeText|TRBV29/OR9-2*01|MoreText|

fastadir <- get_example_path("fasta_dir/")
extract_imgt_genes(fastadir)
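
For readers who want to see the general idea without the package installed, the behaviour described above can be approximated in base R. This is only a minimal sketch under stated assumptions, not the package's implementation: sketch_extract_genes() is a hypothetical helper, the output column name gene is made up for illustration, FASTA files are assumed to end in .fa, .fasta, or .fna, and the gene name is assumed to be the second pipe-delimited field of each header, as in the example headers above. Whether the real function keeps allele suffixes, removes duplicates, or applies a different sort order is not covered here.

```r
# Hypothetical helper, NOT part of TCRconvertR: collect gene names from
# headers shaped like ">SomeText|TRAC*01|MoreText|".
sketch_extract_genes <- function(data_dir) {
  # Assumption: FASTA files are recognised by these extensions.
  fasta_files <- list.files(data_dir, pattern = "\\.(fa|fasta|fna)$",
                            full.names = TRUE, ignore.case = TRUE)
  genes <- character(0)
  for (f in fasta_files) {
    lines <- readLines(f, warn = FALSE)
    headers <- lines[startsWith(lines, ">")]
    # Assumption: the gene name is the second pipe-delimited field.
    fields <- strsplit(headers, "|", fixed = TRUE)
    genes <- c(genes, vapply(fields, function(x) x[2], character(1)))
  }
  # Radix sorting uses C-locale ordering, so the result does not depend
  # on the user's locale. The column name "gene" is an assumption.
  data.frame(gene = sort(genes, method = "radix"), stringsAsFactors = FALSE)
}

# Usage on a throwaway folder holding one small FASTA file.
tmp <- tempfile("fasta_dir")
dir.create(tmp)
writeLines(c(">SomeText|TRAV1-1*01|MoreText|", "acgt",
             ">SomeText|TRAC*01|MoreText|", "ttga"),
           file.path(tmp, "example.fa"))
result <- sketch_extract_genes(tmp)
print(result)
```

The sketch reads each file line by line rather than relying on a sequence-parsing library, which keeps it dependency-free at the cost of ignoring the sequences themselves.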

TCRconvertR documentation built on June 8, 2025, 10:43 a.m.