orf_locate: Parse ORF Coordinates from Prodigal FASTA Headers

View source: R/orf_locate.R

orf_locate    R Documentation

Description

Extracts ORF identifiers, start/end positions and strand orientation directly from the FASTA headers produced by Prodigal. The resulting table is ready for downstream gene-cluster analyses.

Usage

orf_locate(in_seq_data = seq_data)

Arguments

in_seq_data

A data frame with two columns:

SeqName

ORF identifier (Prodigal format: >ORF_id # start # end # strand # ...).

Sequence

ORF sequence.

Example: "Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_1 # 74 # 1018 # 1 # ..."

The data frame can be imported from a Prodigal FASTA file using:

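library(magrittr)  # provides the %>% pipe used below
# Read the Prodigal FASTA, then move the header names into a SeqName column
seq_data <- Biostrings::readBStringSet("Prodigal.fasta", format = "fasta", nrec = -1L, skip = 0L, seek.first.rec = FALSE, use.names = TRUE) %>%
  data.frame(Sequence = .) %>%
  tibble::rownames_to_column("SeqName")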
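
To make the header layout concrete, the following base-R sketch splits the " # "-delimited fields out of SeqName. It only illustrates the format and is not the implementation of orf_locate. The toy sequences, the second header, and the column names orf_id, start, end and strand are assumptions made for this example.

```r
# Toy input mimicking the two-column layout described above.
# The second header and both sequences are made up for illustration.
seq_data <- data.frame(
  SeqName = c(
    "Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_1 # 74 # 1018 # 1 # ...",
    "contigA_2 # 1100 # 1500 # -1 # ..."
  ),
  Sequence = c("MKT", "MSL"),
  stringsAsFactors = FALSE
)

# Split each header on the literal " # " delimiter; the first four fields
# are the ORF identifier, start, end and strand.
fields <- strsplit(seq_data$SeqName, " # ", fixed = TRUE)

# Hypothetical column names; the columns returned by orf_locate may differ.
parsed <- data.frame(
  orf_id = vapply(fields, `[`, character(1), 1),
  start  = as.integer(vapply(fields, `[`, character(1), 2)),
  end    = as.integer(vapply(fields, `[`, character(1), 3)),
  strand = as.integer(vapply(fields, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)

print(parsed)
```

Prodigal writes the strand as 1 (forward) or -1 (reverse), so converting it to an integer keeps the sign available for orientation checks.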

Value

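A data frame containing the ORF identifiers, start/end positions and strand orientation parsed from the headers.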


gclink documentation built on Sept. 9, 2025, 5:39 p.m.