fasta_header_to_id: extract the protein identifier from a fasta header
In ftwkoopmans/msdap: Mass Spectrometry Downstream Analysis Pipeline

fasta_header_to_id

R Documentation

extract the protein identifier from a fasta header

Description

The first run of non-whitespace characters in the header is assumed to be the protein ID. A leading '>' is not included (if present).

Usage

fasta_header_to_id(x)

Arguments

`x`	character vector of fasta headers

Details

This regex should be robust to all sorts of input, including headers that do not follow official standards. A header should look like ">proteinid whatever", but additional whitespace at the start, or between the '>' and the ID, is tolerated.

For example, these should all yield 'pid' as the protein ID: fasta_header_to_id(c(" > pid description", "> pid description", " pid description", "pid description"))
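
The behaviour described here can be sketched with a single regular expression. This is a minimal illustration only, not the package's actual implementation: the function name `header_to_id_sketch` and the pattern are assumptions made for this example.

```r
# Hypothetical stand-in for illustration; NOT the msdap source code.
header_to_id_sketch <- function(x) {
  # Skip optional leading whitespace, an optional '>', and any whitespace
  # after it; capture the first run of non-whitespace characters and
  # discard the remainder of the header.
  sub("^\\s*>?\\s*(\\S+).*$", "\\1", x, perl = TRUE)
}

headers <- c(" > pid description", "> pid description",
             " pid description", "pid description")
header_to_id_sketch(headers)
# [1] "pid" "pid" "pid" "pid"
```

Under this sketch the whole first token is returned, so a header such as ">sp|P12345|NAME_HUMAN description" would give "sp|P12345|NAME_HUMAN" rather than only the accession.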

ftwkoopmans/msdap documentation built on March 5, 2025, 12:15 a.m.