import_protein_metadata_from_fasta: Parse fasta files and construct a protein metadata table for...

import_protein_metadata_from_fasta    R Documentation

Parse fasta files and construct a protein metadata table for provided protein identifiers

Description

Parse fasta files and construct a protein metadata table for provided protein identifiers

Usage

import_protein_metadata_from_fasta(
  protein_id,
  fasta_files,
  fasta_id_type = "uniprot",
  protein_separation_character = ";",
  uppercase_symbols = TRUE
)

Arguments

protein_id

an array of protein identifiers (that should be available in provided fasta_files)

fasta_files

an array of filenames; each should be a full path

fasta_id_type

the type of the provided fasta files. Options: "uniprot" (highly recommended), or any other character string for generic fasta files (for which there are no special rules)

protein_separation_character

the separation character for protein identifiers in your dataset. Most commonly this is a semicolon (e.g. in MaxQuant, MetaMorpheus, Skyline, etc.)

uppercase_symbols

convert all gene symbols to upper case? default: TRUE

Value

A table with the following columns:

protein_id = the provided proteingroup identifier

accessions = result of applying fasta_id_short to each semicolon-delimited element in protein_id (collapsed into a semicolon-delimited string)

fasta_headers = analogous to accessions, but holding the full FASTA header matched to each element in 'accessions'

gene_symbols = full set of gene symbols matching each element in 'accessions', with '-' where missing in the FASTA

gene_symbols_or_id = unique set of valid 'gene_symbols', or the FASTA full/long ID when there is no gene information

gene_symbol_ucount = number of unique gene symbols for this proteingroup (i.e. unique valid elements in 'gene_symbols')
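
The relationship between 'gene_symbols' and 'gene_symbol_ucount' can be illustrated with a small base R sketch. This is not the msdap implementation. The lookup table, the accessions and the helper summarise_group are hypothetical and exist only for illustration. The sketch assumes that each accession maps to one gene symbol and that '-' marks missing gene information, as described above.

```r
# Hypothetical lookup from accession to gene symbol (made-up values);
# "-" marks an accession without gene information in the FASTA
gene_lookup <- c(P11111 = "GENEA", P22222 = "GENEA", P33333 = "-")

# Protein-group identifiers, semicolon-delimited as in the 'protein_id' argument
protein_id <- c("P11111;P22222", "P11111;P33333", "P33333")

# Illustrative helper (not part of msdap): summarise one protein group
summarise_group <- function(id, sep = ";") {
  # split the group identifier into its individual accessions
  accessions <- strsplit(id, sep, fixed = TRUE)[[1]]
  # one gene symbol per accession, in the same order
  symbols <- unname(gene_lookup[accessions])
  # valid symbols are those that are not the "-" placeholder
  valid <- unique(symbols[symbols != "-"])
  data.frame(
    protein_id = id,
    gene_symbols = paste(symbols, collapse = sep),
    gene_symbol_ucount = length(valid),
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(protein_id, summarise_group))
print(result)
```

In this sketch the first group contains two accessions that share one gene symbol, so its count of unique symbols is 1. The last group has no gene information at all, so its count is 0.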


ftwkoopmans/msdap documentation built on March 5, 2025, 12:15 a.m.