import_fasta: Import fasta file(s) that match your dataset.
In ftwkoopmans/msdap: Mass Spectrometry Downstream Analysis Pipeline

import_fasta

R Documentation

Import fasta file(s) that match your dataset.

Description

For FASTA files from UniProt, gene symbols are extracted for each protein group (this is not available for non-UniProt FASTA files).
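To illustrate why gene symbols are only available for UniProt files, here is a minimal sketch of pulling the accession and the `GN=` field out of UniProt-style FASTA headers. The helper `parse_uniprot_header` is hypothetical and written for this example only; it is not msdap's implementation, and the third header is a made-up entry without a gene field.

```r
# Hypothetical helper, NOT part of msdap: parse UniProt-style FASTA headers of
# the form ">db|ACCESSION|ENTRY_NAME description OS=... OX=... GN=symbol ..."
parse_uniprot_header <- function(headers, uppercase_symbols = TRUE) {
  # accession = text between the first and second pipe character
  accession <- sub("^>?[^|]*\\|([^|]+)\\|.*$", "\\1", headers)

  # gene symbol = value of the GN= field; stays NA when the field is absent
  has_gn <- grepl(" GN=", headers, fixed = TRUE)
  symbol <- rep(NA_character_, length(headers))
  symbol[has_gn] <- sub("^.* GN=([^ ]+).*$", "\\1", headers[has_gn])

  # mirrors the idea behind the uppercase_symbols argument
  if (uppercase_symbols) symbol[has_gn] <- toupper(symbol[has_gn])

  data.frame(accession = accession, gene_symbol = symbol, stringsAsFactors = FALSE)
}

# Illustrative headers; the last one is made up and has no GN= field
headers <- c(
  ">sp|P63104|1433Z_HUMAN 14-3-3 protein zeta/delta OS=Homo sapiens OX=9606 GN=YWHAZ PE=1 SV=1",
  ">sp|P62259|1433E_MOUSE 14-3-3 protein epsilon OS=Mus musculus OX=10090 GN=Ywhae PE=1 SV=1",
  ">tr|A0A000|A0A000_EXAMPLE Uncharacterized protein OS=Example organism OX=0 PE=4 SV=1"
)
parsed <- parse_uniprot_header(headers)
print(parsed)
```

A generic FASTA header has no standardized gene field, so a parser like this has nothing reliable to extract from it.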

Usage

import_fasta(
  dataset,
  files = NULL,
  fasta_id_type = "uniprot",
  protein_separation_character = ";",
  uppercase_symbols = TRUE
)

Arguments

`dataset`	your dataset
`files`	an array of filenames; each should be a full path
`fasta_id_type`	the type of FASTA files provided. Options: "uniprot" (highly recommended) or any other character string (no special rules are applied to generic FASTA files)
`protein_separation_character`	the separation character for protein identifiers in your dataset. Most commonly this is a semicolon (e.g. in MaxQuant, MetaMorpheus, Skyline, etc.)
`uppercase_symbols`	convert all gene symbols to upper case? default: TRUE
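A call using the arguments above might look like the following sketch. The file paths are hypothetical placeholders, and `dataset` is assumed to be an msdap dataset that was imported earlier in the session; the guard only exists so the snippet does nothing when those assumptions do not hold.

```r
# Hypothetical full paths; replace these with the FASTA files that were used
# for the database search that produced your dataset.
fasta_files <- c(
  "/data/fasta/UP000005640_9606.fasta",
  "/data/fasta/UP000005640_9606_additional.fasta"
)

# Assumption: `dataset` already exists (an msdap dataset imported earlier).
# The call is skipped when msdap, the dataset or the files are unavailable.
if (requireNamespace("msdap", quietly = TRUE) &&
    exists("dataset") &&
    all(file.exists(fasta_files))) {
  result <- msdap::import_fasta(
    dataset,
    files = fasta_files,
    fasta_id_type = "uniprot",            # enables gene symbol extraction
    protein_separation_character = ";",   # matches the protein IDs in the dataset
    uppercase_symbols = TRUE
  )
}
```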

Value

A table with the following columns:

`protein_id`	the provided protein-group identifier
`accessions`	result of fasta_id_short applied to each semicolon-delimited element in protein_id, collapsed into a semicolon-delimited string
`fasta_headers`	analogous to accessions, but with the full FASTA header matching each element in 'accessions'
`gene_symbols`	full set of gene symbols matching each element in 'accessions'; '-' where missing in the FASTA
`gene_symbols_or_id`	unique set of valid gene symbols, or the full/long FASTA ID when there is no gene information
`gene_symbol_ucount`	number of unique gene symbols for this protein group (i.e. unique valid elements in 'gene_symbols')
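The relationship between these columns can be sketched in base R for a single protein group. Everything here is an assumption made for illustration: `fasta_lookup` stands in for parsed FASTA content, `summarise_protein_group` is a hypothetical helper rather than msdap's code, the entry `Q00001` is made up, and falling back to the first long ID when no gene symbol exists is one possible reading of the description above.

```r
# Hypothetical lookup table standing in for parsed FASTA content
fasta_lookup <- data.frame(
  accession   = c("P63104", "P62258", "Q00001"),
  long_id     = c("sp|P63104|1433Z_HUMAN", "sp|P62258|1433E_HUMAN", "tr|Q00001|Q00001_EXAMPLE"),
  gene_symbol = c("YWHAZ", "YWHAE", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT msdap's implementation
summarise_protein_group <- function(protein_id, lookup, sep = ";") {
  ids <- strsplit(protein_id, sep, fixed = TRUE)[[1]]
  # reduce long IDs such as "sp|P63104|1433Z_HUMAN" to the bare accession;
  # IDs without pipe characters are left unchanged
  acc <- sub("^[^|]*\\|([^|]+)\\|.*$", "\\1", ids)
  idx <- match(acc, lookup$accession)
  symbols <- lookup$gene_symbol[idx]
  valid <- unique(symbols[!is.na(symbols)])

  data.frame(
    protein_id = protein_id,
    accessions = paste(acc, collapse = sep),
    # one entry per accession, "-" where the FASTA has no gene symbol
    gene_symbols = paste(ifelse(is.na(symbols), "-", symbols), collapse = sep),
    # assumption: fall back to the first long ID when no symbol is available
    gene_symbols_or_id = if (length(valid) > 0) paste(valid, collapse = sep) else lookup$long_id[idx][1],
    gene_symbol_ucount = length(valid),
    stringsAsFactors = FALSE
  )
}

group_with_genes <- summarise_protein_group("P63104;P62258", fasta_lookup)
group_without_genes <- summarise_protein_group("Q00001", fasta_lookup)
print(rbind(group_with_genes, group_without_genes))
```

In this sketch the first group ends up with two unique gene symbols, while the second has none and therefore carries its long FASTA ID in `gene_symbols_or_id`.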

ftwkoopmans/msdap documentation built on March 5, 2025, 12:15 a.m.