import_fasta: Import fasta file(s) that match your dataset.

View source: R/process_peptide_data.R

import_fasta    R Documentation

Description

For FASTA files from UniProt, gene symbols are extracted for each protein group. This is not available for non-UniProt FASTA files.
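To illustrate what extracting a gene symbol from a UniProt FASTA header can look like, here is a minimal base-R sketch. The regular expressions are assumptions made for illustration and are not taken from the package source; the header is an example in the standard UniProt layout, where the gene name follows the `GN=` tag.

```r
# Example header in UniProt layout: db|accession|entry_name description ... GN=<gene> ...
header <- ">sp|P12345|AATM_RABIT Aspartate aminotransferase, mitochondrial OS=Oryctolagus cuniculus OX=9986 GN=GOT2 PE=1 SV=2"

# Accession: the field between the first and second pipe (assumed pattern, illustration only).
accession <- sub("^>?(sp|tr)\\|([^|]+)\\|.*$", "\\2", header)

# Gene symbol: the token following "GN="; fall back to "-" when the tag is absent.
gene_match <- regmatches(header, regexpr("GN=[^ ]+", header))
gene_symbol <- if (length(gene_match) == 1) sub("^GN=", "", gene_match) else "-"

print(c(accession = accession, gene_symbol = gene_symbol))
```

Generic FASTA headers do not follow this layout, which is consistent with gene symbols only being available for UniProt files.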

Usage

import_fasta(
  dataset,
  files = NULL,
  fasta_id_type = "uniprot",
  protein_separation_character = ";",
  uppercase_symbols = TRUE
)
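A call might look like the sketch below. The file paths are hypothetical placeholders, and `dataset` is assumed to be an existing dataset object created earlier in your session. The guard makes the snippet a no-op when the package, the dataset, or the files are not available.

```r
# Hypothetical paths; replace with the FASTA files that match your search database.
fasta_files <- c("/path/to/uniprot_human.fasta", "/path/to/uniprot_contaminants.fasta")

# Only run when msdap is installed, a `dataset` object exists and all files are present.
if (requireNamespace("msdap", quietly = TRUE) &&
    exists("dataset") &&
    all(file.exists(fasta_files))) {
  result <- msdap::import_fasta(
    dataset,
    files = fasta_files,
    fasta_id_type = "uniprot",
    protein_separation_character = ";",
    uppercase_symbols = TRUE
  )
}
```

All arguments other than `dataset` and `files` are shown at their documented defaults and can be omitted.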

Arguments

dataset

your dataset

files

a character vector of filenames; each should be a full path

fasta_id_type

the type of FASTA files provided. Options: "uniprot" (highly recommended) or any other character string (there are no special rules for generic FASTA files)

protein_separation_character

the separation character for protein identifiers in your dataset. Most commonly this is a semicolon (e.g. in MaxQuant, MetaMorpheus, Skyline, etc.)
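As a small illustration of what the separation character means, the base-R sketch below splits protein-group identifiers into their individual accessions. The identifiers are made-up examples, and this is not the package's own code.

```r
# Made-up protein-group identifiers; the first group holds two accessions.
protein_ids <- c("P12345;Q67890", "A0A024R161")
sep <- ";"

# fixed = TRUE treats the separator literally rather than as a regular expression.
split_ids <- strsplit(protein_ids, sep, fixed = TRUE)
names(split_ids) <- protein_ids

# Number of accessions per protein group.
print(lengths(split_ids))
```

If your software uses a different delimiter, pass that character instead of the semicolon.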

uppercase_symbols

convert all gene symbols to upper case? default: TRUE

Value

table with the following columns:

protein_id = provided proteingroup identifier

accessions = result from fasta_id_short applied to each semicolon-delimited element in protein_id (result is a semicolon-collapsed string)

fasta_headers = analogous to accessions, but matching the full FASTA header to each 'accessions' element

gene_symbols = full set of gene symbols, '-' where missing in FASTA, matching each element in 'accessions'

gene_symbols_or_id = unique set of valid 'gene_symbol', or the FASTA full/long ID when there is no gene information

gene_symbol_ucount = number of unique gene_symbols for this proteingroup (i.e. unique valid elements in 'gene_symbols')
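How these columns relate to each other can be sketched with a toy example in base R. The lookup table, the accessions, and the helper `summarise_group` are all hypothetical and written for illustration only; they are not the package's implementation.

```r
# Toy lookup standing in for parsed FASTA entries (all values are made up).
fasta_lookup <- data.frame(
  accession   = c("P11111", "P22222", "P33333"),
  header      = c("sp|P11111|AAA_HUMAN", "sp|P22222|BBB_HUMAN", "tr|P33333|P33333_HUMAN"),
  gene_symbol = c("Abc1", "abc1", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one output row for a single protein group.
summarise_group <- function(protein_id, lookup, sep = ";", uppercase_symbols = TRUE) {
  acc <- strsplit(protein_id, sep, fixed = TRUE)[[1]]
  idx <- match(acc, lookup$accession)
  headers <- lookup$header[idx]
  symbols <- lookup$gene_symbol[idx]
  is_missing <- is.na(symbols)
  if (uppercase_symbols) symbols[!is_missing] <- toupper(symbols[!is_missing])
  valid <- unique(symbols[!is_missing])  # unique valid gene symbols
  symbols[is_missing] <- "-"             # placeholder where the FASTA has no gene
  data.frame(
    protein_id = protein_id,
    accessions = paste(acc, collapse = sep),
    fasta_headers = paste(headers, collapse = sep),
    gene_symbols = paste(symbols, collapse = sep),
    # Fall back to the full FASTA ID when no gene information is available.
    gene_symbols_or_id = if (length(valid) > 0) paste(valid, collapse = sep) else paste(headers, collapse = sep),
    gene_symbol_ucount = length(valid),
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(c("P11111;P22222", "P33333"), summarise_group, lookup = fasta_lookup))
print(result)
```

In this toy data the first group has two accessions that map to the same upper-cased symbol, so its unique count is 1. The second group has no gene information, so it falls back to the FASTA ID and has a count of 0.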


ftwkoopmans/msdap documentation built on March 5, 2025, 12:15 a.m.