remap_fasta_entry_names: Remapping entries in a FASTA file


View source: R/remap_fasta_entry_names.R

Description

Remaps entries in a FASTA file from one protein identifier type to another according to the provided conversion table. The input is a path to a FASTA file; the output is a path to a new FASTA file with updated entry names.
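The remapping described above can be sketched in base R. This is a minimal illustration under stated assumptions, not MSnID's implementation: `remap_headers_sketch` is a hypothetical helper that replaces the whole header line with the new identifier and leaves entries without a match in the conversion table unchanged. The real function may handle entry naming, duplicate mappings and unmapped entries differently. The FASTA content below is toy data.

```r
# Hypothetical helper, not part of MSnID: a base-R sketch of header remapping
remap_headers_sketch <- function(path_in, conversion_table,
                                 pattern = "\\|([^|-]+)(-\\d+)?\\|") {
  lines <- readLines(path_in)
  is_header <- startsWith(lines, ">")
  headers <- lines[is_header]
  # Capture group 1 of the pattern is taken as the protein identifier
  m <- regmatches(headers, regexec(pattern, headers))
  ids <- vapply(m, function(g) if (length(g) > 1) g[2] else NA_character_,
                character(1))
  # Column 1 holds old identifiers, column 2 the new ones; match() uses the
  # first row if an identifier occurs more than once; NA ids never match
  idx <- match(ids, conversion_table[[1]], incomparables = NA)
  new_ids <- as.character(conversion_table[[2]])[idx]
  # Assumption: entries without a mapping keep their original header
  mapped <- !is.na(new_ids)
  headers[mapped] <- paste0(">", new_ids[mapped])
  lines[is_header] <- headers
  path_out <- tempfile(fileext = ".fasta")
  writeLines(lines, path_out)
  path_out
}

# Usage on a toy two-entry FASTA file (truncated sequences)
fa <- tempfile(fileext = ".fasta")
writeLines(c(">sp|P04637|P53_HUMAN Cellular tumor antigen p53", "MEEPQSDPSV",
             ">sp|P99999|CYC_HUMAN Cytochrome c", "MGDVEKGKKI"), fa)
conv <- data.frame(UNIPROT = "P04637", SYMBOL = "TP53")
out <- remap_headers_sketch(fa, conv)
readLines(out)
```

Lookup is by column position via `match()`, mirroring the description of `conversion_table`, so column names do not matter in this sketch; the second entry is absent from the table and keeps its header.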

Usage

remap_fasta_entry_names(path_to_FASTA,
                        conversion_table,
                        extraction_pttrn=c("\\|([^|-]+)(-\\d+)?\\|",
                                           "([A-Z]P_\\d+)",
                                           "(ENS[A-Z0-9]+)"))

Arguments

path_to_FASTA

(string) Path to the FASTA file.

conversion_table

(data.frame) The first column contains the identifiers as they appear in the FASTA file; the second column contains the new identifiers.

extraction_pttrn

(string) Regex pattern that extracts the protein identifier from the FASTA entry name as its first capture group (that is, "\\1"). The most common patterns are those for UniProt "\\|([^|-]+)(-\\d+)?\\|", RefSeq "^([A-Z]P_\\d+)" and ENSEMBL "^(ENS[A-Z0-9]+)". Other regex patterns are accepted as well. The default is the UniProt pattern.
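To see what each pattern pulls out as group 1, the patterns can be tried with base R's `regexec()`/`regmatches()`. This only illustrates the regex semantics; `extract_id` is a hypothetical helper, not an MSnID function, and the entry names are toy examples. Note that the UniProt pattern captures the isoform suffix (for example "-2") in a second group, so only the canonical accession is returned as group 1.

```r
# Entry-name patterns as listed above; group 1 is the identifier
patterns <- c(UniProt = "\\|([^|-]+)(-\\d+)?\\|",
              RefSeq  = "^([A-Z]P_\\d+)",
              ENSEMBL = "^(ENS[A-Z0-9]+)")
# Toy entry names, one per identifier style (in the same order as patterns)
entry_names <- c("sp|P04637-2|P53_HUMAN Isoform 2 of Cellular tumor antigen p53",
                 "NP_000537.3 cellular tumor antigen p53",
                 "ENSP00000269305.4")
# Hypothetical helper: return capture group 1, or NA if there is no match
extract_id <- function(x, pattern) {
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) > 1) m[2] else NA_character_
}
ids <- mapply(extract_id, entry_names, patterns, USE.NAMES = FALSE)
ids
# -> c("P04637", "NP_000537", "ENSP00000269305")
```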

Value

Path to the new FASTA file.

Author(s)

Vladislav A Petyuk vladislav.petyuk@pnnl.gov

Examples

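library(Biostrings)
library(MSnID)
# Example FASTA file shipped with MSnID (gzip-compressed)
fst_path <- system.file("extdata", "for_phospho.fasta.gz", package = "MSnID")
readAAStringSet(fst_path)
# Human UniProt-to-gene-symbol table (fetches annotation data; may need internet)
conv_tab <- fetch_conversion_table("Homo sapiens", "UNIPROT", "SYMBOL")
# Extract UniProt accessions from entry names and remap them to gene symbols
fst_path_2 <- remap_fasta_entry_names(fst_path, conv_tab, "\\|([^|-]+)(-\\d+)?\\|")
readAAStringSet(fst_path_2)
# Remove the remapped FASTA file
file.remove(fst_path_2)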

MSnID documentation built on Nov. 8, 2020, 8:03 p.m.