remap_accessions_refseq_to_gene_fasta: Remap Sequence IDs in FASTA File


View source: R/remap_fasta.R

Description

The function remaps the sequence IDs in a FASTA file from RefSeq accessions to gene symbols. When multiple RefSeq sequences map to the same gene, only the longest one is retained; if several share the maximum length, the first of them is kept.
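The "first longest" rule can be illustrated with a small base R sketch. This is not the package's implementation; the accessions, sequences, gene names, and the column name `SYMBOL` are hypothetical values made up for illustration.

```r
# Hypothetical toy data: RefSeq-style accessions and their sequences.
# None of these values come from the package or its test data.
seqs <- c(
  NP_000001.1 = "MKTAYIAK",
  NP_000002.1 = "MKTAYIAKQRQISFVK",
  NP_000003.1 = "MSDNELQRTGGAVLKW",
  NP_000004.1 = "MAAPL"
)

# Hypothetical accession-to-gene lookup table.
conversion_table <- data.frame(
  accessions = c("NP_000001.1", "NP_000002.1", "NP_000003.1", "NP_000004.1"),
  SYMBOL     = c("GeneA", "GeneA", "GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# Attach each sequence and its length to the lookup table.
tbl <- conversion_table
tbl$sequence <- unname(seqs[tbl$accessions])
tbl$len <- nchar(tbl$sequence)

# For each gene, keep the first row that attains the maximum length.
# which.max() returns the first index when there are ties.
keep <- unlist(lapply(
  split(seq_len(nrow(tbl)), tbl$SYMBOL),
  function(idx) idx[which.max(tbl$len[idx])]
))

# One sequence per gene, named by gene symbol.
remapped <- setNames(tbl$sequence[keep], tbl$SYMBOL[keep])
remapped
```

In this toy input, two of the GeneA sequences tie for the maximum length, so the one that appears first in the table is the one kept.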

Usage

remap_accessions_refseq_to_gene_fasta(
  path_to_FASTA,
  organism_name,
  conversion_table
)

Arguments

path_to_FASTA

(string) Path to the FASTA file.

organism_name

(string) Official organism name, e.g. "Rattus norvegicus".

conversion_table

(data.frame) Data frame with two columns. One should be named "accessions" and contain the accessions used in the FASTA file (e.g. RefSeq); the other contains the alternative annotation to map to (e.g. gene symbol).
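As a rough illustration of the shape described for conversion_table, the sketch below builds a small table and runs a few sanity checks on it. The column name `SYMBOL` and all values are assumptions made for this example; only the "accessions" column name comes from the argument description.

```r
# Hypothetical conversion table; the column name "SYMBOL" and all
# values are illustrative assumptions, not taken from the package.
conversion_table <- data.frame(
  accessions = c("NP_000001.1", "NP_000002.1", "NP_000004.1"),
  SYMBOL     = c("GeneA", "GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# Basic shape checks: a two-column data frame with an "accessions" column.
stopifnot(
  is.data.frame(conversion_table),
  ncol(conversion_table) == 2,
  "accessions" %in% names(conversion_table)
)

# Genes with more than one accession are the ones for which only a
# single sequence would remain after remapping.
n_per_gene <- table(conversion_table$SYMBOL)
multi_accession_genes <- names(n_per_gene)[n_per_gene > 1]
multi_accession_genes
```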

Value

(string) Path to the new FASTA file, in which the sequence IDs are gene symbols.

Examples

library(PlexedPiper)
library(Biostrings)
# Locate the example RefSeq FASTA file shipped with PlexedPiperTestData
path_to_FASTA <- system.file(
  "extdata/Rattus_norvegicus_NCBI_RefSeq_2018-04-10.fasta.gz",
  package = "PlexedPiperTestData"
)
# Work on a copy in a temporary directory
temp_work_dir <- tempdir() # can be set to "." or getwd(), if done carefully
file.copy(path_to_FASTA, temp_work_dir)
path_to_FASTA <- file.path(temp_work_dir, basename(path_to_FASTA))
readAAStringSet(path_to_FASTA) # RefSeq IDs
path_to_new_FASTA <- remap_accessions_refseq_to_gene_fasta(path_to_FASTA, "Rattus norvegicus")
readAAStringSet(path_to_new_FASTA) # gene IDs

vladpetyuk/PlexedPiper documentation built on June 24, 2021, 8:59 a.m.