annotate.protein_id: Annotate protein_id

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/annotate.protein_id.R

Description

This function assigns the protein identifier for a list of tandem mass specs having a peptide sequence assigned.

Usage

1
2
annotate.protein_id(data, file = NULL, fasta = read.fasta(file = file, 
         as.string = TRUE, seqtype = "AA"), digestPattern = "(([RK])|(^)|(^M))") 

Arguments

data

list of records containing mZ and peptide sequences.

file

file name of a FASTA file.

fasta

a fasta object as returned by the seqinr::read.fasta(...) method.

digestPattern

a regex pattern which can be used by the grep command. the default regex pattern assumes a tryptic digest.

Details

The protein sequences a read by the read.fasta function of the seqinr package. The protein identifier is written to the protein proteinInformation variable.

If the function is called on a multi-core architecture it uses mclapply.

It is recommended to load the FASTA file prior to running annotate.protein_id using

myFASTA <- read.fasta(file = file, as.string = TRUE, seqtype = "AA")

instead of providing the FASTA file name to the function.

Value

it returns a list object.

Author(s)

Jonas Grossmann and Christian Panse, 2014

See Also

?read.fasta of the seqinr package.

http://www.uniprot.org/help/fasta-headers

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
    # annotate.protein_id
    
    # our Fasta sequence
      irtFASTAseq <- paste(">zz|ZZ_FGCZCont0260|", 
      "iRT_Protein_with_AAAAK_spacers concatenated Biognosys\n",
      "LGGNEQVTRAAAAKGAGSSEPVTGLDAKAAAAKVEATFGVDESNAKAAAAKYILAGVENS",
      "KAAAAKTPVISGGPYEYRAAAAKTPVITGAPYEYRAAAAKDGLDAASYYAPVRAAAAKAD",
      "VTPADFSEWSKAAAAKGTFIIDPGGVIRAAAAKGTFIIDPAAVIRAAAAKLFLQFGAQGS",
      "PFLK\n")
      
    # be realistic, do it from file
      Tfile <- file();  cat(irtFASTAseq, file = Tfile);
      
    #use read.fasta from seqinr
      fasta.irtFASTAseq <-read.fasta(Tfile, as.string=TRUE, seqtype="AA")
      close(Tfile)
    
    #annotate with proteinID 
    # -> here we find all psms from the one proteinID above
      peptideStd <- specL::annotate.protein_id(peptideStd, 
      fasta=fasta.irtFASTAseq)
  
    #show indices for all PSMs where we have a proteinInformation
     which(unlist(lapply(peptideStd, 
      function(x){nchar(x$proteinInformation)>0})))

specL documentation built on Nov. 8, 2020, 7:55 p.m.