annotate.protein_id: Annotate protein_id
In specL: specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics


This function assigns a protein identifier to each entry in a list of tandem mass spectra that have a peptide sequence assigned.

annotate.protein_id(data, file = NULL, fasta = read.fasta(file = file, as.string = TRUE, seqtype = "AA"), digestPattern = "(([RK])|(^)|(^M))")

`data`	list of records containing mZ and peptide sequences.
`file`	file name of a FASTA file.
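`fasta`	a fasta object as returned by the `seqinr::read.fasta(...)` function.
`digestPattern`	a regex pattern which can be used by the `grep` command. The default pattern assumes a tryptic digest: a peptide must follow an R or K residue, or start at the beginning of the protein sequence (optionally after an initial M).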
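To make the default `digestPattern` concrete, here is a minimal base R sketch. It assumes the pattern is prepended to the peptide sequence before matching against the protein sequence; that is an illustration of what the regex accepts, not necessarily how specL applies it internally. The protein and peptide strings are hypothetical toy data.

```r
# Default pattern from the Usage section: the peptide must follow R or K,
# or start at the beginning of the protein (optionally after an initial M).
digestPattern <- "(([RK])|(^)|(^M))"

# Hypothetical toy protein with an initial methionine.
protein <- "MLGGNEQVTRAAAAKGAGSSEPVTGLDAK"

# Two tryptic peptides and one non-tryptic fragment (preceded by A).
peptides <- c("GAGSSEPVTGLDAK", "LGGNEQVTR", "GSSEPVTGLDAK")

# Assumption: prepend the pattern to each peptide and search the protein.
isTryptic <- vapply(
  peptides,
  function(p) grepl(paste0(digestPattern, p), protein),
  logical(1)
)

print(isTryptic)
```

The first two peptides match because they follow a K and the initial M, respectively; the third is rejected because the residue in front of it is an A.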

The protein sequences are read by the read.fasta function of the seqinr package. The protein identifier is written to the proteinInformation variable of each record.

If the function is called on a multi-core architecture, it uses mclapply.
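The idea of annotating records in parallel can be sketched with `parallel::mclapply` on toy data. This is only an illustration under stated assumptions, not the specL implementation: the `records` list, the `proteins` vector, the `peptideSequence` field name, and the plain substring match are all hypothetical stand-ins.

```r
library(parallel)

# Hypothetical stand-ins: two PSM-like records and a named vector of proteins.
records <- list(
  list(peptideSequence = "LGGNEQVTR"),
  list(peptideSequence = "NOTPRESENT")
)
proteins <- c(P1 = "LGGNEQVTRAAAAKGAGSSEPVTGLDAK", P2 = "MKTAYIAK")

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
nCores <- if (.Platform$OS.type == "windows") 1L else 2L

annotated <- mclapply(records, function(rec) {
  # Collect the names of all proteins that contain the peptide.
  hits <- names(proteins)[grepl(rec$peptideSequence, proteins, fixed = TRUE)]
  # Store the result in the record; empty string when nothing matches.
  rec$proteinInformation <- paste(hits, collapse = ";")
  rec
}, mc.cores = nCores)

proteinInfo <- vapply(annotated, function(rec) rec$proteinInformation, character(1))
print(proteinInfo)
```

Each record is processed independently, which is what makes the annotation step easy to distribute over cores.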

It is recommended to load the FASTA file prior to running annotate.protein_id using

myFASTA <- read.fasta(file = file, as.string = TRUE, seqtype = "AA")

instead of providing the FASTA file name to the function.

Returns a list object.

Jonas Grossmann and Christian Panse, 2014

?read.fasta of the seqinr package.

http://www.uniprot.org/help/fasta-headers

    # annotate.protein_id example
    library(seqinr)  # provides read.fasta
    library(specL)   # provides annotate.protein_id and the peptideStd example data

    # our FASTA sequence
    irtFASTAseq <- paste(">zz|ZZ_FGCZCont0260|",
        "iRT_Protein_with_AAAAK_spacers concatenated Biognosys\n",
        "LGGNEQVTRAAAAKGAGSSEPVTGLDAKAAAAKVEATFGVDESNAKAAAAKYILAGVENS",
        "KAAAAKTPVISGGPYEYRAAAAKTPVITGAPYEYRAAAAKDGLDAASYYAPVRAAAAKAD",
        "VTPADFSEWSKAAAAKGTFIIDPGGVIRAAAAKGTFIIDPAAVIRAAAAKLFLQFGAQGS",
        "PFLK\n")

    # be realistic, do it from file (here an anonymous file connection)
    Tfile <- file()
    cat(irtFASTAseq, file = Tfile)

    # use read.fasta from seqinr
    fasta.irtFASTAseq <- read.fasta(Tfile, as.string = TRUE, seqtype = "AA")
    close(Tfile)

    # annotate with proteinID
    # -> here we find all PSMs from the one proteinID above
    peptideStd <- specL::annotate.protein_id(peptideStd,
        fasta = fasta.irtFASTAseq)

    # show indices of all PSMs that have a proteinInformation entry
    which(unlist(lapply(peptideStd,
        function(x) { nchar(x$proteinInformation) > 0 })))