compHomToPepFasta: Comparison of proteins and creating homologous peptides...
In PepPrep: Insilico peptide mutation, digestion and homologous comparison.


View source: R/CompHomPep_functions.R

This is a wrapper that searches for pairs of protein sequences by UniProt EntryName, digests both sequences with trypsin, finds the homologous parts, removes duplicates, builds a new sequence out of them, and writes the result into a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).

compHomToPepFasta(path_o1, path_o2, path, width = 60, intermediate = FALSE, target = "K|R", exception = "P")

`path_o1`	Character string indicating the path to a UniProt proteome FASTA database for the first organism.
`path_o2`	Character string indicating the path to a UniProt proteome FASTA database for the second organism.
`path`	Character string indicating the path where to write the resulting FASTA file.
`width`	Width of the sequence in the result (default 60).
`intermediate`	Logical, TRUE if you would like to have intermediate output, FALSE if not (default).
`target`	Character string, pattern to be matched before the cleavage site (default "K|R").
`exception`	Character string, pattern that prevents a cleavage when it is found directly after the cleavage site (default "P").

Searching pairs of protein sequences by UniProt EntryName in both organisms:
Org1: Human
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
Org2: Mouse
>sp|Q9CQV8-2|1433B_MOUSE Isoform Short of 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab
>sp|Q9CQV8|1433B_MOUSE 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab PE=1 SV=3
Pairs:
P31946|1433B_HUMAN Q9CQV8-2|1433B_MOUSE
P31946|1433B_HUMAN Q9CQV8|1433B_MOUSE
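The pairing step can be sketched in base R using only the example headers above. This is an illustration, not PepPrep's actual code: `parse_header` is a hypothetical helper, and matching on the part of the EntryName before the underscore (`1433B`) is an assumption read off the example pairs.

```r
# Headers copied from the example above (leading ">" removed).
headers_org1 <- c(
  "sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3"
)
headers_org2 <- c(
  "sp|Q9CQV8-2|1433B_MOUSE Isoform Short of 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab",
  "sp|Q9CQV8|1433B_MOUSE 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab PE=1 SV=3"
)

# Hypothetical helper: split "db|accession|EntryName description" into fields.
parse_header <- function(h) {
  id <- sub(" .*$", "", h)                       # drop the description
  parts <- strsplit(id, "|", fixed = TRUE)
  accession <- vapply(parts, `[`, character(1), 2)
  entry <- vapply(parts, `[`, character(1), 3)
  data.frame(accession = accession, entry = entry,
             protein = sub("_.*$", "", entry),   # assumed matching key
             stringsAsFactors = FALSE)
}

# Join both organisms on the assumed key; isoforms yield one pair each.
pairs <- merge(parse_header(headers_org1), parse_header(headers_org2),
               by = "protein", suffixes = c("_org1", "_org2"))
pair_ids <- data.frame(
  org1 = paste(pairs$accession_org1, pairs$entry_org1, sep = "|"),
  org2 = paste(pairs$accession_org2, pairs$entry_org2, sep = "|"),
  stringsAsFactors = FALSE
)
print(pair_ids)
```

Because the mouse database lists two entries with the same EntryName, the join produces two pairs for the single human entry, as in the listing above.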

Digesting both sequences with trypsin:
Org1: >sp|P31946|1433B_HUMAN ...
MTMDKSELVQKAKLAEQAERYDDMAAAMK...
Org2: >sp|Q9CQV8-2|1433B_MOUSE ...
MDKSELVQKAKLAEQAERYDDMAAAMK...
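A minimal sketch of the digestion rule, assuming that a cleavage happens after every match of `target` unless the next residue matches `exception`. The function `digest` is a hypothetical helper written for this illustration and is not the digestion routine used inside PepPrep; only the visible sequence prefixes from the example are used.

```r
# Hypothetical helper: cleave after each `target` match that is not
# directly followed by an `exception` match.
digest <- function(sequence, target = "K|R", exception = "P") {
  # Build e.g. "(K|R)(?!P)": a target residue with a negative lookahead.
  site <- sprintf("(%s)(?!%s)", target, exception)
  # Mark every cleavage site with "," (not an amino acid code) ...
  marked <- gsub(site, "\\1,", sequence, perl = TRUE)
  # ... and split on the marker.
  strsplit(marked, ",", fixed = TRUE)[[1]]
}

digest("MTMDKSELVQKAKLAEQAERYDDMAAAMK")  # visible human prefix
digest("MDKSELVQKAKLAEQAERYDDMAAAMK")    # visible mouse prefix
digest("AKPGR")                          # K followed by P is not cleaved
```

Inserting a marker and splitting on it sidesteps the way `strsplit` treats zero-length regular expression matches.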

Finding homologous parts, removing duplicates, and building a new sequence out of them:
Homolog Org1Org2: >sp|P31946|1433B_HUMAN ... org2:sp|Q9CQV8-2|1433B_MOUSE ...
SELVQKAKLAEQAERYDDMAAAMK...
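The step above can be illustrated with the tryptic peptides of the two visible sequence prefixes. The sketch assumes that "homologous" means peptides that are identical in both organisms; the package may apply a different criterion. The peptide vectors are written out by hand for the prefixes shown, not taken from the full proteins.

```r
# Tryptic peptides of the visible prefixes (cleavage after K or R).
peptides_org1 <- c("MTMDK", "SELVQK", "AK", "LAEQAER", "YDDMAAAMK")
peptides_org2 <- c("MDK", "SELVQK", "AK", "LAEQAER", "YDDMAAAMK")

# intersect() keeps the peptides present in both vectors and drops duplicates.
shared <- intersect(peptides_org1, peptides_org2)

# Concatenate the shared peptides into one new sequence.
homolog <- paste(shared, collapse = "")
print(homolog)
```

Under this assumption only the N-terminal peptides (`MTMDK` and `MDK`) differ, so they are the only ones left out of the combined sequence.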

Writing the result into a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).
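How the `width` argument might shape the output can be sketched as follows. `wrap_sequence` is a hypothetical helper, the header `>example_homolog` is a placeholder, and a width of 10 is used only to keep the output short; the real function's file layout may differ.

```r
# Hypothetical helper: break a sequence into lines of at most `width` characters.
wrap_sequence <- function(sequence, width = 60) {
  starts <- seq(1, nchar(sequence), by = width)
  substring(sequence, starts, pmin(starts + width - 1, nchar(sequence)))
}

sequence <- "SELVQKAKLAEQAERYDDMAAAMK"
fasta_lines <- c(">example_homolog", wrap_sequence(sequence, width = 10))

# Write to a temporary file so the sketch leaves no file in the working directory.
out_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, out_file)
readLines(out_file)
```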

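You can use target and exception to set other rules for digestion.
Each alternative in the patterns for target and exception is restricted to a single amino acid; alternatives are separated by "|".
Amino acids: ARNDCQEGHILKMFPSTWYV
Valid patterns: A|R|W|H, P|S
Invalid patterns: Z|F|A|D (Z is not in the list of amino acids), AR|NDC|STW (alternatives longer than one amino acid)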
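The restriction can be checked with a few lines of base R. `is_valid_pattern` is a hypothetical helper for illustration; it is not exported by PepPrep, and the package's own validation may be implemented differently.

```r
# The 20 one-letter amino acid codes listed above.
amino_acids <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]

# Hypothetical helper: TRUE if every "|"-separated alternative
# is exactly one of the listed amino acids.
is_valid_pattern <- function(pattern) {
  alternatives <- strsplit(pattern, "|", fixed = TRUE)[[1]]
  length(alternatives) > 0 && all(alternatives %in% amino_acids)
}

patterns <- c("A|R|W|H", "P|S", "Z|F|A|D", "AR|NDC|STW")
checks <- vapply(patterns, is_valid_pattern, logical(1))
print(checks)
```

The first two patterns pass; `Z|F|A|D` fails because `Z` is not in the list, and `AR|NDC|STW` fails because its alternatives are longer than one letter.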

UniProt, the source of the proteomes:
http://www.uniprot.org/

If you set intermediate to TRUE, you will get the following output:

`tbl`	A data.frame that contains the protein pairs, the header and the homologous sequence.
`fasta`	Character vector of the resulting FASTA file.

Otherwise, the result is just a character vector stating where to find the FASTA file, or an error message.

The intermediate output is large in most cases, so assign the result to a variable.

Rafael Dellen
Rafael.Dellen@uni-duesseldorf.de

#load data and set arguments

#UniProt proteome FASTA databases
#(just a small example with two proteins each)
path_o1 <- system.file("extdata", "ExampleHumanProt.fasta", package="PepPrep")
path_o2 <- system.file("extdata", "ExampleMouseProt.fasta", package="PepPrep")

#where to write the result and how to format it
path <- file.path(getwd(), "myTest_compHomToPep.fasta")
width <- 60

#call workflow
test <- compHomToPepFasta(path_o1, path_o2, path, width)
test <- compHomToPepFasta(path_o1, path_o2, path, width, intermediate=TRUE)