compHomToPepFasta: Comparison of proteins and creating homologous peptides...


View source: R/CompHomPep_functions.R

Description

This is a wrapper that searches for pairs of protein sequences by UniProt EntryName, digests both sequences with trypsin, finds the homologous parts, removes duplicates, builds a new sequence from them and writes the result to a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).

Usage

compHomToPepFasta(path_o1, path_o2, path, width = 60, 
intermediate = FALSE, target = "K|R", exception = "P")

Arguments

path_o1

Character string giving the path to a UniProt proteome FASTA database for the first organism.

path_o2

Character string giving the path to a UniProt proteome FASTA database for the second organism.

path

Character string giving the path to which the resulting FASTA file is written.

width

Line width (residues per line) of the sequences in the resulting FASTA file (default 60).

intermediate

Logical; TRUE to return the intermediate output, FALSE (the default) otherwise.

target

Character string; the pattern that must match the residue immediately before the cleavage site (default "K|R").

exception

Character string; a pattern that prevents cleavage when it matches the residue immediately after the cleavage site (default "P").

Details

Searching pairs of protein sequences by UniProt EntryName in both organisms:
Org1: Human
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
Org2: Mouse
>sp|Q9CQV8-2|1433B_MOUSE Isoform Short of 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab
>sp|Q9CQV8|1433B_MOUSE 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab PE=1 SV=3
Pairs:
P31946|1433B_HUMAN Q9CQV8-2|1433B_MOUSE
P31946|1433B_HUMAN Q9CQV8|1433B_MOUSE
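A minimal base-R sketch of the pairing step, assuming (from the example above) that two entries form a pair when their EntryNames agree apart from the organism suffix (e.g. 1433B in 1433B_HUMAN and 1433B_MOUSE). The helpers entry_id and entry_key are hypothetical and not part of PepPrep; the package may match entries differently.

```r
# Hypothetical helpers, not PepPrep functions.
# entry_id: "P31946|1433B_HUMAN" from ">sp|P31946|1433B_HUMAN 14-3-3 ..."
entry_id <- function(header) {
  sub("^>?[a-z]+\\|(\\S+).*$", "\\1", header, perl = TRUE)
}

# entry_key: EntryName without the organism suffix, e.g. "1433B"
entry_key <- function(header) {
  entry_name <- sub("^.*\\|", "", entry_id(header))  # "1433B_HUMAN"
  sub("_[^_]+$", "", entry_name)                       # "1433B"
}

org1 <- c(">sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3")
org2 <- c(">sp|Q9CQV8-2|1433B_MOUSE Isoform Short of 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab",
          ">sp|Q9CQV8|1433B_MOUSE 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab PE=1 SV=3")

# Every org1 entry is paired with every org2 entry that shares its key
pairs <- merge(
  data.frame(key = entry_key(org1), org1 = entry_id(org1), stringsAsFactors = FALSE),
  data.frame(key = entry_key(org2), org2 = entry_id(org2), stringsAsFactors = FALSE),
  by = "key"
)
print(pairs[, c("org1", "org2")])
```

Because merge() forms all combinations within a key, one human entry with two mouse isoforms yields the two pairs listed above.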

Digesting both sequences with trypsin:
Org1: >sp|P31946|1433B_HUMAN ...
MTMDKSELVQKAKLAEQAERYDDMAAAMK...
Org2: >sp|Q9CQV8-2|1433B_MOUSE ...
MDKSELVQKAKLAEQAERYDDMAAAMK...
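A minimal sketch of a rule-based tryptic digest in base R, assuming target and exception are interpreted as in the argument descriptions: cleave after a residue matching target unless the next residue matches exception. The function digest_seq is hypothetical; PepPrep's own implementation may differ.

```r
# Hypothetical helper, not the PepPrep implementation.
# Cleaves after any residue matching `target` unless the following residue
# matches `exception` (trypsin defaults: after K or R, but not before P).
digest_seq <- function(seq, target = "K|R", exception = "P") {
  # Zero-width match at each cleavage site:
  # lookbehind for the target, negative lookahead for the exception
  site <- paste0("(?<=", target, ")(?!", exception, ")")
  marked <- gsub(site, ",", seq, perl = TRUE)
  # strsplit() drops the empty piece left by a cut at the very end
  strsplit(marked, ",", fixed = TRUE)[[1]]
}

digest_seq("MTMDKSELVQKAKLAEQAERYDDMAAAMK")
# "MTMDK" "SELVQK" "AK" "LAEQAER" "YDDMAAAMK"
digest_seq("MDKSELVQKAKLAEQAERYDDMAAAMK")
# "MDK" "SELVQK" "AK" "LAEQAER" "YDDMAAAMK"
```

Building the lookaround pattern from the two arguments keeps the single-amino-acid alternatives (e.g. "A|R|W|H") usable directly inside the regular expression.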

Finding homologous parts, removing duplicates and building a new sequence out of them:
Homolog Org1Org2: >sp|P31946|1433B_HUMAN ... org2:sp|Q9CQV8-2|1433B_MOUSE ...
SELVQKAKLAEQAERYDDMAAAMK...
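The example output above is consistent with keeping the peptides that occur in both digests, dropping duplicates and concatenating them in the order of the first organism. A minimal base-R sketch of that reading (homologous_seq is a hypothetical name, and PepPrep's actual matching rule may differ):

```r
# Hypothetical helper, not a PepPrep function.
# intersect() keeps the order of its first argument and removes duplicates.
homologous_seq <- function(peps1, peps2) {
  paste(intersect(peps1, peps2), collapse = "")
}

# Tryptic peptides of the truncated sequences shown above
human <- c("MTMDK", "SELVQK", "AK", "LAEQAER", "YDDMAAAMK")
mouse <- c("MDK", "SELVQK", "AK", "LAEQAER", "YDDMAAAMK")
homologous_seq(human, mouse)
# "SELVQKAKLAEQAERYDDMAAAMK"
```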

Writing the result into a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).
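A minimal sketch of writing one FASTA record with a fixed line width in base R, assuming width sets the number of residues per line. The helpers wrap_seq and write_fasta_record are hypothetical, and the header used here is a shortened illustration rather than the one PepPrep writes.

```r
# Hypothetical helpers, not the PepPrep writer.
# Splits a sequence into lines of at most `width` residues.
wrap_seq <- function(s, width = 60) {
  n <- nchar(s)
  if (n == 0) return(character(0))
  starts <- seq(1, n, by = width)
  substring(s, starts, pmin(starts + width - 1, n))
}

# Writes a single record: ">" + header, then the wrapped sequence lines
write_fasta_record <- function(header, s, path, width = 60) {
  writeLines(c(paste0(">", header), wrap_seq(s, width)), path)
}

out <- tempfile(fileext = ".fasta")
write_fasta_record("sp|P31946|1433B_HUMAN", "SELVQKAKLAEQAERYDDMAAAMK", out, width = 10)
readLines(out)
# ">sp|P31946|1433B_HUMAN" "SELVQKAKLA" "EQAERYDDMA" "AAMK"
```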

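You can use target and exception to set other digestion rules.
Each alternative in a target or exception pattern is restricted to a single amino acid (one-letter code); alternatives are separated by "|".
Amino acids: ARNDCQEGHILKMFPSTWYV
Valid patterns: A|R|W|H, P|S
Invalid patterns: Z|F|A|D (Z is not a listed amino acid), AR|NDC|STW (alternatives of more than one amino acid)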
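The pattern rule above can be checked with one regular expression: a single amino acid letter, optionally followed by further "|"-separated single letters. A minimal sketch (is_valid_pattern is a hypothetical name; PepPrep may validate patterns differently):

```r
# Hypothetical check, not part of PepPrep.
amino_acids <- "ARNDCQEGHILKMFPSTWYV"

is_valid_pattern <- function(p) {
  aa <- paste0("[", amino_acids, "]")
  # One amino acid, then zero or more "|<amino acid>" alternatives
  grepl(paste0("^", aa, "(\\|", aa, ")*$"), p)
}

is_valid_pattern(c("A|R|W|H", "P|S", "Z|F|A|D", "AR|NDC|STW"))
# TRUE TRUE FALSE FALSE
```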

UniProt, the source of the proteomes:
http://www.uniprot.org/

Value

If intermediate is TRUE, the output contains:

tbl

A data.frame containing the protein pairs, the header and the homologous sequence.

fasta

Character vector of the resulting FASTA file.

Otherwise, just a character vector giving the location of the written FASTA file, or an error message.

Note

In most cases the intermediate output is large, so assign the result to a variable instead of printing it.

Author(s)

Rafael Dellen
Rafael.Dellen@uni-duesseldorf.de

Examples

# Load the package and set the arguments
library(PepPrep)

# UniProt proteome FASTA databases
# (just a small example with two proteins each)
path_o1 <- system.file("extdata", "ExampleHumanProt.fasta", package = "PepPrep")
path_o2 <- system.file("extdata", "ExampleMouseProt.fasta", package = "PepPrep")

# Where to write the result and how to format it
path <- file.path(getwd(), "myTest_compHomToPep.fasta")
width <- 60

# Call the workflow: returns where to find the FASTA file (or an error message)
test <- compHomToPepFasta(path_o1, path_o2, path, width)
# With intermediate = TRUE, the (usually large) intermediate output is returned
test <- compHomToPepFasta(path_o1, path_o2, path, width, intermediate = TRUE)

PepPrep documentation built on May 1, 2019, 9:12 p.m.