snvToPepFasta: Single nucleotide variant (SNV) to peptide workflow
In PepPrep: Insilico peptide mutation, digestion and homologous comparison.

View source: R/SNVtoPep_functions.R

This is a wrapper for the whole workflow: it applies the SNV mutations to the transcripts, digests the mutated sequences into small peptides, and writes the result to a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).

snvToPepFasta(tbl, glst, mymart, myarchive, spath, tpath, width = 60,
  intermediate = FALSE, target = "K|R", exception = "P")

`tbl`	Data.frame of ANNOVAR-annotated SNVs.
`glst`	Data.frame of gene names, in a column named Genes.
`mymart`	Mart from which the ENST is retrieved via biomaRt.
`myarchive`	Logical indicating whether an archive mart is given (default FALSE).
`spath`	Character string giving the path to the human Ensembl peptide database in FASTA format.
`tpath`	Character string giving the path where the mutated and digested sequences are written in FASTA format.
`width`	Width of the sequence lines in the result (default 60).
`intermediate`	Logical, TRUE if you would like to have intermediate output, FALSE if not (default).
`target`	Character string, pattern to be matched before the cleavage site (default "K|R").
`exception`	Character string, pattern that prevents a cleavage when it is found directly after the target (default "P").

The RefSeq mRNA ID (NM_ID) is used by biomaRt to query the Ensembl transcript ID (ENST).
http://www.ncbi.nlm.nih.gov/refseq/
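
The lookup from NM_ID to ENST can be sketched with biomaRt directly. This is only an illustration of the kind of query involved, not the code snvToPepFasta runs internally. The helper name `nm_to_enst` and the variable `my_nm_ids` are hypothetical, and the attribute and filter names assume the current human Ensembl dataset.

```r
# Hypothetical helper (not part of PepPrep): map RefSeq mRNA IDs to
# Ensembl transcript IDs with a single biomaRt query.
nm_to_enst <- function(nm_ids, mart) {
  biomaRt::getBM(
    attributes = c("refseq_mrna", "ensembl_transcript_id"),
    filters    = "refseq_mrna",
    values     = nm_ids,
    mart       = mart
  )
}

# Usage (needs the biomaRt package and network access):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# nm_to_enst(my_nm_ids, mart)   # my_nm_ids: character vector of NM_IDs
```

The result is a data.frame with one row per NM_ID/ENST pair, so an ENST that maps to several NM_IDs appears in several rows.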

The header of the FASTA file will look like this:
>ENST|description| originalAminoacid->mutatedAminoacid_positionAminoacid ...

If the annotated change does not match the ENST, the entry will look like:
wrong: originalAminoacid->mutatedAminoacid_positionAminoacid

If the ENST matches two or more NM_IDs, there will be a counter in the header:
>ENSTxcounter|...
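
As a rough illustration of the header layout described above, the following sketch assembles such a header from its parts. The helper `make_header` and the placeholder ID are hypothetical, and reading `ENSTxcounter` as the ID followed by a literal "x" and the counter is an assumption.

```r
# Hypothetical helper (not part of PepPrep): build a FASTA header of the form
# >ENST|description| orig->mut_pos orig->mut_pos ...
make_header <- function(enst, description, orig, mut, pos, counter = NULL) {
  # Assumption: the counter is appended to the ID as "x<counter>".
  id <- if (is.null(counter)) enst else paste0(enst, "x", counter)
  # One "orig->mut_pos" token per amino acid change, separated by spaces.
  changes <- paste0(orig, "->", mut, "_", pos, collapse = " ")
  paste0(">", id, "|", description, "| ", changes)
}

# Placeholder ID and description, two amino acid changes.
make_header("ENST00000000001", "example protein",
            orig = c("A", "K"), mut = c("V", "R"), pos = c(12, 40))
# ">ENST00000000001|example protein| A->V_12 K->R_40"
```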

Trypsin digestion rule: cleave after K or R, except when followed by P.
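
A minimal sketch of this rule in base R is shown below. It is not the package's own digestion code; `digest_sketch` is a hypothetical name, and the regular expression is one possible way to express "target not followed by exception".

```r
# Hypothetical helper (not part of PepPrep): cleave a peptide sequence after
# every residue matching `target`, unless the next residue matches `exception`.
digest_sketch <- function(sequence, target = "K|R", exception = "P") {
  # A cleavage site is a target residue with a negative lookahead for the exception.
  pattern <- paste0("(", target, ")(?!", exception, ")")
  hits <- gregexpr(pattern, sequence, perl = TRUE)[[1]]
  if (hits[1] == -1) return(sequence)  # no cleavage site found
  # Each hit ends a peptide; the last peptide ends at the end of the sequence.
  ends   <- unique(c(as.integer(hits), nchar(sequence)))
  starts <- c(1L, head(ends, -1) + 1L)
  substring(sequence, starts, ends)
}

digest_sketch("MAKPLLRGGK")
# "MAKPLLR" "GGK"   (no cut after the K that precedes P)
```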

You can use target and exception to set other digestion rules.
Each alternative in the target and exception patterns is restricted to a single amino acid.
Amino acids: ARNDCQEGHILKMFPSTWYV
Valid patterns: A|R|W|H, P|S
Invalid patterns: Z|F|A|D, AR|NDC|STW
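
The restriction on patterns can be checked with a few lines of base R. This is a sketch of the rule as stated above, not the validation performed inside the package; `is_valid_pattern` is a hypothetical helper.

```r
# Hypothetical helper (not part of PepPrep): a pattern is valid when every
# "|"-separated alternative is exactly one of the 20 amino acid letters.
is_valid_pattern <- function(pattern) {
  amino_acids <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]
  parts <- strsplit(pattern, "|", fixed = TRUE)[[1]]
  length(parts) > 0 && all(parts %in% amino_acids)
}

vapply(c("A|R|W|H", "P|S", "Z|F|A|D", "AR|NDC|STW"), is_valid_pattern, logical(1))
# A|R|W|H: TRUE, P|S: TRUE, Z|F|A|D: FALSE, AR|NDC|STW: FALSE
```

Z|F|A|D fails because Z is not in the amino acid alphabet, and AR|NDC|STW fails because its alternatives are longer than one letter.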

The analysis is based on Ensembl protein data:
http://www.ensembl.org/index.html
The SNV annotation must follow the ANNOVAR format:
http://www.openbioinformatics.org/annovar/

If you set intermediate to TRUE you will get the following output:

`aachanges`	A data.frame like tbl, with new columns that describe the amino acid changes.
`transcripts`	Data.frame, containing: ensemble_transcript_id, nmid and pname.
`mutfasta`	Character vector that contains FASTA headers and peptide sequences.
`mutlog`	Character vector that contains log entries of errors reported during mutation (mutateProtToPep()).

Otherwise, a character vector giving the location of the FASTA file, or an error message.

The intermediate output is large in most cases, so assign the result to a variable.

Rafael Dellen
Rafael.Dellen@uni-duesseldorf.de

# Load data and set arguments

# data.frame with SNVs (loading the file is expected to provide 'testtbl', used below)
tbl <- system.file("extdata", "ExampleData.RData", package = "PepPrep")
load(tbl)

glst <- data.frame(Genes = "CAP1", stringsAsFactors = FALSE)

# Peptide sequences
spath <- system.file("extdata", "ExampleHomo_sapiens.GRCh37.70.pep.all.fa", package = "PepPrep")

# Where to write the result, and the sequence width to use
tpath <- file.path(getwd(), "myTest_snvToPep.fasta")
width <- 60

# biomaRt settings
mymart <- "ensembl"
myarchive <- FALSE

# Call the workflow
## Not run: 
test <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width)
test2 <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width, intermediate = TRUE)
## End(Not run)