snvToPepFasta: Single nucleotide variant (SNV) to peptide workflow


View source: R/SNVtoPep_functions.R

Description

This function wraps the whole workflow: it applies SNV mutations to transcripts, digests these transcripts into small peptides, and writes the result to a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).

Usage

snvToPepFasta(tbl, glst, mymart, myarchive, spath, tpath, width = 60,
intermediate = FALSE, target = "K|R", exception = "P")

Arguments

tbl

Data.frame of ANNOVAR annotated SNVs.

glst

Data.frame of gene names, with a column named Genes.

mymart

Mart used by biomaRt to retrieve the ENST IDs.

myarchive

Logical that indicates whether an archive mart is given (default FALSE).

spath

Character string giving the path to the human Ensembl peptide database in FASTA format.

tpath

Character string giving the path where to write the mutated and digested sequences in FASTA format.

width

Width of the sequence in the result (default 60).

intermediate

Logical, TRUE if you would like to have intermediate output, FALSE if not (default).

target

Character string, pattern to be matched before the cleavage site (default "K|R").

exception

Character string, pattern that prevents a cleavage when it is found directly after the target (default "P").

Details

The RefSeq mRNA ID (NM_ID) is used by biomaRt to query the Ensembl transcript ID (ENST).
http://www.ncbi.nlm.nih.gov/refseq/
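
As a rough illustration of this lookup, the sketch below maps RefSeq mRNA IDs to Ensembl transcript IDs with biomaRt. The helper name nm_to_enst and the dataset hsapiens_gene_ensembl are assumptions made for this example; the query that snvToPepFasta runs internally may differ.

```r
# Hypothetical helper, not part of PepPrep: map RefSeq mRNA IDs to ENST IDs.
nm_to_enst <- function(nm_ids, mart = NULL) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("package 'biomaRt' is required for this sketch")
  }
  if (is.null(mart)) {
    # Assumed dataset: the human gene set of the current Ensembl release.
    mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  }
  # One row per (NM_ID, ENST) pair known to the mart.
  biomaRt::getBM(
    attributes = c("refseq_mrna", "ensembl_transcript_id"),
    filters    = "refseq_mrna",
    values     = nm_ids,
    mart       = mart
  )
}

# Needs an internet connection, so the call is left commented out.
# my_nm_ids is a placeholder for a character vector of NM_ IDs.
# nm_to_enst(my_nm_ids)
```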

The header of the FASTA file will look like this:
>ENST|description| originalAminoacid->mutatedAminoacid_positionAminoacid ...

If the annotated change does not fit the ENST, it will look like:
wrong: originalAminoacid->mutatedAminoacid_positionAminoacid

If the ENST matches two or more NM_IDs, there will be a counter in the header:
>ENSTxcounter|...
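
To make the header layout concrete, the sketch below assembles such a line from its parts. The helper make_header and all values passed to it are made up for illustration. Reading "ENSTxcounter" as the ENST followed by a literal "x" and the counter is an assumption, not something confirmed by the package source.

```r
# Hypothetical helper, not part of PepPrep: build a header in the layout above.
make_header <- function(enst, description, changes, counter = NULL) {
  # Assumption: the counter is appended to the ENST as "x<counter>".
  id <- if (is.null(counter)) enst else paste0(enst, "x", counter)
  paste0(">", id, "|", description, "| ", paste(changes, collapse = " "))
}

# Placeholder values, not real annotations.
make_header("ENST00000000001", "example protein", c("A->T_12", "G->S_40"))
make_header("ENST00000000001", "example protein", "A->T_12", counter = 2)
```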

Trypsin digestion rule: cut after K and R, except when followed by P.
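
The rule can be sketched with a regular expression that matches a target residue not directly followed by the exception. The function digest_sketch below is a hypothetical stand-in written for this page, not the digestion routine used by the package, and it does not model missed cleavages.

```r
# Hypothetical sketch of the cleavage rule, not the package's own digest.
digest_sketch <- function(seq, target = "K|R", exception = "P") {
  # Match a target residue that is not directly followed by the exception.
  pattern <- paste0("(?:", target, ")(?!", exception, ")")
  cuts <- gregexpr(pattern, seq, perl = TRUE)[[1]]
  if (cuts[1] == -1) return(seq)            # no cleavage site found
  ends <- as.integer(cuts)                  # each peptide ends at a cut site
  if (ends[length(ends)] < nchar(seq)) {
    ends <- c(ends, nchar(seq))             # keep the C-terminal remainder
  }
  starts <- c(1L, head(ends, -1) + 1L)
  substring(seq, starts, ends)
}

# R at position 6 is followed by P, so no cut happens there.
digest_sketch("MAKGGRPLLKTT")
# returns "MAK" "GGRPLLK" "TT"
```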

You can use target and exception to set other digestion rules.
Each alternative in the target and exception patterns must be a single amino acid.
Amino acids: ARNDCQEGHILKMFPSTWYV
Valid patterns: A|R|W|H, P|S
Invalid patterns: Z|F|A|D, AR|NDC|STW
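
A pattern can be checked against this restriction before calling the workflow. The helper valid_pattern below is a hypothetical sketch written for this page; whether and how snvToPepFasta validates its patterns internally is not stated here.

```r
# Hypothetical helper, not part of PepPrep: check a target/exception pattern.
valid_pattern <- function(pattern) {
  amino_acids <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]
  # Split on the literal "|" and require every part to be one amino acid.
  parts <- strsplit(pattern, "|", fixed = TRUE)[[1]]
  length(parts) > 0 && all(parts %in% amino_acids)
}

sapply(c("A|R|W|H", "P|S", "Z|F|A|D", "AR|NDC|STW"), valid_pattern)
# TRUE for the first two patterns, FALSE for the last two
```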

The analysis is based on Ensembl protein data:
http://www.ensembl.org/index.html
The SNV annotation must follow the ANNOVAR format:
http://www.openbioinformatics.org/annovar/

Value

If you set intermediate to TRUE you will get the following output:

aachanges

A data.frame like tbl, with new columns that describe the amino acid changes.

transcripts

Data.frame, containing: ensemble_transcript_id, nmid and pname.

mutfasta

Character vector that contains FASTA headers and peptide sequences.

mutlog

Character vector that contains log entries of errors reported during mutation (mutateProtToPep()).

Otherwise, a character vector giving the location of the FASTA file, or an error message.
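
Because the return value has two possible shapes, calling code may want to branch on it. The sketch below assumes that the intermediate output is a list whose elements carry the names given above; that container type and the helper summarise_result are assumptions for illustration.

```r
# Hypothetical helper, not part of PepPrep: summarise either kind of result.
summarise_result <- function(res) {
  if (is.list(res) && !is.null(res$mutfasta)) {
    # Assumption: intermediate output is a list with the elements named above.
    n_headers <- sum(startsWith(res$mutfasta, ">"))
    sprintf("%d FASTA entries, %d log entries", n_headers, length(res$mutlog))
  } else {
    # Otherwise a character vector: the FASTA location or an error message.
    paste(res, collapse = " ")
  }
}

# Mock values standing in for real output.
mock <- list(mutfasta = c(">a", "PEPK", ">b", "TIDER"), mutlog = character(0))
summarise_result(mock)
summarise_result("myTest_snvToPep.fasta")
```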

Note

The intermediate output is large in most cases; assign the result to a variable.

Author(s)

Rafael Dellen
Rafael.Dellen@uni-duesseldorf.de

Examples

#load data and set arguments

#data.frame with SNVs (loading the file provides the object testtbl used below)
tbl <- system.file("extdata", "ExampleData.RData", package="PepPrep")
load(tbl)

glst <- data.frame(Genes="CAP1", stringsAsFactors=FALSE)

#peptide sequence
spath <- system.file("extdata", "ExampleHomo_sapiens.GRCh37.70.pep.all.fa", package="PepPrep")

#where to write the result and how to write
tpath <- file.path(getwd(), "myTest_snvToPep.fasta")
width <- 60

#biomaRt settings
mymart <- "ensembl"
myarchive <- FALSE

#call workflow
## Not run: 
test <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width)
test2 <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width, intermediate = TRUE)
## End(Not run)

PepPrep documentation built on May 1, 2019, 9:12 p.m.