View source: R/SNVtoPep_functions.R
This is a wrapper for the whole workflow: it applies SNV mutations to transcripts, digests the mutated transcripts into small peptides, and writes the result to a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).
snvToPepFasta(tbl, glst, mymart, myarchive, spath, tpath, width = 60,
  intermediate = FALSE, target = "K|R", exception = "P")
tbl |
Data.frame of ANNOVAR annotated SNVs. |
glst |
Data.frame of gene names in a column named Genes. |
mymart |
Mart from which the ENST is retrieved via biomaRt. |
myarchive |
Logical indicating whether an archive mart is given (default FALSE). |
spath |
Character string giving the path to the human Ensembl peptide database in FASTA format. |
tpath |
Character string giving the path where to write the mutated and digested sequences in FASTA format. |
width |
Width of the sequence in the result (default 60). |
intermediate |
Logical, TRUE if you would like to have intermediate output, FALSE if not (default). |
target |
Character string, pattern to be matched before the cleavage site (default "K|R"). |
exception |
Character string, pattern that prevents a cleavage when it is found directly after the cleavage site (default "P"). |
The RefSeq mRNA ID (NM_ID) is used by biomaRt to query the Ensembl transcript ID (ENST).
http://www.ncbi.nlm.nih.gov/refseq/
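The lookup described above can be sketched with biomaRt directly. This is a minimal sketch, not the package's internal code: the function name query_enst_sketch is hypothetical, the dataset name "hsapiens_gene_ensembl" is an assumption, and running the query needs the biomaRt package and network access to Ensembl.

```r
# Hypothetical helper: map RefSeq mRNA IDs (NM_IDs) to Ensembl transcript IDs.
# Sketch only; snvToPepFasta() may perform this lookup differently.
query_enst_sketch <- function(nm_ids, mart = NULL) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("the biomaRt package is required for this sketch")
  }
  if (is.null(mart)) {
    # Assumption: the current human gene dataset is the one wanted.
    mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  }
  # One row per (NM_ID, ENST) pair; an ENST can appear for several NM_IDs.
  biomaRt::getBM(
    attributes = c("refseq_mrna", "ensembl_transcript_id"),
    filters = "refseq_mrna",
    values = nm_ids,
    mart = mart
  )
}
```

Pass a character vector of NM_IDs as nm_ids; the result is a data.frame with the columns refseq_mrna and ensembl_transcript_id.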
The header of the FASTA file will look like this:
>ENST|description| originalAminoacid->mutatedAminoacid_positionAminoacid ...
If the annotated change does not fit the ENST sequence, the entry will look like:
wrong: originalAminoacid->mutatedAminoacid_positionAminoacid
If the ENST matches two or more NM_IDs, there will be a counter in the header:
>ENSTxcounter|...
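To make the header layout concrete, the following sketch assembles such a header from its parts. The helper names format_change_sketch and make_header_sketch are hypothetical, the transcript ID is a placeholder, and the exact spacing is read off the pattern shown above, so the files written by snvToPepFasta() may differ in detail.

```r
# Hypothetical helpers that mimic the documented header layout.
format_change_sketch <- function(original, mutated, position, fits = TRUE) {
  change <- paste0(original, "->", mutated, "_", position)
  # Changes that do not fit the ENST sequence get the "wrong: " prefix.
  ifelse(fits, change, paste0("wrong: ", change))
}

make_header_sketch <- function(enst, description, changes, counter = NULL) {
  # A counter is appended as ENSTxcounter when the ENST matches several NM_IDs.
  id <- if (is.null(counter)) enst else paste0(enst, "x", counter)
  paste0(">", id, "|", description, "| ", paste(changes, collapse = " "))
}

# Placeholder ID and description, for illustration only.
change <- format_change_sketch("A", "V", 25)
header <- make_header_sketch("ENST00000000001", "CAP1", change)
header_dup <- make_header_sketch("ENST00000000001", "CAP1", change, counter = 2)
```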
Trypsin digestion rule: cut after K and R, except when followed by P.
You can use target and exception to set other rules for digestion.
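The cleavage rule can be illustrated in a few lines of base R. This is a sketch of the rule as described here, not the package's own digestion code; the function name digest_sketch is hypothetical, and it assumes that target and exception are alternations of single amino acid letters.

```r
# Hypothetical helper: cut after every residue matching `target`,
# unless the next residue matches `exception`.
digest_sketch <- function(sequence, target = "K|R", exception = "P") {
  aa <- strsplit(sequence, "", fixed = TRUE)[[1]]
  n <- length(aa)
  if (n == 0) return(character(0))
  is_target <- grepl(paste0("^(", target, ")$"), aa)
  # TRUE where the following residue blocks the cleavage; the last residue has none.
  next_is_exception <- c(grepl(paste0("^(", exception, ")$"), aa[-1]), FALSE)
  cut_after <- which(is_target & !next_is_exception)
  cut_after <- cut_after[cut_after < n]  # no cut after the final residue
  substring(sequence, c(1, cut_after + 1), c(cut_after, n))
}

peptides <- digest_sketch("MAKPLRGK")
```

With the default trypsin rule, the K at position 3 is not cleaved because it is followed by P, so the sequence is split only after the R.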
Each alternative in the target and exception patterns is restricted to a single amino acid.
Amino acids: ARNDCQEGHILKMFPSTWYV
valid patterns: A|R|W|H, P|S
invalid patterns: Z|F|A|D, AR|NDC|STW
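A pattern can be checked against these restrictions before it is passed on. The following is a minimal sketch; the function name is_valid_pattern_sketch is hypothetical and is not part of PepPrep.

```r
# Hypothetical helper: TRUE if every "|"-separated alternative is exactly
# one of the 20 amino acid letters listed above.
is_valid_pattern_sketch <- function(pattern) {
  amino_acids <- strsplit("ARNDCQEGHILKMFPSTWYV", "", fixed = TRUE)[[1]]
  parts <- strsplit(pattern, "|", fixed = TRUE)[[1]]
  length(parts) > 0 && all(parts %in% amino_acids)
}

valid <- vapply(c("A|R|W|H", "P|S"), is_valid_pattern_sketch, logical(1))
invalid <- vapply(c("Z|F|A|D", "AR|NDC|STW"), is_valid_pattern_sketch, logical(1))
```

Here "Z|F|A|D" fails because Z is not in the amino acid alphabet, and "AR|NDC|STW" fails because its alternatives are longer than one letter.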
The analysis is based on Ensembl protein data:
http://www.ensembl.org/index.html
The SNV annotation has to follow the ANNOVAR format:
http://www.openbioinformatics.org/annovar/
If you set intermediate to TRUE you will get the following output:
aachanges |
A data.frame like tbl, with new columns that describe the amino acid changes. |
transcripts |
Data.frame, containing: ensemble_transcript_id, nmid and pname. |
mutfasta |
Character vector that contains FASTA headers and peptide sequences. |
mutlog |
Character vector that contains log entries of errors reported during mutation (mutateProtToPep()). |
Otherwise, just a character vector giving the location of the FASTA file, or an error message.
The intermediate output is large in most cases, so assign the result to a variable.
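Once the result is stored in a variable, it can be inspected piece by piece. The sketch below uses a small mock object in place of a real result; it assumes that the intermediate output is a named list with the elements described above and that headers and sequences are separate elements of mutfasta. All values in the mock are placeholders.

```r
# Mock stand-in for the intermediate output; real values come from
# snvToPepFasta(..., intermediate = TRUE).
res <- list(
  mutfasta = c(">ENST00000000001|CAP1| A->V_25", "MAKPLR", "GK"),
  mutlog = character(0)
)

# Count FASTA entries by their header lines and check the log for errors.
n_entries <- sum(startsWith(res$mutfasta, ">"))
has_errors <- length(res$mutlog) > 0
```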
Rafael Dellen
Rafael.Dellen@uni-duesseldorf.de
#load data and set arguments
#data.frame with SNVs
tbl <- system.file("extdata", "ExampleData.RData", package="PepPrep")
#the loaded .RData file is expected to provide testtbl, which is used below
load(tbl)
glst <- data.frame(Genes="CAP1", stringsAsFactors=FALSE)
#peptide sequence
spath <- system.file("extdata", "ExampleHomo_sapiens.GRCh37.70.pep.all.fa", package="PepPrep")
#where to write the result and how to write
tpath <- paste0(getwd(), "/myTest_snvToPep.fasta")
width <- 60
#biomaRt settings
mymart <- "ensembl"
myarchive <- FALSE
#call workflow
## Not run:
test <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width)
test2 <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width, intermediate = TRUE)
## End(Not run)