Given a list of protein sections, each defined by the ENSEMBL ID of a protein and the start and end coordinates of the section along that protein's amino acid sequence, the function returns the amino acid sequences of those sections.
proteinLocsToProteinSeq(inputLoci, CDSaaFile)
inputLoci
A data frame containing the coordinates of the protein sections in the protein sequences. The 1st column must be the ENSEMBL ID of either the protein or the transcript that the protein corresponds to (or the equivalent of an ENSEMBL ID if you have created your own gene annotation GTF file). Use only one of the two formats (protein ID or transcript ID) within a single function call; the two cannot be mixed. The 2nd and 3rd columns give the coordinates of the first and last amino acids of the section in the protein sequence. Other columns are optional and are not used by the function.
CDSaaFile
The data file generated by the package's function
The function returns a data frame containing the original protein locations specified in the input, followed by one added column holding the amino acid sequences of the protein sections.
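To make the input and output layout concrete, here is a minimal base-R sketch of the same idea. It is not the package's implementation: the IDs, the sequences in toySeqs and the column name aaSeq are made up for illustration, and it assumes the coordinates are 1-based and inclusive.

```r
# Toy illustration only: hypothetical IDs and sequences, not the package's code.
# A made-up lookup of protein sequences keyed by an ENSEMBL-style protein ID.
toySeqs <- c(ENSP_TOY_1 = "MKTAYIAKQRQISFVKSHFSRQ",
             ENSP_TOY_2 = "MADEEKLPPGWEKRMSRSSGR")

# Input in the documented layout: ID, first residue, last residue.
# Assumption: coordinates are 1-based and inclusive.
inputLoci <- data.frame(id    = c("ENSP_TOY_1", "ENSP_TOY_2"),
                        start = c(3, 1),
                        end   = c(8, 5),
                        stringsAsFactors = FALSE)

# Keep the original columns and append one column with each section's sequence.
result <- inputLoci
result$aaSeq <- unname(substr(toySeqs[inputLoci$id],
                              inputLoci$start, inputLoci$end))
print(result)
```

The result keeps every input column unchanged and adds the section sequence as the last column, which mirrors the shape of the value described above.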
Yaoyong Li
# Locate the example data shipped with the package
dataFolder = system.file("extdata", package="geno2proteo")
# Tab-separated file: ID, start and end of each protein section
inputFile_loci = file.path(dataFolder,
    "transId_pfamDomainStartEnd_chr16_Zdomains_22examples.txt")
# Pre-generated amino acid sequence file for the annotated proteins
CDSaaFile = file.path(dataFolder,
    "Homo_sapiens.GRCh37.74_chromosome16_35Mlong.gtf.gz_AAseq.txt.gz")
inputLoci = read.table(inputFile_loci, sep="\t", stringsAsFactors=FALSE)
# Retrieve the amino acid sequence of each section
ProtSeqNow = proteinLocsToProteinSeq(inputLoci=inputLoci,
    CDSaaFile=CDSaaFile)