

proteinLocsToProteinSeq    R Documentation

Obtaining the amino acid sequences of a list of protein sections


Given a list of protein sections, each defined by the ENSEMBL ID of a protein and the start and end coordinates of the section along that protein's amino acid sequence, the function returns the amino acid sequences of those sections.


proteinLocsToProteinSeq(inputLoci, CDSaaFile)



inputLoci: A data frame containing the coordinates of the protein sections in the protein sequences. The 1st column must hold the ENSEMBL ID of either the protein or the transcript that the protein corresponds to (or the equivalent of an ENSEMBL ID if you have created your own gene annotation GTF file). Use only one of the two ID formats (protein ID or transcript ID) within a single function call; the two cannot be mixed. The 2nd and 3rd columns give the coordinates of the first and last amino acids of the section in the protein sequence. Other columns are optional and are not used by the function.
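To illustrate the layout described above, here is a minimal sketch of an input data frame. The IDs, coordinates and column names are invented for illustration and are not taken from the package's example data. The two checks at the end are assumptions made for this sketch: that the start coordinate does not exceed the end coordinate, and that protein IDs carry the human-style `ENSP` prefix.

```r
# Hypothetical input: every ID and coordinate below is invented for illustration.
inputLoci <- data.frame(
    id    = c("ENSP00000000001", "ENSP00000000002"),  # protein IDs only, no transcript IDs mixed in
    start = c(5L, 10L),                               # first amino acid of each section
    end   = c(20L, 42L),                              # last amino acid of each section
    note  = c("domain A", "domain B"),                # optional extra column, not used by the function
    stringsAsFactors = FALSE
)

# Assumption for this sketch: each section starts at or before its end.
stopifnot(all(inputLoci[[2]] <= inputLoci[[3]]))

# Assumption for this sketch: protein IDs start with "ENSP" (human-style prefix).
# The 1st column should then be either all protein IDs or none of them.
idIsProtein <- grepl("^ENSP", inputLoci[[1]])
stopifnot(all(idIsProtein) || !any(idIsProtein))
```

Only the column order matters in the description above, so the column names used here are arbitrary.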


CDSaaFile: The data file generated by the package's function generatingCDSaaFile, containing the genomic locations, DNA sequences and protein sequences of all coding regions in the specific genome used in your analysis.


The function returns a data frame containing the original protein locations specified in the input, followed by one added column holding the amino acid sequences of the protein sections.
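The shape of the returned data frame can be pictured with a small base R sketch. This is not the package's implementation: the sequence lookup `proteinSeqs`, the IDs, and the output column name `aaSeq` are all hypothetical, and the sketch assumes 1-based coordinates with both ends included.

```r
# Hypothetical lookup of full protein sequences keyed by ID (invented data).
proteinSeqs <- c(P1 = "MKTAYIAKQRQISFVKSHFSRQ", P2 = "MSDNELLKKAIEWG")

# Hypothetical input in the three-column layout: ID, start, end.
inputLoci <- data.frame(id = c("P1", "P2"), start = c(3L, 1L), end = c(8L, 5L),
                        stringsAsFactors = FALSE)

# Cut each section out of its protein sequence
# (assumes 1-based, inclusive coordinates).
sectionSeq <- substring(proteinSeqs[inputLoci[[1]]], inputLoci[[2]], inputLoci[[3]])

# Keep the original columns and append the section sequences as one extra column.
result <- inputLoci
result$aaSeq <- unname(sectionSeq)
print(result)
```

The original input columns are kept unchanged and in the same order, so the result can be joined back to any other annotation carried in the optional columns.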


Yaoyong Li


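    # Folder holding the example data shipped with the package
    dataFolder = system.file("extdata", package="geno2proteo")

    # inputFile_loci and CDSaaFile must already hold file paths; their
    # definitions are not shown in this example.
    inputLoci = read.table(inputFile_loci, sep="\t", stringsAsFactors=FALSE)

    ProtSeqNow = proteinLocsToProteinSeq(inputLoci=inputLoci, CDSaaFile=CDSaaFile)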

geno2proteo documentation built on June 13, 2022, 5:08 p.m.