The function takes a list of protein sections together with the ENSEMBL IDs of the corresponding proteins, and tries to find the genomic coordinates of these protein sections.
A data frame containing the protein sections. The 1st column must be the ENSEMBL ID of either the protein or the transcript encoding the protein (or the equivalent of the ENSEMBL ID if you have created your own gene annotation GTF file). Use only one of the two ID types (either protein ID or transcript ID) within a single function call; the two cannot be mixed in one input. The 2nd and 3rd columns give the positions of the first and last amino acids of the section along the protein sequence. Any other columns are optional and are not used by the function.
The data file generated by the package's function
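The input layout described above can be sketched in base R as follows. This is only an illustration: the IDs and positions are made-up placeholders, and `checkInputLoci` is a hypothetical helper, not part of geno2proteo. It assumes ENSEMBL-style ID prefixes (such as `ENST` or `ENSP`) and that the first amino acid position does not exceed the last.

```r
# Made-up example input: the IDs and positions are placeholders,
# not real annotations.
inputLoci <- data.frame(
  id      = c("ENST00000000001", "ENST00000000002"),  # column 1: transcript IDs only
  aaStart = c(10L, 55L),                              # column 2: first amino acid
  aaEnd   = c(42L, 120L),                             # column 3: last amino acid
  note    = c("domain A", "domain B"),                # optional extra column
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of geno2proteo): checks the column layout
# and that ENSEMBL-style protein and transcript IDs are not mixed.
checkInputLoci <- function(x) {
  stopifnot(is.data.frame(x), ncol(x) >= 3)
  ids <- as.character(x[[1]])
  isProtein    <- grepl("^ENS[A-Z]*P[0-9]+", ids)
  isTranscript <- grepl("^ENS[A-Z]*T[0-9]+", ids)
  if (any(isProtein) && any(isTranscript)) {
    stop("column 1 mixes protein IDs and transcript IDs")
  }
  aaStart <- x[[2]]
  aaEnd   <- x[[3]]
  if (!is.numeric(aaStart) || !is.numeric(aaEnd)) {
    stop("columns 2 and 3 must be numeric amino acid positions")
  }
  if (any(aaStart < 1) || any(aaEnd < aaStart)) {
    stop("positions must satisfy 1 <= first <= last")
  }
  TRUE
}

checkInputLoci(inputLoci)
```

IDs from a custom GTF file may not follow the ENSEMBL prefix convention, in which case the mixing check above would simply not trigger.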
The function returns a data frame containing the original protein locations specified in the input, preceded by six added columns giving the corresponding genomic coordinates of the protein sections:
The 1st, 2nd, 3rd and 4th columns give the chromosome name, the start and end coordinates, and the strand, which together specify the genomic locus corresponding to the protein section.
The 5th and 6th columns give the first and last coding exons in the given transcript which correspond to the given protein section.
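As a rough sketch of how the returned layout might be handled, the snippet below splits a data frame of that shape by column position. The data frame here is a made-up stand-in, not real output: the column names, the values, and the representation of exons as integers are all assumptions for illustration. The comparison at the end further assumes that the start and end coordinates cover whole codons, so that a genomic span longer than three times the section length suggests the section crosses at least one intron.

```r
# Made-up stand-in for a returned data frame; names and values are
# illustrative assumptions, not output of the package.
result <- data.frame(
  chr       = c("16", "16"),
  start     = c(1000L, 5000L),
  end       = c(1098L, 5997L),
  strand    = c("+", "-"),
  firstExon = c(2L, 3L),
  lastExon  = c(2L, 5L),
  id        = c("ENST00000000001", "ENST00000000002"),
  aaStart   = c(10L, 55L),
  aaEnd     = c(42L, 120L),
  stringsAsFactors = FALSE
)

# The six genomic columns come first; the original input columns follow.
genomicPart  <- result[, 1:6]
originalPart <- result[, -(1:6), drop = FALSE]

# Width of the genomic locus (inclusive coordinates).
spanWidth <- genomicPart[[3]] - genomicPart[[2]] + 1L

# Number of coding nucleotides implied by the amino acid range.
codingLength <- 3L * (originalPart[[3]] - originalPart[[2]] + 1L)

# TRUE where the locus is longer than its coding sequence.
spansIntron <- spanWidth > codingLength
```

Indexing by position rather than by name avoids depending on the actual column names, which are not specified here.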
dataFolder = system.file("extdata", package="geno2proteo")
inputFile_loci=file.path(dataFolder,
    "transId_pfamDomainStartEnd_chr16_Zdomains_22examples.txt")
CDSaaFile=file.path(dataFolder,
    "Homo_sapiens.GRCh37.74_chromosome16_35Mlong.gtf.gz_AAseq.txt.gz")
inputLoci = read.table(inputFile_loci, sep="\t", stringsAsFactors=FALSE)
genomicLoci = proteinLocsToGenomic(inputLoci=inputLoci, CDSaaFile=CDSaaFile)