proteinLocsToGenomic: Obtaining the genomic coordinates for a list of protein...

View source: R/proteinLocsToGenomic.R

proteinLocsToGenomicR Documentation

Obtaining the genomic coordinates for a list of protein sections

Description

The function takes a list of protein sections and the corresponding ENSEMBL ID of these proteins, and tries to find the genomic coordinates of these protein sections.

Usage

proteinLocsToGenomic(inputLoci, CDSaaFile)

Arguments

inputLoci

A data frame containing the protein sections as the input. The 1st column must be the ENSEMBL ID of either the protein or the transcript encoding the protein (or the equivalent of ENSEMBL ID if you have created your own gene annotation GTF file). But you have to use only one of two formats (namely either protein ID or transcript ID), and cannot use both of them in the input of one function call. The 2nd and 3rd columns give the coordinate of the first and last amino acids of the section along the protein sequence. Other columns are optional and will not be used by the function.

CDSaaFile

The data file generated by the package's function generatingCDSaaFile, containing the genomic locations, DNA sequences and protein sequences of all coding regions in a specific genome which is used in your analysis.

Value

The function returns a data frame containing the original protein locations specified in the input and before them, the six added columns for the corresponding genomic coordinates of the protein sections:

  • The 1st, 2nd, 3rd and 4th columns give the chromosome name, the coordinates of the start and end positions, and the strand in the chromosome, which specify the genomic locus corresponding to the protein section.

  • The 5th and 6th columns give the first and last coding exons in the given transcript which correspond to the given protein section.

Author(s)

Yaoyong Li

Examples


    dataFolder = system.file("extdata", package="geno2proteo")
    inputFile_loci=file.path(dataFolder, 
        "transId_pfamDomainStartEnd_chr16_Zdomains_22examples.txt")
    CDSaaFile=file.path(dataFolder, 
        "Homo_sapiens.GRCh37.74_chromosome16_35Mlong.gtf.gz_AAseq.txt.gz")

    inputLoci = read.table(inputFile_loci, sep="\t", stringsAsFactors=FALSE)

    genomicLoci = proteinLocsToGenomic(inputLoci=inputLoci, CDSaaFile=CDSaaFile)

geno2proteo documentation built on June 13, 2022, 5:08 p.m.