genomicLocsToProteinSequence: Obtaining the protein sequences and DNA sequences of the...


View source: R/genomicLocsToProteinSequence.R

Description

genomicLocsToProteinSequence takes a list of genomic loci given in the input and tries to find the protein sequences and DNA sequences of the coding regions of the genome that lie within those loci.

Usage

genomicLocsToProteinSequence(inputLoci, CDSaaFile)

Arguments

inputLoci

A data frame containing the genomic loci used as input. Each row describes one genomic locus. The first column is the chromosome, the 2nd and 3rd columns are the start and end coordinates of the locus on the chromosome, and the 4th column is the strand ("+" for the forward strand, "-" for the reverse strand). Any further columns are optional and are not used by the function. Chromosome names can be given either in the ENSEMBL style (1, 2, 3, ..., X, Y and MT) or in the other popular style (chr1, chr2, chr3, ..., chrX, chrY and chrM), but the two styles cannot be mixed in the input of one function call.
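As an illustration of the layout described above, the following sketch builds a small inputLoci data frame by hand and checks it before use. The coordinates are made up for illustration, the column names are arbitrary (the description above only fixes the column order), and the helper toEnsemblStyle is hypothetical and not part of geno2proteo. The check that start does not exceed end is an assumption, not a documented requirement.

```r
# Hypothetical loci: the coordinates below are invented for illustration only.
inputLoci <- data.frame(
    chr    = c("chr16", "chr16", "chrM"),
    start  = c(100000L, 250000L, 4000L),
    end    = c(100300L, 250450L, 4120L),
    strand = c("+", "-", "+"),
    stringsAsFactors = FALSE
)

# The 4th column must hold "+" or "-".
stopifnot(all(inputLoci$strand %in% c("+", "-")))

# Assumption (not stated in the documentation): start <= end for every locus.
stopifnot(all(inputLoci$start <= inputLoci$end))

# The two chromosome naming styles must not be mixed within one call.
hasChrPrefix <- grepl("^chr", inputLoci$chr)
stopifnot(all(hasChrPrefix) || !any(hasChrPrefix))

# Hypothetical helper (not part of geno2proteo): convert chr-prefixed names
# to the ENSEMBL style, mapping chrM to MT.
toEnsemblStyle <- function(chr) {
    chr <- sub("^chr", "", chr)
    chr[chr == "M"] <- "MT"
    chr
}

ensemblLoci <- inputLoci
ensemblLoci$chr <- toEnsemblStyle(inputLoci$chr)
```

Running such checks before the call makes it easier to tell a malformed input apart from a locus that simply does not overlap any coding region.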

CDSaaFile

The data file generated by the package's function generatingCDSaaFile. It contains the genomic locations, DNA sequences and protein sequences of all coding regions in the specific genome used in your analysis.

Value

A data frame containing the original genomic loci specified in the input, followed by five added columns that give the protein sequence and the DNA sequence of the coding regions within each of the loci.

Author(s)

Yaoyong Li

Examples

    # Locate the example data shipped with the package
    dataFolder = system.file("extdata", package="geno2proteo")
    inputFile_loci=file.path(dataFolder, 
        "transId_pfamDomainStartEnd_chr16_Zdomains_22examples_genomicPos.txt")
    CDSaaFile=file.path(dataFolder, 
        "Homo_sapiens.GRCh37.74_chromosome16_35Mlong.gtf.gz_AAseq.txt.gz")

    # Read the tab-separated loci: chromosome, start, end, strand, ...
    inputLoci = read.table(inputFile_loci, sep="\t", stringsAsFactors=FALSE)

    # Find the protein and DNA sequences of the coding regions within each locus
    proteinSeq = genomicLocsToProteinSequence(inputLoci=inputLoci, 
                                            CDSaaFile=CDSaaFile)
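
A possible follow-up is to look at the shape of the result. The sketch below repeats the call inside a guard so that it only runs when geno2proteo is installed. It compares the column counts of input and output, which according to the Value section should differ by the five added columns, and prints the first rows. The variable havePkg is introduced here for illustration.

```r
# Only run the example when the geno2proteo package is available.
havePkg <- requireNamespace("geno2proteo", quietly = TRUE)

if (havePkg) {
    dataFolder <- system.file("extdata", package = "geno2proteo")
    inputLoci <- read.table(
        file.path(dataFolder,
            "transId_pfamDomainStartEnd_chr16_Zdomains_22examples_genomicPos.txt"),
        sep = "\t", stringsAsFactors = FALSE)
    CDSaaFile <- file.path(dataFolder,
        "Homo_sapiens.GRCh37.74_chromosome16_35Mlong.gtf.gz_AAseq.txt.gz")

    proteinSeq <- geno2proteo::genomicLocsToProteinSequence(
        inputLoci = inputLoci, CDSaaFile = CDSaaFile)

    # Per the Value section, the added columns follow the original ones.
    cat("input columns: ", ncol(inputLoci), "\n")
    cat("output columns:", ncol(proteinSeq), "\n")

    # Show the first few rows of the result.
    print(head(proteinSeq, 3))
}
```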

geno2proteo documentation built on Jan. 24, 2018, 7:50 p.m.