View source: R/genomicLocsToProteinSequence.R
genomicLocsToProteinSequence | R Documentation |
genomicLocsToProteinSequence
takes a list of genomic loci given in the
input and tries to find the protein sequences and DNA sequences of the coding
regions of the genome that lie within those genomic loci.
genomicLocsToProteinSequence(inputLoci, CDSaaFile)
inputLoci |
A data frame containing the genomic loci as the input, one locus per row. The first column gives the chromosome, the 2nd and 3rd columns give the start and end coordinates of the locus on the chromosome, and the 4th column gives the strand ("+" for the forward strand, "-" for the reverse strand). Other columns are optional and are not used by the function. Chromosome names can be either in the ENSEMBL style (1, 2, 3, ..., X, Y and MT) or in the other popular style (chr1, chr2, chr3, ..., chrX, chrY and chrM), but the two styles cannot be mixed in the input of one function call. |
CDSaaFile |
The data file generated by the package's function |
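As an illustration of the inputLoci layout described above, the following base R sketch builds a small data frame by hand and checks the two constraints stated in the argument description (a single chromosome naming style, and "+" or "-" in the strand column). The coordinates are invented for illustration, and the helper sameChrStyle is hypothetical; it is not part of geno2proteo.

```r
# Hypothetical loci for illustration only; the coordinates are made up.
# Column order matters: chromosome, start, end, strand.
inputLoci <- data.frame(
  chr    = c("chr16", "chr16", "chr16"),
  start  = c(3070313L, 3071779L, 3072300L),
  end    = c(3070390L, 3071950L, 3072410L),
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of geno2proteo): TRUE when every chromosome
# name uses the same style, i.e. all start with "chr" or none do.
sameChrStyle <- function(chrNames) {
  hasPrefix <- grepl("^chr", chrNames)
  all(hasPrefix) || !any(hasPrefix)
}

# Basic sanity checks before passing the data frame to the function.
stopifnot(sameChrStyle(inputLoci[[1]]))
stopifnot(all(inputLoci[[4]] %in% c("+", "-")))
```

Accessing the columns by position rather than by name mirrors the description above, which defines the columns by their order.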
A data frame containing the original genomic loci specified in the input and the protein sequence and the DNA sequence of the coding regions within each of the loci. In detail, the returned data frame contains the original genomic loci specified in the input and after them, the five added columns:
Column "transId" lists the ENSEMBL IDs of the transcripts whose coding regions overlap with the specified locus and whose overlapping coding regions are exactly the same.
Column "dnaSeq" contains the DNA sequence in the overlapping coding regions.
Column "dnaBefore" contains the DNA letters which are in the same codon as the first letter in the DNA sequence in the column "dnaSeq".
Column "dnaAfter" contains the DNA letters which are in the same codon as the last letter in the DNA sequence in the column "dnaSeq".
Column "pepSeq" contains the protein sequence translated from the DNA sequences in the three preceding columns, "dnaBefore", "dnaSeq" and "dnaAfter".
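The relationship between the three DNA columns and "pepSeq" can be sketched in base R as follows. This is only an illustration under stated assumptions: the row values are invented, the three pieces are assumed to be joined in the order dnaBefore, dnaSeq, dnaAfter, and codonTable is a four-entry excerpt of the standard genetic code rather than anything taken from the package.

```r
# Hypothetical result row; the values are invented for illustration.
res <- data.frame(
  dnaBefore = "AT",        # completes the first codon of dnaSeq
  dnaSeq    = "GGCCTGGA",  # DNA inside the overlapping coding region
  dnaAfter  = "AA",        # completes the last codon of dnaSeq
  stringsAsFactors = FALSE
)

# Assumption: the pieces are concatenated in this order before translation.
fullDna <- paste0(res$dnaBefore, res$dnaSeq, res$dnaAfter)

# With the flanking letters added, the sequence covers whole codons.
stopifnot(nchar(fullDna) %% 3 == 0)

# Split the sequence into consecutive codons of three letters.
starts <- seq(1, nchar(fullDna), by = 3)
codons <- substring(fullDna, starts, starts + 2)

# Small excerpt of the standard genetic code, enough for this example.
codonTable <- c(ATG = "M", GCC = "A", TGG = "W", AAA = "K")

# Translate codon by codon and join the amino acids into one string.
pep <- paste(codonTable[codons], collapse = "")
```

The point of the flanking columns is visible in the length check: "dnaSeq" alone has 8 letters, which is not a whole number of codons, while the concatenated sequence has 12.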
Yaoyong Li
dataFolder = system.file("extdata", package="geno2proteo")
inputFile_loci=file.path(dataFolder,
    "transId_pfamDomainStartEnd_chr16_Zdomains_22examples_genomicPos.txt")
CDSaaFile=file.path(dataFolder,
    "Homo_sapiens.GRCh37.74_chromosome16_35Mlong.gtf.gz_AAseq.txt.gz")
inputLoci = read.table(inputFile_loci, sep="\t", stringsAsFactors=FALSE)
proteinSeq = genomicLocsToProteinSequence(inputLoci=inputLoci, CDSaaFile=CDSaaFile)