dna2codonDT: Translate a list of coding sequences into a codon data.table


dna2codonDT    R Documentation

Translate a list of coding sequences into a codon data.table

Description

Convert a DNA coding sequence into a data.table of codons, nucleotide indexes, DNA triplet sequences, and amino acids. If provided with coding sequences split into exons, these will also be incorporated into the table.

Usage

dna2codonDT(dnaSeq, type, compressTab = FALSE, geneticCode = 1)

Arguments

dnaSeq

Character: For a coding sequence (e.g., cDNA transcript), a single character vector. For a group of exons, a list of character vectors. See argument type and Details.

type

Character: one of 'cds' (coding sequence) or 'exons' (exon regions). See Details.

compressTab

Logical: Should the table be compressed? If TRUE, each row is a unique codon. If FALSE (the default given in Usage), each codon is represented by 3 rows, one for each nucleotide comprising the codon.

geneticCode

Integer: A value relating to the numcode argument in seqinr::translate.

Details

The argument type dictates what to pass to the argument dnaSeq. If you want to translate a coding sequence (cDNA transcript), then type=='cds' and dnaSeq must receive a single character string, the DNA sequence.

If you want to translate a series of exons, then type=='exon' and dnaSeq must receive a list, where each indexed item in the list is a character vector, the DNA exon sequence. Note that the exon sequences are assumed to be ordered correctly, from first to last.

For both cases, the function assumes that the sequence is in the correct reading frame.
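As a minimal sketch of the two input shapes described above, the base R snippet below writes the same sequence once as a single string and once as an ordered list of exons, then checks that the total length is a whole number of codons. The exon split points and the helper inFrame are hypothetical, chosen only for illustration; they are not part of genomalicious. A length check is necessary for a correct reading frame but does not prove it.

```r
# Coding sequence as a single character string (the input shape for a CDS).
cdsSeq <- 'ATGCGTACTTCA'

# The same sequence split into exons, ordered first to last.
# The split points are hypothetical, for illustration only.
exonList <- list('ATGCG', 'TACTTCA')

# Hypothetical helper: TRUE when the total length is a multiple of 3.
inFrame <- function(x) {
  sum(nchar(unlist(x))) %% 3 == 0
}

# Joining the exons in order should recover the coding sequence.
joined <- paste(unlist(exonList), collapse = '')

inFrame(cdsSeq)
inFrame(exonList)
identical(joined, cdsSeq)
```

Note that an individual exon need not have a length divisible by 3; only the joined sequence does, which is why the order of the list matters.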

Value

Returns a data.table with the following columns when compressTab==TRUE:

  1. $CODON = The codon number, 1:N.

  2. $NUC.GENE = The nucleotide positions comprising the codon, from 1 to the gene's length.

  3. $DNA = The DNA bases.

  4. $AMINO = The amino acid residue.

  5. $EXON = The exon number, but only when type=='exon'.

Otherwise, if compressTab==FALSE:

  1. $CODON = The codon number, 1:N.

  2. $NUC.GENE = The nucleotide position within the gene, from 1 to the gene's length.

  3. $NUC.CODON = The nucleotide position within the codon, from 1 to 3.

  4. $DNA = The DNA bases.

  5. $AMINO = The amino acid residue.

  6. $EXON = The exon number, but only when type=='exon'.
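The two layouts can be illustrated with a small base R sketch. It does not call dna2codonDT and is not the package's implementation: it uses a plain data.frame rather than a data.table, and aminoLookup is a hypothetical four-codon excerpt of the standard genetic code covering only the codons in this sequence. How $NUC.GENE is encoded in the compressed table is not specified here, so that column is left out of the compact layout.

```r
dnaSeq <- 'ATGCGTACTTCA'
bases <- strsplit(dnaSeq, '')[[1]]
nBases <- length(bases)

# Hypothetical excerpt of the standard genetic code (only the codons used here).
aminoLookup <- c(ATG = 'M', CGT = 'R', ACT = 'T', TCA = 'S')

# Long layout: one row per nucleotide (compare compressTab==FALSE).
longTab <- data.frame(
  CODON = rep(seq_len(nBases / 3), each = 3),
  NUC.GENE = seq_len(nBases),
  NUC.CODON = rep(1:3, times = nBases / 3),
  DNA = bases,
  stringsAsFactors = FALSE
)

# Paste the three bases of each codon into a triplet, then look up the residue.
triplets <- vapply(split(longTab$DNA, longTab$CODON), paste,
                   character(1), collapse = '')
longTab$AMINO <- unname(aminoLookup[triplets[longTab$CODON]])

# Compact layout: one row per codon (compare compressTab==TRUE).
shortTab <- data.frame(
  CODON = seq_along(triplets),
  DNA = unname(triplets),
  AMINO = unname(aminoLookup[triplets]),
  stringsAsFactors = FALSE
)

longTab
shortTab
```

The long layout has three rows per codon with the amino acid repeated on each, while the compact layout collapses those rows into one.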

Examples

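library(genomalicious)

# A short coding sequence, already in the correct reading frame
X <- 'ATGCGTACTTCA'

# type has no default in Usage, so it is supplied explicitly
dna2codonDT(X, type='cds', compressTab=TRUE)

dna2codonDT(X, type='cds', compressTab=FALSE)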


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.