Translating DNA/RNA sequences


Description

Functions for translating DNA or RNA sequences into amino acid sequences.

Usage

## Translating DNA/RNA:
translate(x, genetic.code=GENETIC_CODE, if.fuzzy.codon="error")

## Extracting codons without translating them:
codons(x)

Arguments

x

For translate: a DNAStringSet, RNAStringSet, DNAString, RNAString, MaskedDNAString, or MaskedRNAString object.

For codons: a DNAString, RNAString, MaskedDNAString, or MaskedRNAString object.

genetic.code

The genetic code to use for translating codons into Amino Acid letters. It must be a named character vector of length 64, like the predefined constant GENETIC_CODE: its elements must be 1-letter strings from the Amino Acid alphabet, and its names must be identical to names(GENETIC_CODE). The default value for genetic.code is GENETIC_CODE, which represents The Standard Genetic Code. See ?AA_ALPHABET for the Amino Acid alphabet and ?GENETIC_CODE for The Standard Genetic Code and its known variants.
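To make the constraints on genetic.code concrete, here is a base-R sketch that does not use Biostrings. It builds a stand-in for GENETIC_CODE (the hypothetical object `std_code`) from the NCBI translation table 1 string, listing codons in TCAG order. The checker `is_valid_genetic_code` is also hypothetical, and its amino-acid alphabet is a simplified assumption (the real AA_ALPHABET contains more letters). A variant code is then derived by reassigning the 4 codons in which the Vertebrate Mitochondrial code differs from the standard one.

```r
## Stand-in for GENETIC_CODE (hypothetical name), codons in TCAG order.
bases <- c("T", "C", "A", "G")
std_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  paste0(rep(bases, each = 16), rep(rep(bases, each = 4), times = 4),
         rep(bases, times = 16)))

## Hypothetical checker for the rules stated for 'genetic.code'.
## ASSUMPTION: simplified alphabet = 20 standard amino acids + X + '*'.
is_valid_genetic_code <- function(code, reference = std_code,
                                  aa_alphabet = c(strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]], "X", "*")) {
  is.character(code) &&
    length(code) == 64L &&                         # one entry per codon
    identical(names(code), names(reference)) &&    # same codon names, same order
    all(nchar(code) == 1L) &&                      # 1-letter strings only
    all(code %in% aa_alphabet)                     # letters from the alphabet
}

## Vertebrate Mitochondrial code (NCBI table 2): 4 codons reassigned.
vmito_code <- std_code
vmito_code[c("AGA", "AGG")] <- "*"
vmito_code["ATA"] <- "M"
vmito_code["TGA"] <- "W"

is_valid_genetic_code(vmito_code)     # TRUE
is_valid_genetic_code(std_code[-1])   # FALSE: only 63 codons
```

With Biostrings loaded, getGeneticCode("SGC1") returns that variant directly, as in the examples.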

if.fuzzy.codon

How fuzzy codons (i.e. codons containing IUPAC ambiguity letters) should be handled. Accepted values are:

  • "error": An error will be raised on the first occurrence of a fuzzy codon. This is the default.

  • "solve": Fuzzy codons that can be translated unambiguously to an amino acid or to * (stop codon) will be translated. Ambiguous fuzzy codons will be translated to X.

  • "error.if.X": Fuzzy codons that can be translated unambiguously to an amino acid or to * (stop codon) will be translated. An error will be raised on the first occurrence of an ambiguous fuzzy codon.

  • "X": All fuzzy codons (ambiguous and non-ambiguous) will be translated to X.

Alternatively, if.fuzzy.codon can be specified as a character vector of length 2. The 1st and 2nd strings specify how to handle non-ambiguous and ambiguous fuzzy codons, respectively. The accepted values for the 1st string are:

  • "error": Any occurrence of a non-ambiguous fuzzy codon will cause an error.

  • "solve": Non-ambiguous fuzzy codons will be translated to an amino acid or to *.

  • "X": Non-ambiguous fuzzy codons will be translated to X.

The accepted values for the 2nd string are:

  • "error": Any occurrence of an ambiguous fuzzy codon will cause an error.

  • "X": Ambiguous fuzzy codons will be translated to X.

All 6 possible combinations of 1st and 2nd strings are supported. Note that if.fuzzy.codon=c("error", "error") is equivalent to if.fuzzy.codon="error", if.fuzzy.codon=c("solve", "X") is equivalent to if.fuzzy.codon="solve", if.fuzzy.codon=c("solve", "error") is equivalent to if.fuzzy.codon="error.if.X", and if.fuzzy.codon=c("X", "X") is equivalent to if.fuzzy.codon="X".
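The fuzzy-codon rules can be sketched in base R: expand each codon into every concrete codon its IUPAC letters allow, translate them all, and treat the codon as non-ambiguous when they all give the same letter. All names below (`std_code`, `iupac`, `normalize_policy`, `expand_codon`, `translate_codon`, `translate_fuzzy`) are hypothetical. This is not how Biostrings implements translate(); it only mirrors the behaviour described above, including the mapping from single-string values to 2-string pairs.

```r
## Stand-in for GENETIC_CODE (hypothetical name), codons in TCAG order.
bases <- c("T", "C", "A", "G")
std_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  paste0(rep(bases, each = 16), rep(rep(bases, each = 4), times = 4),
         rep(bases, times = 16)))

## IUPAC nucleotide letters and the bases each one stands for.
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"),
              W = c("A", "T"), K = c("G", "T"), M = c("A", "C"),
              B = c("C", "G", "T"), D = c("A", "G", "T"),
              H = c("A", "C", "T"), V = c("A", "C", "G"),
              N = c("A", "C", "G", "T"))

## Single-string values rewritten as (non-ambiguous, ambiguous) pairs.
normalize_policy <- function(p) {
  if (length(p) == 2L) return(p)
  switch(p,
         error      = c("error", "error"),
         solve      = c("solve", "X"),
         error.if.X = c("solve", "error"),
         X          = c("X", "X"),
         stop("unknown policy: ", p))
}

## Every concrete codon a (possibly fuzzy) 3-letter codon can stand for.
expand_codon <- function(codon) {
  choices <- unname(iupac[strsplit(codon, "")[[1]]])
  Reduce(function(acc, b) as.vector(outer(acc, b, paste0)), choices)
}

translate_codon <- function(codon, code, policy) {
  concrete <- expand_codon(codon)
  if (length(concrete) == 1L) return(code[[concrete]])  # not fuzzy
  aa <- unique(unname(code[concrete]))
  if (length(aa) == 1L) {
    # Fuzzy but non-ambiguous: all expansions give the same letter.
    switch(policy[1], error = stop("fuzzy codon: ", codon),
           solve = aa, X = "X", stop("bad policy"))
  } else {
    # Ambiguous: expansions disagree.
    switch(policy[2], error = stop("ambiguous fuzzy codon: ", codon),
           X = "X", stop("bad policy"))
  }
}

translate_fuzzy <- function(x, code, if_fuzzy = "error") {
  policy <- normalize_policy(if_fuzzy)
  starts <- seq(1L, by = 3L, length.out = nchar(x) %/% 3L)
  codons <- substring(x, starts, starts + 2L)
  paste(vapply(codons, translate_codon, character(1),
               code = code, policy = policy), collapse = "")
}

dna2 <- "TATANATGRAGYMGRTRA"
translate_fuzzy(dna2, std_code, "solve")   # "YXXSR*"
translate_fuzzy(dna2, std_code, "X")       # "YXXXXX"
```

On this sequence, "solve" leaves ANA and TGR as X because their expansions disagree, while AGY, MGR and TRA resolve to S, R and *.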

Details

translate mimics the biological process of RNA translation that occurs in the cell. Its input can be either RNA or coding DNA. By default, The Standard Genetic Code (see ?GENETIC_CODE) is used to translate codons into amino acids, but the user can supply a different genetic code via the genetic.code argument.
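A stripped-down base-R version of the core idea: read the sequence three letters at a time from the first base and look each codon up in a 64-entry named vector. The sketch assumes a sequence without IUPAC ambiguities, treats U as T so that RNA and DNA give the same result, and silently drops 1 or 2 trailing bases. `std_code` and `translate_plain` are hypothetical names; supplying another 64-element named vector plays the role of the genetic.code argument.

```r
## Stand-in for GENETIC_CODE (hypothetical name), codons in TCAG order.
bases <- c("T", "C", "A", "G")
std_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  paste0(rep(bases, each = 16), rep(rep(bases, each = 4), times = 4),
         rep(bases, times = 16)))

translate_plain <- function(x, code = std_code) {
  x <- chartr("U", "T", x)                    # RNA input: read U as T
  starts <- seq(1L, by = 3L, length.out = nchar(x) %/% 3L)
  aa <- code[substring(x, starts, starts + 2L)]
  if (anyNA(aa)) stop("codon not found in the genetic code (fuzzy or invalid)")
  paste(aa, collapse = "")
}

translate_plain("TATAAATGGAGTAGATAA")   # "YKWSR*"
translate_plain("UAUAAAUGGAGUAGAUAA")   # same protein from the RNA form
```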

codons is a utility for extracting the codons involved in this translation without translating them.

Value

For translate: An AAString object when x is a DNAString, RNAString, MaskedDNAString, or MaskedRNAString object. An AAStringSet object parallel to x (i.e. with 1 amino acid sequence per DNA or RNA sequence in x) when x is a DNAStringSet or RNAStringSet object. If x has names on it, they're propagated to the returned object.
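The "parallel to x" contract can be illustrated with a plain named character vector standing in for a DNAStringSet: one translated string per input string, with the input names copied over. `translate_set` and `std_code` are hypothetical, and the sketch does no fuzzy-codon handling. It passes USE.NAMES = FALSE to vapply() so that unnamed input stays unnamed instead of being named after its own sequences.

```r
## Stand-in for GENETIC_CODE (hypothetical name), codons in TCAG order.
bases <- c("T", "C", "A", "G")
std_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  paste0(rep(bases, each = 16), rep(rep(bases, each = 4), times = 4),
         rep(bases, times = 16)))

## One output string per input string; input names (if any) are propagated.
translate_set <- function(x, code) {
  out <- vapply(x, function(s) {
    starts <- seq(1L, by = 3L, length.out = nchar(s) %/% 3L)
    paste(code[substring(s, starts, starts + 2L)], collapse = "")
  }, character(1), USE.NAMES = FALSE)
  names(out) <- names(x)
  out
}

seqs <- c(geneA = "ATGGCC", geneB = "ATGTAA")
translate_set(seqs, std_code)   # geneA = "MA", geneB = "M*"
```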

For codons: An XStringViews object with 1 view per codon. When x is a MaskedDNAString or MaskedRNAString object, its masked parts are interpreted as introns and filled with the + letter in the returned object. Therefore codons that span across masked regions are represented by views that have a width > 3 and contain the + letter. Note that each view is guaranteed to contain exactly 3 base letters.
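Codon views that span a masked region can be reproduced from positions alone. Drop the masked positions, group the remaining ones in threes, and take each group's first and last position as the view's start and end. Any codon that straddles the mask then has a width greater than 3 but still covers exactly 3 unmasked bases. The helper `codon_ranges`, the toy sequence, and the mask at 5..7 are hypothetical, and only a single masked region is handled.

```r
## Hypothetical helper: 1-based start/end/width of each codon when one
## masked region (treated as an intron) is skipped.
codon_ranges <- function(n, mask_start, mask_end) {
  kept <- setdiff(seq_len(n), seq(mask_start, mask_end))  # unmasked positions
  kept <- kept[seq_len(3L * (length(kept) %/% 3L))]       # whole codons only
  m <- matrix(kept, nrow = 3L)                            # one column per codon
  data.frame(start = m[1, ], end = m[3, ], width = m[3, ] - m[1, ] + 1L)
}

x <- "ATGAGTAGCTAA"
mask_start <- 5L
mask_end <- 7L
r <- codon_ranges(nchar(x), mask_start, mask_end)

## Fill the masked part with '+', then cut one view per codon.
chars <- strsplit(x, "")[[1]]
chars[mask_start:mask_end] <- "+"
views <- substring(paste(chars, collapse = ""), r$start, r$end)
views                                       # "ATG" "A+++GC" "TAA"
nchar(gsub("+", "", views, fixed = TRUE))   # 3 3 3
```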

See Also

  • AA_ALPHABET for the Amino Acid alphabet.

  • GENETIC_CODE for The Standard Genetic Code and its known variants.

  • The examples for extractTranscriptSeqs in the GenomicFeatures package for computing the full proteome of a given organism.

  • The reverseComplement function.

  • The DNAStringSet and AAStringSet classes.

  • The XStringViews and MaskedXString classes.

Examples

## ---------------------------------------------------------------------
## 1. BASIC EXAMPLES
## ---------------------------------------------------------------------
dna1 <- DNAString("TATAAATGGAGTAGATAA")
translate(dna1)

SGC1 <- getGeneticCode("SGC1")  # Vertebrate Mitochondrial code
translate(dna1, genetic.code=SGC1)

## All codons except 1st are fuzzy:
dna2 <- DNAString("TATANATGRAGYMGRTRA")

## Not run: 
  translate(dna2)  # error because of fuzzy codons

## End(Not run)
## Codons 4 to 6 are non-ambiguous and can be solved. 2nd and 3rd codons
## are ambiguous and are translated to X:
translate(dna2, if.fuzzy.codon="solve")

## Fuzzy codons that are non-ambiguous with a given genetic code can
## become ambiguous with another genetic code and vice versa:
translate(dna2, genetic.code=SGC1, if.fuzzy.codon="solve")

## ---------------------------------------------------------------------
## 2. TRANSLATING AN OPEN READING FRAME
## ---------------------------------------------------------------------
file <- system.file("extdata", "someORF.fa", package="Biostrings")
x <- readDNAStringSet(file)
x

## The first and last 1000 nucleotides are not part of the ORFs:
x <- DNAStringSet(x, start=1001, end=-1001)

## Before calling translate() on an ORF, we need to mask its introns,
## if there are any. We can get this information from the SGD database
## (http://www.yeastgenome.org/).
## According to SGD, the 1st ORF (YAL001C) has an intron at 71..160
## (see http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YAL001C)
y1 <- x[[1]]
mask1 <- Mask(length(y1), start=71, end=160)
masks(y1) <- mask1
y1
translate(y1)

## Codons:
codons(y1)
which(width(codons(y1)) != 3)
codons(y1)[20:28]

## ---------------------------------------------------------------------
## 3. AN ADVANCED EXAMPLE
## ---------------------------------------------------------------------
## Translation on the '-' strand:
dna3 <- DNAStringSet(c("ATC", "GCTG", "CGACT"))
translate(reverseComplement(dna3))

## Translate sequences on both the '+' and '-' strands across all
## possible reading frames (i.e., codon position 1, 2 or 3):
## First create a DNAStringSet of '+' and '-' strand sequences,
## removing the nucleotides prior to the reading frame start position.
dna3_subseqs <- lapply(1:3, function(pos) 
    subseq(c(dna3, reverseComplement(dna3)), start=pos))
## Translation of 'dna3_subseqs' produces a list of length 3, each with
## 6 elements (3 '+' strand results followed by 3 '-' strand results).
lapply(dna3_subseqs, translate)

## Note that translate() issues a warning when the length of a sequence
## is not divisible by 3. To avoid this warning, wrap the call in
## suppressWarnings().
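The strand-and-frame loop in the advanced example can be mimicked without Biostrings for a single sequence: reverse-complement it, then translate from positions 1, 2 and 3 on each strand. `revcomp`, `translate_frame`, `six_frames` and `std_code` are hypothetical names. The sketch assumes an unambiguous A/C/G/T sequence and, like translate(), warns when a frame's length is not a multiple of 3.

```r
## Stand-in for GENETIC_CODE (hypothetical name), codons in TCAG order.
bases <- c("T", "C", "A", "G")
std_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  paste0(rep(bases, each = 16), rep(rep(bases, each = 4), times = 4),
         rep(bases, times = 16)))

## Reverse complement of an unambiguous A/C/G/T string.
revcomp <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

## Translate one frame; warn when 1 or 2 trailing bases are dropped.
translate_frame <- function(s, code) {
  if (nchar(s) %% 3L != 0L)
    warning("sequence length is not a multiple of 3; trailing bases ignored")
  starts <- seq(1L, by = 3L, length.out = nchar(s) %/% 3L)
  paste(code[substring(s, starts, starts + 2L)], collapse = "")
}

## 3 reading frames x 2 strands, returned as a character matrix.
six_frames <- function(s, code) {
  strands <- c(plus = s, minus = revcomp(s))
  m <- sapply(strands, function(t)
    vapply(1:3, function(pos) translate_frame(substring(t, pos), code),
           character(1)))
  rownames(m) <- paste0("frame", 1:3)
  m
}

suppressWarnings(six_frames("CGACT", std_code))
```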
