Biostrings: Efficient manipulation of biological strings

translate

R Documentation

Translating DNA/RNA sequences

Description

Functions for translating DNA or RNA sequences into amino acid sequences.

Usage

## Translating DNA/RNA:
translate(x, genetic.code=GENETIC_CODE, no.init.codon=FALSE,
             if.fuzzy.codon="error")

## Extracting codons without translating them:
codons(x)

Arguments

`x`	A DNAStringSet, RNAStringSet, DNAString, RNAString, MaskedDNAString or MaskedRNAString object for `translate`. A DNAString, RNAString, MaskedDNAString or MaskedRNAString object for `codons`.
`genetic.code`	The genetic code to use for the translation of codons into Amino Acid letters. It must be represented as a named character vector of length 64 similar to predefined constant `GENETIC_CODE`. More precisely: it must contain 1-letter strings in the Amino Acid alphabet; its names must be identical to `names(GENETIC_CODE)`; it must have an `alt_init_codons` attribute on it, that lists the alternative initiation codons. The default value for `genetic.code` is `GENETIC_CODE`, which represents The Standard Genetic Code. See `?AA_ALPHABET` for the Amino Acid alphabet, and `?GENETIC_CODE` for The Standard Genetic Code and its known variants.
`no.init.codon`	By default, `translate()` assumes that the first codon in a DNA or RNA sequence is the initiation codon. This means that the `alt_init_codons` attribute on the supplied `genetic.code` will be used to translate the alternative initiation codons. This can be changed by setting `no.init.codon` to TRUE, in which case the `alt_init_codons` attribute will be ignored.
`if.fuzzy.codon`	How fuzzy codons (i.e. codons with IUPAC ambiguity letters) should be handled. Accepted values are: `"error"`: An error will be raised on the first occurrence of a fuzzy codon. This is the default. `"solve"`: Fuzzy codons that can be translated unambiguously to an amino acid or to * (stop codon) will be translated. Ambiguous fuzzy codons will be translated to X. `"error.if.X"`: Fuzzy codons that can be translated unambiguously to an amino acid or to * (stop codon) will be translated. An error will be raised on the first occurrence of an ambiguous fuzzy codon. `"X"`: All fuzzy codons (ambiguous and non-ambiguous) will be translated to X. Alternatively `if.fuzzy.codon` can be specified as a character vector of length 2 for more fine-grained control. The 1st and 2nd strings specify how to handle non-ambiguous and ambiguous fuzzy codons, respectively. The accepted values for the 1st string are: `"error"`: Any occurrence of a non-ambiguous fuzzy codon will cause an error. `"solve"`: Non-ambiguous fuzzy codons will be translated to an amino acid or to *. `"X"`: Non-ambiguous fuzzy codons will be translated to X. The accepted values for the 2nd string are: `"error"`: Any occurrence of an ambiguous fuzzy codon will cause an error. `"X"`: Ambiguous fuzzy codons will be translated to X. All 6 possible combinations of 1st and 2nd strings are supported. Note that `if.fuzzy.codon=c("error", "error")` is equivalent to `if.fuzzy.codon="error"`, `if.fuzzy.codon=c("solve", "X")` is equivalent to `if.fuzzy.codon="solve"`, `if.fuzzy.codon=c("solve", "error")` is equivalent to `if.fuzzy.codon="error.if.X"`, and `if.fuzzy.codon=c("X", "X")` is equivalent to `if.fuzzy.codon="X"`.
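
The equivalences between the 1-string and 2-string forms of `if.fuzzy.codon` can be checked on a small input. The following is a minimal sketch, assuming the Biostrings package is attached; the sequence `dna_fuzzy` and the other variable names are hypothetical and chosen only for illustration.

```r
library(Biostrings)

## Hypothetical 4-codon sequence: ATG | GCN | TGR | TAA
## GCN is fuzzy but non-ambiguous (GCA, GCC, GCG and GCT all encode A).
## TGR is fuzzy and ambiguous (TGA is a stop codon, TGG encodes W).
dna_fuzzy <- DNAString("ATGGCNTGRTAA")

## "solve" and c("solve", "X") should give the same result:
solved_single <- translate(dna_fuzzy, if.fuzzy.codon="solve")
solved_pair <- translate(dna_fuzzy, if.fuzzy.codon=c("solve", "X"))

## "X" and c("X", "X") should give the same result:
masked_single <- translate(dna_fuzzy, if.fuzzy.codon="X")
masked_pair <- translate(dna_fuzzy, if.fuzzy.codon=c("X", "X"))

## c("solve", "error") is expected to fail because TGR is ambiguous.
## The error is caught here so the script keeps running:
strict <- tryCatch(
    translate(dna_fuzzy, if.fuzzy.codon=c("solve", "error")),
    error=function(e) "error"
)
```

With the Standard Genetic Code, `solved_single` should read `MAX*` and `masked_single` should read `MXX*`.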

Details

translate reproduces the biological process of RNA translation that occurs in the cell. The input of the function can be either RNA or coding DNA. By default The Standard Genetic Code (see ?GENETIC_CODE) is used to translate codons into amino acids but the user can supply a different genetic code via the genetic.code argument.
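
As an illustration of supplying a different genetic code, the sketch below derives a custom code from GENETIC_CODE by reassigning a single codon. The name `my_code` and the input sequence are hypothetical. The sketch relies on element-wise assignment in R preserving the names and the alt_init_codons attribute of the vector, so the result should still satisfy the requirements on genetic.code.

```r
library(Biostrings)

## The Standard Genetic Code is a named character vector of length 64
## that carries an "alt_init_codons" attribute:
n_codons <- length(GENETIC_CODE)
alt_init <- attr(GENETIC_CODE, "alt_init_codons")

## Hypothetical custom code: same as the standard one, except that TGA
## encodes W instead of a stop codon. Subassignment keeps the names and
## the "alt_init_codons" attribute.
my_code <- GENETIC_CODE
my_code["TGA"] <- "W"

## Hypothetical input: ATG | TGA | TAA
dna_tga <- DNAString("ATGTGATAA")
standard_protein <- translate(dna_tga)
custom_protein <- translate(dna_tga, genetic.code=my_code)
```

Here `standard_protein` should read `M**` and `custom_protein` should read `MW*`. For a known variant, `getGeneticCode()` is the more direct route than editing the vector by hand.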

codons is a utility for extracting the codons involved in this translation without translating them.
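
A minimal sketch of how codons relates to translate, assuming the Biostrings package is attached; the sequence and variable names are hypothetical.

```r
library(Biostrings)

## Hypothetical 9-nucleotide coding sequence: ATG | GCC | TAA
dna_orf <- DNAString("ATGGCCTAA")

## One view per codon, in reading order:
cod <- codons(dna_orf)
codon_strings <- as.character(cod)

## The same codons, translated:
protein <- translate(dna_orf)
```

`codon_strings` should contain the three codons `ATG`, `GCC` and `TAA`, and `protein` should read `MA*`.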

Value

For translate: An AAString object when x is a DNAString, RNAString, MaskedDNAString, or MaskedRNAString object. An AAStringSet object parallel to x (i.e. with 1 amino acid sequence per DNA or RNA sequence in x) when x is a DNAStringSet or RNAStringSet object. If x has names on it, they're propagated to the returned object.

For codons: An XStringViews object with 1 view per codon. When x is a MaskedDNAString or MaskedRNAString object, its masked parts are interpreted as introns and filled with the + letter in the returned object. Therefore codons that span across masked regions are represented by views that have a width > 3 and contain the + letter. Note that each view is guaranteed to contain exactly 3 base letters.
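
The effect of a mask on the returned views can be seen on a toy input. This is a sketch under the assumption that the masked region behaves as the intron described above; the sequence, coordinates and variable names are hypothetical.

```r
library(Biostrings)

## Hypothetical gene: exon ATGGC (1..5), intron GTAAG (6..10),
## exon CTAA (11..14). The spliced coding sequence is ATG | GCC | TAA.
dna_gene <- DNAString("ATGGCGTAAGCTAA")
masks(dna_gene) <- Mask(length(dna_gene), start=6, end=10)

## 3 views, one per codon. The 2nd codon spans the masked region, so
## its view is wider than 3 and contains the + letter:
cod <- codons(dna_gene)
codon_widths <- width(cod)

## The masked region is skipped during translation:
protein <- translate(dna_gene)
```

`protein` should read `MA*`, and only the 2nd element of `codon_widths` should exceed 3.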

Examples

## ---------------------------------------------------------------------
## 1. BASIC EXAMPLES
## ---------------------------------------------------------------------

dna1 <- DNAString("TTGATATGGCCCTTATAA")
translate(dna1)
## TTG is an alternative initiation codon in the Standard Genetic Code:
translate(dna1, no.init.codon=TRUE)

SGC1 <- getGeneticCode("SGC1")  # Vertebrate Mitochondrial code
translate(dna1, genetic.code=SGC1)
## TTG is NOT an alternative initiation codon in the Vertebrate
## Mitochondrial code:
translate(dna1, genetic.code=SGC1, no.init.codon=TRUE)

## All 6 codons except the 4th (CCC) are fuzzy:
dna2 <- DNAString("HTGATHTGRCCCYTRTRA")

## Not run: 
  translate(dna2)  # error because of fuzzy codons

## End(Not run)

## Translate all fuzzy codons to X:
translate(dna2, if.fuzzy.codon="X")

## Or solve the non-ambiguous ones (3rd codon is ambiguous so cannot be
## solved):
translate(dna2, if.fuzzy.codon="solve")

## Fuzzy codons that are non-ambiguous with a given genetic code can
## become ambiguous with another genetic code, and vice versa:
translate(dna2, genetic.code=SGC1, if.fuzzy.codon="solve")

## ---------------------------------------------------------------------
## 2. TRANSLATING AN OPEN READING FRAME
## ---------------------------------------------------------------------

file <- system.file("extdata", "someORF.fa", package="Biostrings")
x <- readDNAStringSet(file)
x

## The first and last 1000 nucleotides are not part of the ORFs:
x <- DNAStringSet(x, start=1001, end=-1001)

## Before calling translate() on an ORF, we need to mask the introns
## if any. We can get this information from the SGD database
## (http://www.yeastgenome.org/).
## According to SGD, the 1st ORF (YAL001C) has an intron at 71..160
## (see http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YAL001C)
y1 <- x[[1]]
mask1 <- Mask(length(y1), start=71, end=160)
masks(y1) <- mask1
y1
translate(y1)

## Codons:
codons(y1)
which(width(codons(y1)) != 3)
codons(y1)[20:28]

## ---------------------------------------------------------------------
## 3. AN ADVANCED EXAMPLE
## ---------------------------------------------------------------------

## Translation on the '-' strand:
dna3 <- DNAStringSet(c("ATC", "GCTG", "CGACT"))
translate(reverseComplement(dna3))

## Translate sequences on both '+' and '-' strand across all 
## possible reading frames (i.e., codon position 1, 2 or 3):
## First create a DNAStringSet of '+' and '-' strand sequences, 
## removing the nucleotides prior to the reading frame start position.
dna3_subseqs <- lapply(1:3, function(pos) 
    subseq(c(dna3, reverseComplement(dna3)), start=pos))
## Translation of 'dna3_subseqs' produces a list of length 3, each with
## 6 elements (3 '+' strand results followed by 3 '-' strand results).
lapply(dna3_subseqs, translate)

## Note that translate() issues a warning when the length of the sequence
## is not divisible by 3. To avoid this warning, wrap the call in
## suppressWarnings().

Bioconductor/Biostrings documentation built on June 10, 2025, 1:14 p.m.