Two predefined objects (GENETIC_CODE and RNA_GENETIC_CODE) that represent The Standard Genetic Code.
Other genetic codes are stored in the predefined table GENETIC_CODE_TABLE, from which they can conveniently be extracted with getGeneticCode.
## The Standard Genetic Code:
GENETIC_CODE
RNA_GENETIC_CODE
## All the known genetic codes:
GENETIC_CODE_TABLE
getGeneticCode(id_or_name2="1", full.search=FALSE, as.data.frame=FALSE)
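id_or_name2: A single string that uniquely identifies the genetic code to extract. Should be one of the values in the id or name2 columns of GENETIC_CODE_TABLE.
full.search: By default, only the id and name2 columns of GENETIC_CODE_TABLE are searched for id_or_name2. Set full.search to TRUE to also search the name column, as in getGeneticCode("ascidian", full.search=TRUE).
as.data.frame: Should the genetic code be returned as a data frame instead of a named character vector?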
Formally, a genetic code is a mapping between the 64 tri-nucleotide sequences (called codons) and amino acids.
The Standard Genetic Code (a.k.a. The Canonical Genetic Code, or simply The Genetic Code) is the particular mapping that encodes the vast majority of genes in nature.
GENETIC_CODE and RNA_GENETIC_CODE are predefined named character vectors that represent this mapping.
All the known genetic codes are summarized in GENETIC_CODE_TABLE, which is a predefined data frame with one row per known genetic code.
Use getGeneticCode to extract one genetic code at a time from this object.
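To make the codon-to-amino-acid mapping concrete, here is a minimal base R sketch that rebuilds the standard code as a named character vector and uses it to translate a short DNA string. It does not use Biostrings; the names std_code and translate_dna are hypothetical, and the amino acid string is the AAs value shown for code 1 in GENETIC_CODE_TABLE.

```r
# Base R only; this is an illustration, not how Biostrings builds GENETIC_CODE.
# Codons are generated in the order T, C, A, G with the third base varying
# fastest, which matches names(GENETIC_CODE).
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)

# AAs string of the Standard Genetic Code (one letter per codon).
aas <- "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
std_code <- setNames(strsplit(aas, "", fixed = TRUE)[[1]], codons)

# Hypothetical helper: translate a DNA string whose length is a multiple of 3
# by cutting it into codons and looking each one up by name.
translate_dna <- function(dna, code) {
  stopifnot(nchar(dna) > 0, nchar(dna) %% 3 == 0)
  starts <- seq(1, nchar(dna), by = 3)
  triplets <- substring(dna, starts, starts + 2)
  paste(code[triplets], collapse = "")
}

peptide <- translate_dna("ATGGCCTGGTAA", std_code)
peptide  # "MAW*"
```

For real sequences, the translate function listed under See Also is the appropriate tool; the sketch only shows that a genetic code is a plain lookup table.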
GENETIC_CODE and RNA_GENETIC_CODE are both named character vectors of length 64 (the number of all possible tri-nucleotide sequences) where each element is a single letter representing either an amino acid or the stop codon "*" (a.k.a. termination codon).
The names of the GENETIC_CODE vector are the DNA codons, i.e. the tri-nucleotide sequences (directed 5' to 3') that are assumed to belong to the "coding DNA strand" (a.k.a. "sense DNA strand" or "non-template DNA strand") of the gene.
The names of the RNA_GENETIC_CODE vector are the RNA codons, i.e. the tri-nucleotide sequences (directed 5' to 3') that are assumed to belong to the mRNA of the gene.
Note that the values in the GENETIC_CODE and RNA_GENETIC_CODE vectors are the same; only their names differ. The names of the latter are those of the former with all occurrences of T (thymine) replaced by U (uracil).
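The relationship between the two sets of names can be sketched in base R with chartr. The four-codon vector below is only an illustrative subset of the standard code, and dna_code and rna_code are hypothetical names, not objects from the package.

```r
# Illustrative subset of the standard code (values as printed by GENETIC_CODE).
dna_code <- c(TTT = "F", TTG = "L", ATG = "M", TGA = "*")

# Same values, names with every T replaced by U.
rna_code <- setNames(unname(dna_code), chartr("T", "U", names(dna_code)))

names(rna_code)            # "UUU" "UUG" "AUG" "UGA"
all(dna_code == rna_code)  # TRUE: only the names differ
```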
Finally, both vectors have an alt_init_codons attribute that lists the alternative initiation codons. Note that codons that always translate to M (Methionine), e.g. ATG in GENETIC_CODE or AUG in RNA_GENETIC_CODE, are omitted from the alt_init_codons attribute.
GENETIC_CODE_TABLE is a data frame that contains all the known genetic codes listed at ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt.
The data frame has one row per known genetic code and the following 5 columns:
name: The long and very descriptive name of the genetic code.
name2: The short name of the genetic code (not all genetic codes have one).
id: The id of the genetic code.
AAs: A 64-character string representing the genetic code itself in a compact form (i.e. one letter per codon, the codons are assumed to be ordered like in GENETIC_CODE).
Starts: A 64-character string indicating the Initiation Codons.
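The compact AAs and Starts strings can be unpacked by hand. The base R sketch below is one plausible reading of the column descriptions above, not the package's actual implementation: decode_code is a hypothetical helper, and the rule "an M in Starts on a codon whose amino acid is not M is an alternative initiation codon" is an assumption that happens to reproduce the alt_init_codons values printed in the Examples output for codes 1 and 2.

```r
# Codons in the order used by names(GENETIC_CODE): T, C, A, G, third base fastest.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)

# Hypothetical helper: turn an AAs/Starts pair into a named character vector
# carrying an "alt_init_codons" attribute.
decode_code <- function(aas, starts) {
  aa <- strsplit(aas, "", fixed = TRUE)[[1]]
  st <- strsplit(starts, "", fixed = TRUE)[[1]]
  stopifnot(length(aa) == 64, length(st) == 64)
  code <- setNames(aa, codons)
  # Assumption: keep initiation codons that do not already translate to M.
  attr(code, "alt_init_codons") <- codons[st == "M" & aa != "M"]
  code
}

# Rows 1 (Standard) and 2 (Vertebrate Mitochondrial) of GENETIC_CODE_TABLE.
std <- decode_code(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
  "---M---------------M---------------M----------------------------")
sgc1 <- decode_code(
  "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
  "--------------------------------MMMM---------------M------------")

attr(std, "alt_init_codons")   # "TTG" "CTG"
attr(sgc1, "alt_init_codons")  # "ATT" "ATC" "GTG"

# Codons whose amino acid differs between the two codes.
differing <- codons[as.vector(std) != as.vector(sgc1)]
differing                      # "TGA" "ATA" "AGA" "AGG"
```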
By default (i.e. when as.data.frame is set to FALSE), getGeneticCode returns a named character vector of length 64 similar to GENETIC_CODE, i.e. it contains 1-letter strings from the Amino Acid alphabet (see ?AA_ALPHABET) and its names are identical to names(GENETIC_CODE). In addition, it has an alt_init_codons attribute that lists the alternative initiation codons. Note that codons that always translate to M (Methionine), e.g. ATG, are omitted from the alt_init_codons attribute.
When as.data.frame is set to TRUE, getGeneticCode returns a data frame with 64 rows (one per codon), rownames (3-letter strings representing the codons), and the following 2 columns:
AA: A 1-letter string from the Amino Acid alphabet (see ?AA_ALPHABET) representing the amino acid mapped to the codon ("*" is used to mark the stop codon).
Start: A 1-letter string indicating an alternative mapping for the codon, i.e. what amino acid the codon is mapped to when it's the first translated codon.
The rownames of the data frame are identical to names(GENETIC_CODE).
H. Pagès
All the known genetic codes are described here:
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
The "official names" of the various codes ("Standard", "SGC0", "Vertebrate Mitochondrial", "SGC1", etc.) and their ids (1, 2, etc.) were taken from the print-form ASN.1 version of the above document (version 4.0 at the time of this writing):
ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt
AA_ALPHABET and AMINO_ACID_CODE.
The translate and trinucleotideFrequency functions.
DNAString, RNAString, and AAString objects.
## ---------------------------------------------------------------------
## THE STANDARD GENETIC CODE
## ---------------------------------------------------------------------
GENETIC_CODE
## Codon ATG is *always* translated to M (Methionine)
GENETIC_CODE[["ATG"]]
## Codons TTG and CTG are "normally" translated to L except when they are
## the first translated codon (a.k.a. start codon or initiation codon),
## in which case they are translated to M:
attr(GENETIC_CODE, "alt_init_codons")
GENETIC_CODE[["TTG"]]
GENETIC_CODE[["CTG"]]
sort(table(GENETIC_CODE)) # the same amino acid can be encoded by 1
# to 6 different codons
RNA_GENETIC_CODE
all(GENETIC_CODE == RNA_GENETIC_CODE) # TRUE
## ---------------------------------------------------------------------
## ALL THE KNOWN GENETIC CODES
## ---------------------------------------------------------------------
GENETIC_CODE_TABLE[1:3 , ]
getGeneticCode("SGC0") # The Standard Genetic Code, again
stopifnot(identical(getGeneticCode("SGC0"), GENETIC_CODE))
getGeneticCode("SGC1") # Vertebrate Mitochondrial
getGeneticCode("ascidian", full.search=TRUE) # Ascidian Mitochondrial
## ---------------------------------------------------------------------
## EXAMINE THE DIFFERENCES BETWEEN THE STANDARD CODE AND A NON-STANDARD
## ONE
## ---------------------------------------------------------------------
idx <- which(GENETIC_CODE != getGeneticCode("SGC1"))
rbind(Standard=GENETIC_CODE[idx], SGC1=getGeneticCode("SGC1")[idx])
TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC CTA CTG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" "L" "L"
CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG ACT ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" "T" "T" "T" "T"
AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG
"N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGT GGC GGA GGG
"G" "G" "G" "G"
attr(,"alt_init_codons")
[1] "TTG" "CTG"
[1] "M"
[1] "TTG" "CTG"
[1] "L"
[1] "L"
GENETIC_CODE
M W C D E F H K N Q Y * I A G P T V L R S
1 1 2 2 2 2 2 2 2 2 2 3 3 4 4 4 4 4 6 6 6
UUU UUC UUA UUG UCU UCC UCA UCG UAU UAC UAA UAG UGU UGC UGA UGG CUU CUC CUA CUG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" "L" "L"
CCU CCC CCA CCG CAU CAC CAA CAG CGU CGC CGA CGG AUU AUC AUA AUG ACU ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" "T" "T" "T" "T"
AAU AAC AAA AAG AGU AGC AGA AGG GUU GUC GUA GUG GCU GCC GCA GCG GAU GAC GAA GAG
"N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGU GGC GGA GGG
"G" "G" "G" "G"
attr(,"alt_init_codons")
[1] "UUG" "CUG"
[1] TRUE
name name2 id
1 Standard SGC0 1
2 Vertebrate Mitochondrial SGC1 2
3 Yeast Mitochondrial SGC2 3
AAs
1 FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
2 FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
3 FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts
1 ---M---------------M---------------M----------------------------
2 --------------------------------MMMM---------------M------------
3 ----------------------------------MM----------------------------
TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC CTA CTG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" "L" "L"
CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG ACT ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" "T" "T" "T" "T"
AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG
"N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGT GGC GGA GGG
"G" "G" "G" "G"
attr(,"alt_init_codons")
[1] "TTG" "CTG"
TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC CTA CTG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "W" "W" "L" "L" "L" "L"
CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG ACT ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "M" "M" "T" "T" "T" "T"
AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG
"N" "N" "K" "K" "S" "S" "*" "*" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGT GGC GGA GGG
"G" "G" "G" "G"
attr(,"alt_init_codons")
[1] "ATT" "ATC" "GTG"
TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC CTA CTG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "W" "W" "L" "L" "L" "L"
CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG ACT ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "M" "M" "T" "T" "T" "T"
AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG
"N" "N" "K" "K" "S" "S" "G" "G" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGT GGC GGA GGG
"G" "G" "G" "G"
attr(,"alt_init_codons")
[1] "TTG" "GTG"
TGA ATA AGA AGG
Standard "*" "I" "R" "R"
SGC1 "W" "M" "*" "*"