View source: R/AlignTranslation.R
Performs alignment of a set of DNA or RNA sequences by aligning their corresponding amino acid sequences.
```r
AlignTranslation(myXStringSet,
                 sense = "+",
                 direction = "5' to 3'",
                 readingFrame = NA,
                 type = class(myXStringSet),
                 geneticCode = GENETIC_CODE,
                 ...)
```
myXStringSet
A DNAStringSet or RNAStringSet object of unaligned sequences.
sense
Single character specifying sense of the input sequences, either the positive ("+") coding strand or the negative ("-") non-coding strand.
direction
Direction of the input sequences, either "5' to 3'" or "3' to 5'".
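readingFrame
Numeric vector giving a single reading frame for all of the sequences, or an individual reading frame for each sequence in myXStringSet. Each reading frame can be 1, 2, or 3 to begin translating at the first, second, or third nucleotide position, or NA to guess the reading frame (see Details).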
type
Character string indicating the type of output desired. This should be (an abbreviation of) one of "DNAStringSet", "RNAStringSet", "AAStringSet", or "both" (see Value).
geneticCode
Either a character vector giving the genetic code in the same format as GENETIC_CODE (the default), or a list containing one genetic code for each sequence in myXStringSet.
...
Further arguments to be passed directly to AlignSeqs, such as verbose.
Alignment of proteins is often more accurate than alignment of their coding nucleic acid sequences. This function aligns the input nucleic acid sequences by aligning their translated amino acid sequences. First, the input sequences are translated according to the specified sense, direction, and readingFrame. The resulting amino acid sequences are aligned using AlignSeqs, and this alignment is (conceptually) reverse translated into the original sequence type, sense, and direction, so that each gap in the amino acid alignment becomes a gap of three nucleotides. Not only is alignment of protein sequences generally more accurate, but aligning translations also ensures that the reading frame is maintained in the nucleotide sequences.
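The conceptual reverse translation step can be illustrated with a minimal base R sketch. This is not DECIPHER's implementation: the helper name reverse_translate_gaps is hypothetical, and it assumes the nucleotide sequence is read in frame 1 and contains exactly one complete codon per non-gap residue of the aligned amino acid sequence.

```r
# Hypothetical helper (not part of DECIPHER): copies the gaps of an aligned
# amino acid sequence onto its ungapped coding sequence, one codon per residue.
# Assumes reading frame 1 and complete codons only.
reverse_translate_gaps <- function(nt, aa_aligned) {
  n <- nchar(nt)
  codons <- substring(nt, seq(1, n - 2, by = 3), seq(3, n, by = 3))
  residues <- strsplit(aa_aligned, "")[[1]]
  stopifnot(sum(residues != "-") == length(codons))  # one codon per residue
  out <- character(length(residues))
  k <- 1
  for (i in seq_along(residues)) {
    if (residues[i] == "-") {
      out[i] <- "---"          # an amino acid gap becomes a three-nucleotide gap
    } else {
      out[i] <- codons[k]      # otherwise emit the next codon in order
      k <- k + 1
    }
  }
  paste(out, collapse = "")
}

# the fourth sequence from the Examples, aligned as MF-TP*
reverse_translate_gaps("AUGUUUACCCCAUAA", "MF-TP*")  # returns "AUGUUU---ACCCCAUAA"
```

Because gaps are only ever inserted in multiples of three, the codons of every sequence stay in register with one another, which is why aligning translations preserves the reading frame.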
If readingFrame is NA (the default), then an attempt is made to guess the reading frame of each sequence based on the number of stop codons in the translated amino acids. For each sequence, the first reading frame (1, 2, or 3) without stop codons, except in the final position, is chosen. If the number of stop codons is inconclusive for a sequence, then its reading frame defaults to 1. The entire length of each sequence is translated regardless of any stop codons identified. Note that this heuristic is only useful when there is a substantially long coding sequence with at most a single stop codon expected in the final position; therefore, it is preferable to specify the reading frame of each sequence if it is known.
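The stop-codon heuristic for guessing the reading frame can be sketched in base R as follows. This is an illustration under stated assumptions, not DECIPHER's code: the helper name guess_frame is hypothetical, it assumes the standard genetic code's stop codons (TAA, TAG, TGA, with U read as T), and it interprets "inconclusive" as "no frame qualifies".

```r
# Hypothetical helper (not part of DECIPHER): a base R sketch of the
# stop-codon heuristic. Assumes the standard code's stop codons and
# treats "inconclusive" as "no frame qualifies".
guess_frame <- function(nt, stops = c("TAA", "TAG", "TGA")) {
  nt <- toupper(chartr("Uu", "Tt", nt))  # read RNA as DNA
  for (frame in 1:3) {
    n <- nchar(nt) - frame + 1           # nucleotides from this frame onward
    if (n < 3) next                      # no complete codon in this frame
    starts <- seq(frame, by = 3, length.out = n %/% 3)
    codons <- substring(nt, starts, starts + 2)
    is_stop <- codons %in% stops
    # accept the frame if a stop occurs only in the final codon (or not at all)
    if (!any(is_stop[-length(is_stop)])) return(frame)
  }
  1L                                     # inconclusive: default to frame 1
}

guess_frame("AUGUUCAUCACCCCCUAA")  # returns 1: the only stop is the final codon
guess_frame("CATGGCTAACTAA")       # returns 2: frame 1 contains an internal TAA
```

The second call shows why the heuristic needs reasonably long sequences: on short inputs, more than one frame can easily be free of internal stops, in which case the first such frame wins.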
An XStringSet of the class specified by type, or a list of two components (nucleotides and amino acids) if type is "both". Note that incomplete starting and ending codons will be translated into the mask character ("+") if the result includes an AAStringSet.
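The masking of incomplete codons can be pictured with a small base R sketch. It is an assumption-laden illustration rather than DECIPHER's translation code: the helper translate_masked is hypothetical, it writes a single "+" for any leading or trailing partial codon, and the lookup table code is a toy covering only the codons used here, not a real genetic code such as GENETIC_CODE.

```r
# Hypothetical helper (not part of DECIPHER): translates one sequence in a
# given reading frame, writing "+" for an incomplete starting or ending codon.
# `code` is a toy lookup table covering only the codons used below.
translate_masked <- function(nt, frame, code) {
  n <- nchar(nt) - frame + 1                  # nucleotides from the frame start
  starts <- seq(frame, by = 3, length.out = max(n, 0) %/% 3)
  codons <- substring(nt, starts, starts + 2)
  stopifnot(all(codons %in% names(code)))     # toy table must cover every codon
  aa <- unname(code[codons])
  if (frame > 1) aa <- c("+", aa)             # leading partial codon
  if (n %% 3 != 0) aa <- c(aa, "+")           # trailing partial codon
  paste(aa, collapse = "")
}

code <- c(ATG = "M", GCT = "A", AAC = "N", TAA = "*")
translate_masked("CATGGCTAACTAAG", frame = 2, code = code)  # returns "+MAN*+"
```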
Erik Wright eswright@pitt.edu
Wright, E. S. (2015). DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics, 16, 322. http://doi.org/10.1186/s12859-015-0749-z
AlignDB, AlignProfiles, AlignSeqs, AlignSynteny, CorrectFrameshifts
```r
library(DECIPHER)

# first three sequences translate to MFITP*
# and the last sequence translates as MF-TP*
rna <- RNAStringSet(c("AUGUUCAUCACCCCCUAA", "AUGUUCAUAACUCCUUGA",
                      "AUGUUCAUUACACCGUAG", "AUGUUUACCCCAUAA"))
RNA <- AlignSeqs(rna, verbose=FALSE) # align the nucleotides directly
RNA
RNA <- AlignTranslation(rna, verbose=FALSE) # align via the translations
RNA
AA <- AlignTranslation(rna, type="AAStringSet", verbose=FALSE)
AA
BOTH <- AlignTranslation(rna, type="both", verbose=FALSE)
BOTH

# example of aligning many protein coding sequences:
fas <- system.file("extdata", "50S_ribosomal_protein_L2.fas", package="DECIPHER")
dna <- readDNAStringSet(fas)
DNA <- AlignTranslation(dna) # align the translation then reverse translate
DNA

# using a mixture of standard and non-standard genetic codes
gC1 <- getGeneticCode(id_or_name2="1", full.search=FALSE, as.data.frame=FALSE)
# Mollicutes use an alternative genetic code
gC2 <- getGeneticCode(id_or_name2="4", full.search=FALSE, as.data.frame=FALSE)
w <- grep("Mycoplasma|Ureaplasma", names(dna))
gC <- vector("list", length(dna))
gC[-w] <- list(gC1)
gC[w] <- list(gC2)
AA <- AlignTranslation(dna, geneticCode=gC, type="AAStringSet")
BrowseSeqs(AA)
```