Performs profile-to-profile alignment of multiple unaligned sequences following a guide tree.
AlignSeqs(myXStringSet,
          guideTree = NULL,
          iterations = 2,
          refinements = 1,
          gapOpening = c(-18, -16),
          gapExtension = c(-2, -1),
          useStructures = TRUE,
          structures = NULL,
          FUN = AdjustAlignment,
          levels = c(0.9, 0.7, 0.7, 0.4, 10, 5, 5, 2),
          alphabet = AA_REDUCED[[1]],
          processors = 1,
          verbose = TRUE,
          ...)
myXStringSet |
An |
guideTree |
Either |
iterations |
Number of iteration steps to perform. During each iteration step the guide tree is regenerated based on the alignment and the sequences are realigned. |
refinements |
Number of refinement steps to perform. During each refinement step, groups of sequences are realigned to the rest of the sequences, and the better of the two alignments (before and after realignment) is kept. |
gapOpening |
Single numeric giving the cost for opening a gap in the alignment, or two numbers giving the minimum and maximum costs. In the latter case the cost will be varied depending upon whether the groups of sequences being aligned are nearly identical or maximally distant. |
gapExtension |
Single numeric giving the cost for extending an open gap in the alignment, or two numbers giving the minimum and maximum costs. In the latter case the cost will be varied depending upon whether the groups of sequences being aligned are nearly identical or maximally distant. |
useStructures |
Logical indicating whether to use secondary structure predictions during alignment. If |
structures |
Either a list of secondary structure probabilities matching the |
FUN |
A function to be applied after each profile-to-profile alignment. (See details section below.) |
levels |
Numeric with eight elements specifying the levels at which to trigger events. (See details section below.) |
alphabet |
Character vector of amino acid groupings used to reduce the 20 standard amino acids into smaller groups. Alphabet reduction helps to find more distant homologies between sequences. A non-reduced amino acid alphabet can be used by setting |
processors |
The number of processors to use, or |
verbose |
Logical indicating whether to display progress. |
... |
Further arguments to be passed directly to |
The profile-to-profile method aligns a sequence set by merging profiles along a guide tree until all the input sequences are aligned. This process has three main steps: (1) If guideTree=NULL, an initial single-linkage guide tree is constructed based on a distance matrix of shared k-mers. Alternatively, a dendrogram can be provided as the initial guideTree. (2) If iterations is greater than zero, then a UPGMA guide tree is built based on the initial alignment and the sequences are re-aligned along this tree. This process is repeated iterations times or until convergence. (3) If refinements is greater than zero, then subsets of the alignment are re-aligned to the remainder of the alignment. This process generates two alignments, the better of which is chosen based on its sum-of-pairs score. This refinement process is repeated refinements times, or until convergence.
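As a rough illustration of the three steps, the sketch below switches the iteration and refinement stages off and on using only the arguments listed in the usage. The 20-sequence subset and the variable names quick and full are arbitrary choices made here to keep the run short; the block is skipped when DECIPHER is not installed.

```r
# Sketch: toggling the iteration and refinement stages of AlignSeqs.
if (requireNamespace("DECIPHER", quietly = TRUE)) {
  library(DECIPHER)

  db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
  dna <- SearchDB(db, remove = "all")
  dna <- dna[1:20]  # arbitrary small subset, chosen only for speed

  # Step 1 only: align along the initial guide tree
  quick <- AlignSeqs(dna, iterations = 0, refinements = 0, verbose = FALSE)

  # Steps 1 to 3 with the documented default counts
  full <- AlignSeqs(dna, iterations = 2, refinements = 1, verbose = FALSE)

  # In an alignment every sequence has the same width
  print(unique(width(quick)))
  print(unique(width(full)))
}
```

Comparing the two results (for example with BrowseSeqs) gives a feel for how much the later stages change a given alignment.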
The purpose of levels is to speed up the alignment process by not running time-consuming processes when they are unlikely to change the outcome. The first four levels control when refinements occur and the function FUN is run on the alignment. The default levels specify that these events should happen when above 0.9 (AA; levels[1]) or 0.7 (DNA/RNA; levels[3]) average dissimilarity on the initial tree, when above 0.7 (AA; levels[2]) or 0.4 (DNA/RNA; levels[4]) average dissimilarity on the iterative tree(s), and after every tenth improvement made during refinement (levels[5]). The sixth element of levels (levels[6]) prevents FUN from being applied at any point to fewer than 5 sequences.
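To make the positions in levels easier to follow, the base R sketch below lays out the default vector next to a short description of each element. The descriptions paraphrase this page, and the helper triggersOnInitialTree is hypothetical: it mirrors the documented rule for the initial tree and is not taken from the package internals, which may differ (for example in how ties at the threshold are handled).

```r
# Default thresholds copied from the usage section
defaultLevels <- c(0.9, 0.7, 0.7, 0.4, 10, 5, 5, 2)

# Descriptions paraphrased from this page; AlignSeqs takes an unnamed vector
levelRoles <- data.frame(
  index = seq_along(defaultLevels),
  value = defaultLevels,
  role = c(
    "AA: average dissimilarity threshold on the initial tree",
    "AA: average dissimilarity threshold on the iterative tree(s)",
    "DNA/RNA: average dissimilarity threshold on the initial tree",
    "DNA/RNA: average dissimilarity threshold on the iterative tree(s)",
    "number of refinement improvements between events",
    "minimum number of sequences for FUN to be applied",
    "RNA: minimum initial guide tree length for computing structures",
    "RNA: minimum subsequent tree length for computing structures"
  )
)
print(levelRoles)

# Hypothetical helper: would FUN be triggered at the initial-tree stage?
triggersOnInitialTree <- function(avgDissimilarity, nSeqs, isAA, levels) {
  threshold <- if (isAA) levels[1] else levels[3]
  avgDissimilarity > threshold && nSeqs >= levels[6]
}

triggersOnInitialTree(0.8, nSeqs = 10, isAA = FALSE, levels = defaultLevels)
```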
The FUN function is always applied just before returning the alignment so long as there are at least levels[6] sequences. The default FUN is AdjustAlignment, but FUN can be any function that takes in an XStringSet as its first argument, as well as weights, processors, and substitutionMatrix as optional arguments. For example, the default FUN could be altered to not perform any changes by setting it equal to function(x, ...) return(x), where x is an XStringSet.
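The pass-through function mentioned above can be written out and supplied as FUN. In this sketch the name noAdjust and the 20-sequence subset are choices made here for illustration; the optional arguments (weights, processors, substitutionMatrix) are absorbed by the dots. The AlignSeqs call is skipped when DECIPHER is not installed.

```r
# A pass-through FUN: returns its first argument unchanged and
# ignores any optional arguments passed through ...
noAdjust <- function(x, ...) return(x)

# The function itself does not depend on DECIPHER
example <- c("AC-T", "ACGT")
identical(noAdjust(example, weights = c(1, 1)), example)

if (requireNamespace("DECIPHER", quietly = TRUE)) {
  library(DECIPHER)

  db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
  dna <- SearchDB(db, remove = "all")[1:20]  # arbitrary subset for speed

  # Align without any post-processing of the profile alignments
  unadjusted <- AlignSeqs(dna, FUN = noAdjust, verbose = FALSE)
}
```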
Secondary structures are automatically computed for amino acid and RNA sequences unless structures are provided or useStructures is FALSE. The default structureMatrix is used unless an alternative is provided. For RNA sequences, secondary structures are only computed when the total length of the initial guide tree is at least 5 (levels[7]) or the length of subsequent trees is at least 2 (levels[8]).
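A minimal sketch of switching structure prediction off for RNA input, using only the useStructures argument from the usage. The subset size and variable names are arbitrary, and whether the two alignments differ depends on the sequences and on the tree-length thresholds described above. The block is skipped when DECIPHER is not installed.

```r
if (requireNamespace("DECIPHER", quietly = TRUE)) {
  library(DECIPHER)

  db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
  rna <- RNAStringSet(SearchDB(db, remove = "all"))[1:20]  # arbitrary subset

  # Default behaviour: secondary structures may be computed for RNA
  withStructures <- AlignSeqs(rna, verbose = FALSE)

  # Structure prediction disabled
  withoutStructures <- AlignSeqs(rna, useStructures = FALSE, verbose = FALSE)
}
```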
An XStringSet of aligned sequences.
Erik Wright eswright@pitt.edu
Wright, E. S. (2015). DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics, 16, 322. http://doi.org/10.1186/s12859-015-0749-z
Wright, E. S. (2020). RNAconTest: comparing tools for noncoding RNA multiple sequence alignment based on structural consistency. RNA, 26, 531-540.
AdjustAlignment, AlignDB, AlignProfiles, AlignSynteny, AlignTranslation, IdClusters, ReadDendrogram, StaggerAlignment
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
dna <- SearchDB(db, remove="all")
alignedDNA <- AlignSeqs(dna)
BrowseSeqs(alignedDNA, highlight=1)
# use secondary structure with RNA sequences
alignedRNA <- AlignSeqs(RNAStringSet(dna))
BrowseSeqs(alignedRNA, highlight=1)